Reinforcement Learning and Artificial Intelligence (RLAI)
Style recommendations for python
by Rich Sutton, October 24, 2004

This page is part of a larger discussion of programming style.  The ambition of this page is to opine on and discuss elements of programming style when using the python programming language.

Comments and discussion of any sort are very welcome here.



There are many reasonable style conventions in python, as in any language, and no single set that everybody will agree to.  Nevertheless, it is useful to minimize their number.  One way to do this is to spell out explicitly a particular set of stylistic conventions that several people could adhere to.  Here i make a stab at this by explicitly stating the conventions that i'd like to see people, including myself, use more consistently.

Upper/lower case

I like lower case.  Maybe because of my background in Lisp, maybe because upper case feels like shouting, maybe because lower case looks more like prose and less like jargon, or maybe just because i have been typing one-handed a lot lately.

I recommend using lower case whenever possible.  An exception is class names, which by convention always start with a capital letter.  These need to be typed rarely, and it is helpful for them to stand out.  Another exception is for letters that would be upper case in English prose, such as letters that stand for whole words, as in acronyms.  Thus i might use "MDPs" for an array of Markov decision processes, and "RLinterface" for an interface for reinforcement learning.  Even so, i am likely to use "mdp" and "rli" for individual MDPs and RL interfaces, and for dummy variables and function arguments.  Another exception is when case is used to distinguish two different things.  For example, i might use "S" to denote a set whose members are denoted "s", or i might use both "n" and "N" for different things, just as is done in mathematics.
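To make these exceptions concrete, here is a small sketch of how they might look together in one piece of code.  The class MDP, its fields "S" and "gamma", and the function "numstates" are hypothetical, invented only to illustrate the naming, and the class is written as a dataclass simply to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class MDP:
    """A toy Markov decision process (hypothetical, for illustration only).

    The class name starts with a capital letter, as class names do.
    """
    S: frozenset   # upper-case "S" names the whole set of states
    gamma: float   # discount rate


def numstates(mdp):
    """Lower-case, squished-together routine name; "mdp" is one MDP."""
    return len(mdp.S)


# "MDPs" keeps the acronym's capitals because it names a whole collection.
MDPs = [MDP(S=frozenset({0, 1}), gamma=0.9),
        MDP(S=frozenset({0, 1, 2}), gamma=0.5)]

for mdp in MDPs:
    # lower-case "s" for a member of the set "S"
    for s in sorted(mdp.S):
        print(numstates(mdp), s)
```

Upper case appears only where it carries information: the class name, the acronym in a collection's name, and the set/member distinction between "S" and "s".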

A common alternative to lower case is to use camel case, as in DrawObject or DeleteArray, or even CoerceToFloatingPoint.  This is not so bad, but it looks very code-like, feels like shouting, leads to long names, and too often has exceptions or ambiguities.  Should words like "a" and "to" be capitalized?  Is "mainloop" one word or two?  How about "setup"?  Should one-word names be capitalized?

I think it is better to look to the base language for guidance.  Python uses lowercase exclusively for its routine names.  In rare cases python uses underscores ("_") between words of a routine name, but in the vast majority of cases it simply squishes them together, as in "isdigit", "readline", and "getattr".  Most often python will use single words for routine names, often with abbreviations, as in "len" (and the statement "del").  The use of objects facilitates this, as the meaning of a single word can become more apparent when combined with an object class.  I recommend that we follow the same conventions in our own code.  It means thinking a little harder in designing your names, but produces more readable code.
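For illustration, here is a short sketch using only names from python's own standard library.  It shows the squished-together lower-case style, and single-word method names whose meaning comes from the object they are applied to.  The variable names are arbitrary.

```python
import io

# Routine names in python's own library are lower case, with words squished
# together rather than separated by underscores or capitals.
isnum = "2004".isdigit()                  # str.isdigit

f = io.StringIO("first line\nsecond line\n")
first = f.readline()                      # readline on a file-like object

closed = getattr(f, "closed")             # the built-in getattr

# Single-word names such as split, sort, and join get their full meaning
# from the object they are applied to.
words = "old words when short".split()    # a list of four strings
words.sort()                              # sorts the list in place
sentence = " ".join(words)

print(isnum, repr(first), closed, sentence)
```

None of these names needs a second word, because "f.readline()" or "words.sort()" already says what is being read or sorted.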

"Short words are best and the old words when short are best of all."
- Winston Churchill

Another common convention in python is to write the first word in lower case and capitalize the rest, as in drawObject.  This is my second favorite convention.  If one is using a naming style that favors multi-word names, then this is a good choice.

"self" in class definition code

When defining the methods of a class, one often refers to the instance of the object that the method is processing.  Following the convention of Smalltalk (C++ and Java use the keyword "this" for the same purpose), some python programmers will use the word "self" to refer to the instance being processed.  Suppose we have an object "ship" with attributes "x" and "y".  Inside ship methods, do we refer to "self.x" and "self.y", or to "ship.x" and "ship.y"?  The latter seems much better to me, though i must admit i have not done a lot of object programming in python.  I have done a lot in lisp, which uses something analogous to the "ship.x" form (which of course may bias me).
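The "ship.x" form works because python does not reserve the name "self".  The instance is simply passed as the first parameter of each method, whatever that parameter is called.  Here is a minimal sketch of a hypothetical Ship class written that way.  Note that PEP 8 recommends "self" for this parameter, and some linters will warn about any other name.

```python
import math


class Ship:
    """A ship with a position; the instance parameter is named "ship"."""

    def __init__(ship, x, y):
        ship.x = x
        ship.y = y

    def move(ship, dx, dy):
        # python binds the instance to the first parameter whatever its
        # name, so "ship.x" behaves exactly like the more common "self.x".
        ship.x += dx
        ship.y += dy

    def distance(ship, other):
        # straight-line distance between two ships
        return math.hypot(ship.x - other.x, ship.y - other.y)


a = Ship(0, 0)
a.move(3, 4)
print(a.x, a.y, a.distance(Ship(0, 0)))
```

Callers are unaffected by the choice: "a.move(3, 4)" reads the same whether the method was written with "ship" or "self".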

Some python programmers seem to think "self.x" is somehow right, and perhaps even should be enforced, but to me it seems like an empty label, decreasing the ability to understand a piece of code on its own. If a piece of code keeps referring to self then it can only be understood by constant reference to what self means in this piece of code; there is no name there to remind you.

All lines should be less than 80 characters