Copyright (C) 2007 Taylor R. Campbell All rights reserved. Copies and quotations of this document are freely permitted, provided that this copyright notice is preserved in copies and that the author is cited in quotations.
If any text precedes an opening bracket or follows a closing bracket, separate that text from that bracket with a space. Conversely, leave no space after an opening bracket and before following text, or after preceding text and before a closing bracket.
Unacceptable:

  (foo(bar baz)quux)
  (foo ( bar baz ) quux)

Acceptable:

  (foo (bar baz) quux)

Rationale: This is the same spacing found in standard typography of European text. It is more aesthetically pleasing.
*** Line Separation
Absolutely do *not* place closing brackets on their own lines.
Unacceptable:

  (define (factorial x)
    (if (< x 2)
        1
        (* x (factorial (- x 1
                        )
           )
        )
    )
  )

Acceptable:

  (define (factorial x)
    (if (< x 2)
        1
        (* x (factorial (- x 1)))))

Rationale: The parentheses grow lonely if their closing brackets are all kept separated and segregated.
**** Exceptions to the Above Rule Concerning Line Separation
Do not heed this section unless you know what you are doing. Its title does *not* make the unacceptable example above acceptable.
When commenting out fragments of expressions with line comments, it may be necessary to break a line before a sequence of closing brackets:
  (define (foo bar)
    (list (frob bar)
          (zork bar)
          ;; (zap bar)
          ))

This is acceptable, but there are other alternatives. In Common Lisp, one can use the read-time conditional syntax, `#+' or `#-', with a feature conditional that is guaranteed to be false or true -- `#+(OR)' or `#-(AND)' --; for example,
  (define (foo bar)
    (list (frob bar)
          (zork bar)
          #+(or) (zap bar)))

Finally, it is acceptable to break a line immediately after an opening bracket and immediately before a closing bracket for very long lists, especially in files under version control. This eases the maintenance of the lists and clarifies version diffs. Example:
  (define colour-names         ;Add more colour names to this list!
    '(
      blue
      cerulean
      green
      magenta
      purple
      red
      scarlet
      turquoise
      ))

** Indentation and Alignment
The operator of any form, i.e. the first subform following the opening round bracket, determines the rules for indenting or aligning the remaining forms. Many names in this position indicate special alignment or indentation rules; these are special operators, macros, or procedures that have certain parameter structures.
If the first subform is a non-special name, however, then if the second subform is on the same line, align the starting column of all following subforms with that of the second subform. If the second subform is on the following line, align its starting column with that of the first subform, and do the same for all remaining subforms.
In general, Emacs will indent Lisp code correctly. Run `C-M-q' (indent-sexp) on any code to ensure that it is indented correctly, and configure Emacs so that any non-standard forms are indented appropriately.
Unacceptable:

  (+ (sqrt -1)
    (* x y)
    (+ p q))

  (+
     (sqrt -1)
     (* x y)
     (+ p q))

Acceptable:

  (+ (sqrt -1)
     (* x y)
     (+ p q))

  (+
   (sqrt -1)
   (* x y)
   (+ p q))

Rationale: The columnar alignment allows the reader to follow the operands of any operation straightforwardly, simply by scanning downward or upward to match a common column. Indentation dictates structure; confusing indentation is a burden on the reader who wishes to derive structure without matching parentheses manually.
*** Non-Symbol Indentation and Alignment
The above rules are not exhaustive; some cases may arise with strange data in operator positions.
Unfortunately, style varies here from person to person and from editor to editor. Here are some examples of possible ways to indent lists whose operators are lists:
Questionable:

  ((car x)                            ;Requires hand indentation.
     (cdr x)
     foo)

  ((car x) (cdr x)                    ;GNU Emacs
   foo)

Preferable:

  ((car x)                            ;Any Emacs
   (cdr x)
   foo)

  ((car x) (cdr x)                    ;Edwin
           foo)

Rationale: The operands should be aligned, as if it were any other procedure call with a name in the operator position; anything other than this is confusing because it gives some operands greater visual distinction, allowing others to hide from the viewer's sight. For example, the questionable indentation

  ((car x) (cdr x)
   foo)

can make it hard to see that FOO and (CDR X) are both operands here at the same level. However, GNU Emacs will generate that indentation by default. (Edwin will not.)
If the form in question is meant to be simply a list of literal data, all of the subforms should be aligned to the same column, irrespective of the first subform.
Unacceptable:

  ("foo" "bar" "baz" "quux" "zot"
         "mumble" "frotz" "gargle" "mumph")

Questionable, but acceptable:

  (3 1 4 1 5 9 2 6 5 3 5 8 9 7 9 3 2 3 8 4 6 2 6 4
     3 3 8 3 2 7 9 5 0 2 8 8 4 1 9 7 1 6 9 3 9 9 3)

Acceptable:

  ("foo" "bar" "baz" "quux" "zot"
   "mumble" "frotz" "gargle" "mumph")

  ("foo"
   "bar" "baz" "quux" "zot"
   "mumble" "frotz" "gargle" "mumph")

Rationale: Seldom is the first subform distinguished for any reason, if it is a literal; usually in this case it indicates pure data, not code. Some editors and pretty-printers, however, will indent unacceptably in the example given unless the second subform is on the next line anyway, which is why the last way to write the fragment is usually best.
Naming is subtle and elusive. Bizarrely, it is simultaneously insignificant, because an object is independent of and unaffected by the many names by which we refer to it, and also of supreme importance, because it is what programming -- and, indeed, almost everything that we humans deal with -- is all about. A full discussion of the concept of name lies far outside the scope of this document, and could surely fill not even a book but a library.
Symbolic names are written with English words separated by hyphens. Scheme and Common Lisp both fold the case of names in programs; consequently, camel case is frowned upon, and not merely because it is ugly. Underscores are unacceptable separators except for names that were derived directly from a foreign language without translation.
Unacceptable:

  XMLHttpRequest
  foreach
  append_map

Acceptable:

  xml-http-request
  for-each
  append-map

*** Funny Characters
There are several different conventions in different Lisps for the use of non-alphanumeric characters in names.
***** Question Marks: Predicates
Affix a question mark to the end of a name for a procedure whose purpose is to ask a question of an object and to yield a boolean answer. Such procedures are called `predicates'. Do not use a question mark if the procedure may return any object other than a boolean.
Examples: pair? procedure? proper-list?
Non-examples: member assoc any every
Pronounce the question mark as if it were the isolated letter `p'. For example, to read the fragment (PAIR? OBJECT) aloud, say: `pair-pee object.'
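As a small illustration of the convention, here is a sketch in portable Scheme. The names NONEMPTY-STRING? and FIRST-NONEMPTY-STRING are hypothetical, invented for this example only. The first yields only #T or #F and so earns its question mark; the second may yield a string, and so, like MEMBER and ASSOC, it takes none.

```scheme
;;; Hypothetical names, for illustration only.

(define (nonempty-string? object)       ;always yields a boolean
  (and (string? object)
       (< 0 (string-length object))))

(define (first-nonempty-string objects) ;may yield a string or #F
  (cond ((null? objects) #f)
        ((nonempty-string? (car objects)) (car objects))
        (else (first-nonempty-string (cdr objects)))))
```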
***** Exclamation Marks: Destructive Operations
Affix an exclamation mark to the end of a name for a procedure (or macro) whose primary purpose is to modify an object. Such procedures are called `destructive'.
Examples: set-car! append!
Avoid using the exclamation mark willy nilly for just *any* procedure whose operation involves any kind of mutation or side effect; instead, use the exclamation mark to identify procedures that exist *solely* for the purpose of destructive update (e.g., SET-CAR!), or to distinguish a destructive, or potentially destructive (in the case of linear-update operations such as APPEND!), variant of a procedure of which there also exists a purely functional variant (e.g., APPEND).
Pronounce the exclamation mark as `bang'. For example, to read the fragment (APPEND! LIST TAIL) aloud, say: `append-bang list tail.'
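The distinction between a destructive procedure and its purely functional variant might be sketched as follows, assuming a Scheme that provides VECTOR-MAP (as R7RS does). VECTOR-DOUBLE and VECTOR-DOUBLE! are hypothetical names, not procedures of any standard library.

```scheme
;;; Hypothetical procedures, for illustration only.

(define (vector-double vector)          ;purely functional variant
  (vector-map (lambda (element) (* 2 element)) vector))

(define (vector-double! vector)         ;destructive variant
  (do ((index 0 (+ index 1)))
      ((= index (vector-length vector)))
    (vector-set! vector index (* 2 (vector-ref vector index)))))
```

VECTOR-DOUBLE returns a fresh vector and leaves its argument alone; VECTOR-DOUBLE! exists solely to update its argument in place, and its return value is unspecified.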
***** Asterisks: Variants, Internal Routines, Mutable Globals
Affix an asterisk to the end of a name to make a variation on a theme of the original name.
Example: let -> let*
Prefer a meaningful name over an asterisk; the asterisk does not explain what variation on the theme the name means.
Affix an asterisk to the beginning of a name to make an internal routine for that name. Again, prefer a meaningful name over an asterisk.
Affix asterisks to the beginning and end of a globally mutable variable. This allows the reader of the program to recognize very easily that it is badly written!
***** `WITH-' and `CALL-WITH-': Dynamic State and Cleanup
Prefix `WITH-' to any procedure that establishes dynamic state and calls a nullary procedure, which should be the last (required) argument. The dynamic state should be established for the extent of the nullary procedure, and should be returned to its original state after that procedure returns.
Examples: with-input-from-file with-output-to-file
Exception: Some systems provide a procedure (WITH-CONTINUATION CONTINUATION THUNK), which calls THUNK in the given continuation, using that continuation's dynamic state. If THUNK returns, it will return to CONTINUATION, not to the continuation of the call to WITH-CONTINUATION. This is acceptable, although the name WITHIN-CONTINUATION (which one finds in MIT Scheme) is better.
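A minimal sketch of a `WITH-' procedure, assuming a Scheme with parameter objects (MAKE-PARAMETER and PARAMETERIZE, as in SRFI 39 and R7RS). CURRENT-INDENTATION and WITH-INDENTATION are hypothetical names chosen for illustration.

```scheme
;;; Hypothetical example of dynamic state established for the
;;; extent of a nullary procedure.

(define current-indentation (make-parameter 0))

(define (with-indentation columns thunk)
  (parameterize ((current-indentation columns))
    (thunk)))
```

PARAMETERIZE restores the previous value once the thunk returns, so (CURRENT-INDENTATION) yields 0 again after a call such as (WITH-INDENTATION 4 (LAMBDA () (CURRENT-INDENTATION))), which itself yields 4.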
Prefix `CALL-WITH-' to any procedure that calls a procedure, which should be its last argument, with some arguments, and is either somehow dependent upon the dynamic state or continuation of the program, or will perform some action to clean up data after the procedure argument returns. Generally, `CALL-WITH-' procedures should return the values that the procedure argument returns, after performing the cleaning action.
- CALL-WITH-INPUT-FILE and CALL-WITH-OUTPUT-FILE both accept a pathname and a procedure as arguments, open that pathname (for input or output, respectively), and call the procedure with one argument, a port corresponding with the file named by the given pathname. After the procedure returns, CALL-WITH-INPUT-FILE and CALL-WITH-OUTPUT-FILE close the file that they opened, and return whatever the procedure returned.
- CALL-WITH-CURRENT-CONTINUATION is dependent on the continuation with which it was called, and passes as an argument an escape procedure corresponding with that continuation.
- CALL-WITH-OUTPUT-STRING, a common but non-standard procedure definable in terms of OPEN-OUTPUT-STRING and GET-OUTPUT-STRING from SRFI 6 (Basic String Ports), calls its procedure argument with an output port, and returns a string of all of the output written to that port. Note that it does not return what the procedure argument returns, which is an exception to the above rule.
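One possible definition of CALL-WITH-OUTPUT-STRING in terms of the SRFI 6 procedures is sketched here. Many systems already provide this procedure, perhaps with differences in detail, so this is an illustration under that assumption rather than any particular system's definition. The parameter name RECEIVER is arbitrary.

```scheme
;;; A sketch only; assumes OPEN-OUTPUT-STRING and
;;; GET-OUTPUT-STRING from SRFI 6 (also in R7RS).

(define (call-with-output-string receiver)
  (let ((port (open-output-string)))
    (receiver port)                     ;value is discarded
    (get-output-string port)))
```

For example, (CALL-WITH-OUTPUT-STRING (LAMBDA (PORT) (DISPLAY "frob" PORT))) yields the string "frob"; whatever RECEIVER itself returns is discarded.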
Generally, the distinction between these two classes of procedures is that `CALL-WITH-...' procedures should not establish fresh dynamic state and instead pass explicit arguments to their procedure arguments, whereas `WITH-...' should do the opposite and establish dynamic state while passing zero arguments to their procedure arguments.
Write heading comments with at least four semicolons; see also the section below titled `Outline Headings'.
Write top-level comments with three semicolons.
Write comments on a particular fragment of code before that fragment and aligned with it, using two semicolons.
Write margin comments with one semicolon.
The only comments in which omission of a space between the semicolon and the text is acceptable are margin comments.
Examples:

  ;;;; Frob Grovel

  ;;; This section of code has some important implications:
  ;;;   1. Foo.
  ;;;   2. Bar.
  ;;;   3. Baz.

  (define (fnord zarquon)
    ;; If zob, then veeblefitz.
    (quux zot
          mumble             ;Zibblefrotz.
          frotz))

** General Layout
This section contains rules that the author has found generally helpful in keeping his programs clean and presentable, though they are not especially philosophically interesting.
Contained in the rationale for some of the following rules are references to historical limitations of terminals and printers, which are now considered aging cruft of no further relevance to today's computers. Such references are made only to explain specific measures chosen for some of the rules, such as a limit of eighty columns per line, or sixty-six lines per page. There is a real reason for each of the rules, and this real reason is not intrinsically related to the historical measures, which are mentioned only for the sake of providing some arbitrary measure for the limit.
*** File Length
If a file exceeds five hundred twelve lines, begin to consider splitting it into multiple files. Do not write program files that exceed one thousand twenty-four lines. Write a concise but descriptive title at the top of each file, and include no content in the file that is unrelated to its title.
Rationale: Files that are any larger should generally be factored into smaller parts. (One thousand twenty-four is a nicer number than one thousand.) Identifying the purpose of the file helps to break it into parts if necessary and to ensure that nothing unrelated is included accidentally.
*** Top-Level Form Length
Do not write top-level forms that exceed twenty-one lines, except for top-level forms that serve only the purpose of listing large sets of data. If a procedure exceeds this length, split it apart and give names to its parts. Avoid names formed simply by appending a number to the original procedure's name; give meaningful names to the parts.
Rationale: Top-level forms, especially procedure definitions, that exceed this length usually combine too many concepts under one name. Readers of the code are likely to more easily understand the code if it is composed of separately named parts. Simply appending a number to the original procedure's name can help only the letter of the rule, not the spirit, however, even if the procedure was taken from a standard algorithm description. Using comments to mark the code with its corresponding place in the algorithm's description is acceptable, but the algorithm should be split up in meaningful fragments anyway.
Rationale for the number twenty-one: Twenty-one lines, at a maximum of eighty columns per line, fits in a GNU Emacs instance running in a 24x80 terminal. Although the terminal may have twenty-four lines, three of the lines are occupied by GNU Emacs: one for the menu bar (which the author of this guide never uses, but which occupies a line nevertheless in a vanilla GNU Emacs installation), one for the mode line, and one for the minibuffer's window. The writer of some code may not be limited to such a terminal, but the author of this style guide often finds it helpful to have at least four such terminals or Emacs windows open simultaneously, spread across a twelve-inch laptop screen, to view multiple code fragments.
*** Line Length
Do not write lines that exceed eighty columns, or if possible seventy-two.
Rationale: Following multiple lines that span more columns is difficult for humans, who must remember the line of focus and scan right to left from the end of the previous line to the beginning of the next line; the more columns there are, the harder this is to do. Sticking to a fixed limit helps to improve readability.
Rationale for the numbers eighty and seventy-two: It is true that we have very wide screens these days, and we are no longer limited to eighty-column terminals; however, we ought to exploit our wide screens not by writing long lines, but by viewing multiple fragments of code in parallel, something that the author of this guide does very often. Seventy-two columns leave room for several nested layers of quotation in email messages before the code reaches eighty columns. Also, a fixed column limit yields nicer printed output, especially in conjunction with pagination; see the section `Pagination' below.
*** Blank Lines
Separate each adjacent top-level form with a single blank line (i.e. two line breaks). If two blank lines seem more appropriate, break the page instead. Do not place blank lines in the middle of a procedure body, except to separate internal definitions; if there is a blank line for any other reason, split the top-level form up into multiple ones.
Rationale: More than one blank line is distracting and sloppy. If the two concepts that are separated by multiple blank lines are really so distinct that such a wide separator is warranted, then they are probably better placed on separate pages anyway; see the next section, `Pagination'.
*** Pagination
Separate each file into pages of no more than sixty-six lines and no fewer than forty lines with form feeds (ASCII #x0C, or ^L, written in Emacs with `C-q C-l'), on either side of which is a single line break (but not a blank line).
Rationale: Keeping distinct concepts laid out on separate pages helps to keep them straight. This is helpful not only for the writer of the code, but also for the reader. It also allows readers of the code to print it onto paper without fiddling with printer settings to permit pages of more than sixty-six lines (which is the default number for many printers), and pagination also makes the code easier to navigate in Emacs, with the `C-x [' and `C-x ]' keys (`backward-page' and `forward-page', respectively). To avoid excessively small increments of page-by-page navigation, and to avoid wasting paper, each page should generally exceed forty lines.
`C-x l' in Emacs will report the number of lines in the page on which the point lies; this is useful for finding where pagination is necessary.
*** Outline Headings
Use Emacs's Outline Mode to give titles to the pages, and if appropriate a hierarchical structure. Set `outline-regexp' (or `outline-pattern' in Edwin) to "\f\n;;;;+ ", so that each form feed followed by a line break followed by at least four semicolons and a space indicates an outline heading to Emacs. Use four semicolons for the highest level of headings in the hierarchy, and one more for each successively nested level of hierarchy.
Rationale: Not only does this clarify the organization of the code, but readers of the code can then navigate the code's structure with Outline Mode commands such as `C-c C-f', `C-c C-b', `C-c C-u', and `C-c C-d' (to move forward, backward, up, and down among headings, respectively).
When writing a file or module, minimize its dependencies. If there are too many dependencies, consider breaking the module up into several parts, and writing another module that is the sum of the parts and that depends only on the parts, not their dependencies.
Rationale: A fragment of a program with fewer dependencies is less of a burden on the reader's cognition. The reader can more easily understand the fragment in isolation; humans are very good at local analyses, and terrible at global ones.
This section requires an elaborate philosophical discussion which the author is too ill to have the energy to write at this moment.
Compose concise but meaningful names. Do not cheat by abbreviating words or using contractions.
Rationale: Abbreviating words in names does not make them shorter; it only makes them occupy less screen space. The reader still must understand the whole long name. This does not mean, however, that names should necessarily be long; they should be descriptive. Some long names are more descriptive than some short names, but there are also descriptive names that are not long and long names that are not descriptive. Here is an example of a long name that is not descriptive, from SchMUSE, a multi-user simulation environment written in MIT Scheme:
Not only is it long (sixty-four characters) and completely impenetrable, but halfway through its author decided to abbreviate some words as well!
Do not write single-letter variable names. Give local variables meaningful names composed from complete English words.
Rationale: It is tempting to reason that local variables are invisible to other code, so it is OK to be messy with their names. This is faulty reasoning: although the next person to come along and use a library may not care about anything but the top-level definitions that it exports, this is not the only audience of the code. Someone will also want to read the code later on, and if it is full of impenetrably terse variable names without meaning, that someone will have a hard time reading the code.
Give names to intermediate values where their expressions do not adequately describe them.
Rationale: An `expression' is a term that expresses some value. Although a machine needs no higher meaning for this value, and although it should be written to be sufficiently clear for a human to understand what it means, the expression might mean something more than just what it says where it is used. Consequently, it is helpful for humans to see names given to expressions.
Example: A hash table HASH-TABLE maps foos to bars; (HASH-TABLE/GET HASH-TABLE FOO #F) expresses the datum that HASH-TABLE maps FOO to, but that expression gives the reader no hint of any information concerning that datum. (LET ((BAR (HASH-TABLE/GET HASH-TABLE FOO #F))) ...) gives a helpful name for the reader to understand the code without having to find the definition of HASH-TABLE.
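The same idea can be sketched with an association list in place of a hash table, so that the fragment runs in any standard Scheme. FOO-BARS and DESCRIBE-FOO are hypothetical names, and the symbols in the table are arbitrary.

```scheme
;;; Hypothetical table and procedure, for illustration only.

(define foo-bars '((frob . 1) (zork . 2)))

(define (describe-foo foo)
  (let ((bar (cond ((assq foo foo-bars) => cdr)
                   (else #f))))
    (if bar
        (list foo 'maps-to bar)
        (list foo 'is-unmapped))))
```

The name BAR tells the reader what kind of datum the lookup yields, which the lookup expression alone does not.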
Index variables such as i and j, or variables such as A and D naming the car and cdr of a pair, are acceptable only if they are completely unambiguous in the scope. For example,
  (do ((i 0 (+ i 1)))
      ((= i (vector-length vector)))
    (frobnicate (vector-ref vector i)))
is acceptable because the scope of i is very clearly limited to a single vector. However, if more vectors are involved, using more index variables such as j and k will obscure the program further.
Avoid functional combinators, or, worse, the point-free (or `point-less') style of code that is popular in the Haskell world. At most, use function composition only where the composition of functions is the crux of the idea being expressed, rather than simply a procedure that happens to be a composition of two others.
Rationale: Tempting as it may be to recognize patterns that can be structured as combinations of functional combinators -- say, `compose this procedure with the projection of the second argument of that other one', or (COMPOSE FOO (PROJECT 2 BAR)) --, the reader of the code must subsequently examine the elaborate structure that has been built up to obscure the underlying purpose. The previous fragment could have been written (LAMBDA (A B) (FOO (BAR B))), which is in fact shorter, and which tells the reader directly what argument is being passed on to what, and what argument is being ignored, without forcing the reader to search for the definitions of FOO and BAR or the call site of the final composition. The explicit fragment contains substantially more information when intermediate values are named, which is very helpful for understanding it and especially for modifying it later on.
The screen space that can be potentially saved by using functional combinators is made up for by the cognitive effort on the part of the reader. The reader should not be asked to search globally for usage sites in order to understand a local fragment. Only if the structure of the composition really is central to the point of the narrative should it be written as such. For example, in a symbolic integrator or differentiator, composition is an important concept, but in most code the structure of the composition is completely irrelevant to the real point of the code.
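To make the comparison concrete, here is a sketch. COMPOSE and PROJECT are not standard procedures; the definitions below are assumptions about what such combinators might mean, made only so that the two fragments from the rationale above can be run side by side. FOO and BAR are arbitrary stand-ins.

```scheme
;;; COMPOSE and PROJECT are hypothetical definitions, assumed
;;; here only so that the comparison can be run.

(define (compose outer inner)
  (lambda arguments
    (outer (apply inner arguments))))

(define (project position procedure)    ;POSITION counts from 1
  (lambda arguments
    (procedure (list-ref arguments (- position 1)))))

(define (foo number) (* number 10))

(define (bar number) (+ number 1))

(define combinator-version (compose foo (project 2 bar)))

(define explicit-version                ;as written in the text
  (lambda (a b)
    (foo (bar b))))
```

Both procedures compute the same value for the same arguments, but only the explicit version shows at a glance that the first argument is ignored and the second is passed to BAR.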
If a parameter is ignored, give it a meaningful name nevertheless and say that it is ignored; do not simply call it `ignored'.
In Common Lisp, variables can be ignored with (DECLARE (IGNORE ...)). Some Scheme systems have similar declarations, but the portable way to ignore variables is just to write them in a command context, where their values will be discarded, preferably with a comment indicating this purpose:
  (define (foo x y z)
    x z                         ;ignore
    (frobnitz y))
Rationale: As with using functional combinators to hide names, avoiding meaningful names for ignored parameters only obscures the purpose of the program. It is helpful for a reader to understand what parameters a procedure is independent of, or if someone wishes to change the procedure later on, it is helpful to know what other parameters are available. If the ignored parameters were named meaninglessly, then these people would be forced to search for call sites of the procedure in order to get a rough idea of what parameters might be passed here.
When naming top-level bindings, assume namespace partitions unless in a context where they are certain to be absent. Do not write explicit namespace prefixes, such as FOO:BAR for an operation BAR in a module FOO, unless the names will be used in a context known not to have any kind of namespace partitions.
Rationale: Explicit namespace prefixes are ugly, and lengthen names without adding much semantic content. Common Lisp has its package system to separate the namespaces of symbols; most Schemes have mechanisms to do so as well, even if the RnRS do not specify any. It is better to write clear names which can be disambiguated if necessary, rather than to write names that assume some kind of disambiguation to be necessary to begin with. Furthermore, explicit namespace prefixes are inadequate to cover name clashes anyway: someone else might choose the same namespace prefix. Relegating this issue to a module system removes it from the content of the program, where it is uninteresting.
Write comments only where the code is incapable of explaining itself. Prefer self-explanatory code over explanatory comments. Avoid `literate programming' like the plague.
Rationale: If the code is often incapable of explaining itself, then perhaps it should be written in a more expressive language. This may mean using a different programming language altogether, or, since we are talking about Lisp, it may mean simply building a combinator language or a macro language for the purpose. `Literate programming' is the logical conclusion of languages incapable of explaining themselves; it is a direct concession of the inexpressiveness of the computer language implementing the program, to the extent that the only way a human can understand the program is by having it rewritten in a human language.
Do not write interface documentation in the comments for the implementation of the interface. Explain the interface at the top of the file if it is a single-file library, or put that documentation in another file altogether. (See the `Documentation' section below if the interface documentation comments grow too large for a file.)
Rationale: A reader who is interested only in the interface really should not need to read through the implementation to pick out its interface; by putting the interface documentation at the top, not only is such a reader's task of identifying the interface made easier, but the implementation code can be more liberally commented without fear of distracting this reader. To a reader who is interested in the implementation as well, the interface is still useful in order to understand what concepts the implementation is implementing.
In this example of a single-file library implementing the skip list data structure, the first page explains the purpose and dependencies of the file (which are useful for anyone who intends to use it, even though dependencies are really implementation details), and the next few pages explain the usage of skip lists as implemented in that file. On the first page of implementation, `Skip List Structure', there are some comments of interest only to a reader who wishes to understand the implementation; the same goes for the rest of the file, none of which must a reader read whose interest is only in the usage of the library.
Avoid block comments (i.e. #| ... |#). Use S-expression comments (`#;' in Scheme, with the expression to comment on the next line; `#+(OR)' or `#-(AND)' in Common Lisp) to comment out whole expressions. Use blocks of line comments for text.
Rationale: Editor support for block comments is weak, because it requires keeping a detailed intermediate parse state of the whole buffer, which most Emacsen do not do. At the very least, #|| ... ||# is better, because most Emacsen will see vertical bars as symbol delimiters, and lose trying to read a very, very long symbol, if they try to parse #| ... |#, whereas they will just see two empty symbols and otherwise innocuous text between them if they try to parse #|| ... ||#. In any case, in Emacs, `M-x comment-region RET', or `M-;' (comment-dwim), is trivial to type.
The only comments standardized through R5RS are line comments. There are SRFIs for block comments and S-expression comments, and R6RS and R7RS later adopted both, but support for them still varies from system to system. Expression comments are not hard for editors to deal with because it is safe not to deal with them at all; however, in Scheme S-expression comments, which are written by prefixing an expression with `#;', the expression to be commented should be placed on the next line. This is because editors that do not deal with them at all may see the semicolon as the start of a line comment, which will throw them off. Expression comments in Common Lisp, however, are always safe.
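A sketch of the recommended placement in Scheme, assuming a reader that supports `#;' (SRFI 62). FROB, ZORK, and ZAP are hypothetical stand-ins; ZAP is deliberately left undefined to show that the commented expression is never evaluated.

```scheme
;;; FROB, ZORK, and ZAP are hypothetical stand-ins.

(define (frob bar) (list 'frob bar))

(define (zork bar) (list 'zork bar))

(define (foo bar)
  (list (frob bar)
        (zork bar)
        #;
        (zap bar)))
```

An editor that knows nothing of `#;' sees only a line comment ending at the line break, followed by an ordinary expression, so its bracket matching and indentation are undisturbed.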
In Common Lisp, the two read-time conditionals that are guaranteed to ignore any form following them are `#+(OR)' and `#-(AND)'. `#+NIL' is sometimes used in their stead, but, while it may appear to be an obviously false conditional, it actually is not. The feature expressions are read in the KEYWORD package, so NIL is read not as CL:NIL, i.e. the boolean false value, but as :NIL, a keyword symbol whose name happens to be `NIL'. Not only is it not read as the boolean false value, but it has historically been used to indicate a feature that might be enabled -- in JonL White's New Implementation of Lisp! However, the New Implementation of Lisp is rather old these days, and unlikely to matter much...until Alastair Bridgewater writes Nyef's Implementation of Lisp.
On-line references and documentation/manuals are both useful for independent purposes, but there is a very fine distinction between them. Do not generate documentation or manuals automatically from the text of on-line references.
Rationale: /On-line references/ are quick blurbs associated with objects in a running Lisp image, such as documentation strings in Common Lisp or Emacs Lisp. These assume that the reader is familiar with the gist of the surrounding context, but unclear on details; on-line references specify the details of individual objects.
/Documentation/ and /manuals/ are fuller, organized, and cohesive documents that explain the surrounding context to readers who are unfamiliar with it. A reader should be able to pick a manual up and begin reading it at some definite point, perusing it linearly to acquire an understanding of the subject. Although manuals may be dominated by reference sections, they should still have sections that are linearly readable to acquaint the reader with context.