Paul Haahr / Essays / Java Style / Typography

Typography

Make your code look like the examples in the JavaSoft books

Rather than descend into a long discussion of which style of indentation and typography looks good or is more readable, I'll simply state my belief: many styles of indentation and code layout are perfectly readable, but the virtues of uniformity trump all the minor differences about where, for example, opening braces go. Thus I strongly believe that all programmers should use a common typographic style for Java, so code can be freely interchanged.

 

The Java Language Specification is available online.

The obvious source for this common style is the set of Java language books from JavaSoft. I use The Java Language Specification as my template. The other good choice would be The Java Programming Language; since the two use rather similar styles, it doesn't really matter which.

Further, consistent typography of names is important to allow seamless mixing of code from multiple sources, so the naming conventions of the core Java libraries should be followed in other programs.

 

How much of my advocacy is based on the "good reasons" I gave above and how much is because I agree with the JavaSoft authors? I believe what I wrote, but the published style does look a lot like my style for C code.

On the other hand, I dislike internally capitalized names. Internal caps have no historical precedent in English typography and grate on my English-trained eyes. But they are the style for Java. The alternative is chaos.

I would summarize the formatting rules for this style as:

  • Multiple words in identifiers are separated by internalCapitalization.
  • Class and interface names are capitalized.
  • Package, method, field, and local variable names are not capitalized.
  • Constant fields are named with ALL_CAPS and multiple internal words are separated by underscores.
  • Open braces appear at the end of the line that starts the construct (class, method, or statement) which contains the brace-group.
  • Close braces appear on a separate line from the body they contain and are horizontally aligned with the beginning of the line containing the open brace. The continuation of a statement after a close brace (with else, while, catch, or finally) appears on the same line as the brace.
  • Multiple semicolon-separated statements never appear on the same line.
  • Put spaces around all infix (binary) operators.
  • No space appears between a method name and the open parenthesis for its arguments. Spaces never appear between two sequential open parentheses. In all other cases, whitespace separates whatever precedes an open parenthesis from the parenthesized group.
  • No space appears after an open parenthesis or before a close parenthesis.
  • No space appears before a comma. Whitespace always separates a comma from the following token.
  • No space appears before or after the period separating an object from a method or field name.
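As a rough illustration of the rules above, here is a small class laid out in this style. The names (LineCounter, MAX_COLUMNS, scan, and so on) are hypothetical, invented for this sketch, and not taken from the JavaSoft books.

```java
// Hypothetical example; the class and its members are invented here.
class LineCounter {
    // Constant field: ALL_CAPS, words separated by underscores.
    static final int MAX_COLUMNS = 80;

    // Fields and methods: lowercase first letter, internalCapitalization.
    private int longLineCount = 0;
    private int blankLineCount = 0;

    // Open brace ends the line that starts the construct.
    void scan(String[] lines) {
        for (int i = 0; i < lines.length; i++) {
            int width = lines[i].length();
            if (width > MAX_COLUMNS) {
                longLineCount = longLineCount + 1;
            } else if (width == 0) {
                // "else" continues on the same line as the close brace.
                blankLineCount = blankLineCount + 1;
            }
        }
    }

    int getLongLineCount() {
        return longLineCount;
    }

    int getBlankLineCount() {
        return blankLineCount;
    }
}
```

Note the space between a keyword such as if or for and its open parenthesis, and the absence of one between a method name and its argument list.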

Write searchable code

It is often useful to be able to search through source code with an editor or other utility. Some code-writing practices can make that easier. In particular, consistent typographic style allows searching for given idioms. For example, when breaking apart long lines of code, do not put a line break between the keyword new and the class name. Thus, you can find all places where instances of class BigStuff are created by searching for "new BigStuff". (Contributed by Stan Chesnutt.)
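A minimal sketch of that advice follows. BigStuff is the class name used in the text; its constructor arguments and the BigStuffFactory helper are assumptions made up for this example.

```java
// BigStuff comes from the text; its fields are hypothetical.
class BigStuff {
    final String label;
    final int capacity;

    BigStuff(String label, int capacity) {
        this.label = label;
        this.capacity = capacity;
    }
}

// Hypothetical caller with a construction too long for one line.
class BigStuffFactory {
    static final int MIN_CAPACITY = 16;

    static BigStuff create(String prefix, int requestedCapacity) {
        // Break after the open parenthesis and at the comma, never
        // between "new" and the class name, so that a text search
        // for "new BigStuff" still finds this line.
        BigStuff stuff = new BigStuff(
            prefix + "-" + requestedCapacity,
            Math.max(requestedCapacity, MIN_CAPACITY));
        return stuff;
    }
}
```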

 

If one really wanted to throw technology at viewing code, rather than just providing stretchable editor windows, it would make sense to start using proportionally spaced fonts for editing code, and to develop program editors that deal with proportional text well, which includes sensibly mapping tabs and leading spaces to a user notion of tab stops.

Break lines longer than eighty columns

Yes, everyone uses window systems these days. And, yes, everyone can stretch a window if they want to, to make it wide enough to fit your code, at least if they use a small enough font and are not working on a laptop. But wide lines of code -- like wide pages of text -- are hard to follow. For English text, the usual recommendation is to have pages with no more than sixty or seventy characters per line, because the reader's eye begins to get lost as it moves across longer lines.

Eighty columns is, of course, an arbitrary limit tied to prehistoric notions of what a terminal provided. But many people use editor windows exactly that wide because they have a reasonable expectation of being able to fit everything in that width. The simplest way to annoy them is to use very long lines.

When faced with long lines, try to treat line breaks as half-way between spaces and parentheses in terms of grouping related operations. That is, break lines at the highest reasonable level in the parse tree. For example, split lines between looser-binding rather than tighter-binding operators, or at commas in argument lists, rather than in the middle of a single argument.
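One way to picture breaking at the highest level of the parse tree is the sketch below. The Geometry class and its overlaps method are hypothetical, chosen only because the expression is too long for one line.

```java
// Hypothetical example: a rectangle-overlap test that needs wrapping.
class Geometry {
    // The parameter list breaks at a comma, between arguments.
    static boolean overlaps(int leftA, int rightA, int topA, int bottomA,
                            int leftB, int rightB, int topB, int bottomB) {
        // The expression breaks at &&, the loosest-binding operator,
        // rather than inside one of the tighter-binding comparisons.
        return leftA < rightB && leftB < rightA
            && topA < bottomB && topB < bottomA;
    }
}
```

Each physical line then holds whole subexpressions, so the line break reads as a weaker kind of grouping than a parenthesis but a stronger one than a space.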

Use four-space or narrower indentation

For more than ten years, when writing C code, I used the Unix convention of eight-space tabs for indenting code. I tried that for a few months with Java, but gave up. The straw that broke the camel's back was that typical Java method bodies are indented two levels vs. one level for C functions, because one level of indentation is (or should be) consumed by the class definition. In the end, I gave up on the equivalence between eight-space tabs and basic indentation levels, because that's just a silly historical artifact.

The change was a good one. I found it too easy with a large indentation level to have most of my code crammed along the right margin of my window, and was feeling frustrated staying within eighty columns. And my eye bounced around too much from left to right as I was reading programs.

With four-space tabs, I've found that, in practice, 99% of the first lines of statements in my compiler start in column 24 or before; that is, fewer than one percent of statements or declarations are indented more than six levels. With eight-space tabs, those statements would start at column 48, leaving at most 32 columns for content.
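The arithmetic can be checked with a short sketch: column 24 at four spaces per level corresponds to six levels of nesting, and six levels at eight spaces per level is column 48. The IndentBudget class is a hypothetical helper written for this illustration, and it assumes an eighty-column line.

```java
// Hypothetical helper; assumes an eighty-column limit.
class IndentBudget {
    static final int LINE_WIDTH = 80;

    // Columns remaining for content after the leading indentation.
    static int columnsLeft(int indentWidth, int levels) {
        return LINE_WIDTH - (indentWidth * levels);
    }
}
```

Under these assumptions, columnsLeft(4, 6) is 56 and columnsLeft(8, 6) is 32.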

A single column of whitespace is probably insufficient as the basic level of indentation, because it offers too little contrast from one line to the next. But anything from two to four spaces should be clear and reasonable.

Use blank lines to identify related lines of code

Just as text with extremely large paragraphs is hard to read, functions with unseparated blocks of code much longer than ten or twenty lines are typically hard to read. Blank lines can turn an undifferentiated mass into visibly distinct regions, each with its own purpose.

If a body of code consists of several blank-separated regions, it's often a good idea to put a short comment at the beginning of each section to explain its purpose. Or, often even better, split the code into several separate, well-named methods.
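A small sketch of blank-separated regions with a leading comment on each; the WordStats class and its method are hypothetical and exist only to show the layout.

```java
// Hypothetical example of blank-separated, commented regions.
class WordStats {
    static double averageWordLength(String text) {
        // Split the input into words; empty input has no words.
        String[] words = text.trim().split("\\s+");
        if (words.length == 1 && words[0].length() == 0) {
            return 0.0;
        }

        // Add up the lengths of all the words.
        int total = 0;
        for (int i = 0; i < words.length; i++) {
            total += words[i].length();
        }

        // Compute the mean.
        return (double) total / words.length;
    }
}
```

In a longer method, each of the three regions would be a candidate for its own well-named method.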

Parentheses always help

I used to know the rules for operator precedence in C. I've forgotten half of them, intentionally. Any time I have to think about operator precedence, I now just insert an extra set of parentheses and move on. In theory, it's possible to over-parenthesize Java code, but I've never seen it happen.
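Two cases where the extra parentheses pay off are sketched below; the Permissions class and its members are hypothetical. In Java, != binds tighter than &, and + binds tighter than <<.

```java
// Hypothetical example; names are invented for illustration.
class Permissions {
    static final int READ = 1;
    static final int WRITE = 2;

    static boolean canWrite(int flags) {
        // Required: without the parentheses, this would parse as
        // flags & (WRITE != 0), which does not compile in Java.
        return (flags & WRITE) != 0;
    }

    static int scaled(int base, int offset, int shift) {
        // Redundant, since + already binds tighter than <<, but the
        // reader no longer has to remember that.
        return (base + offset) << shift;
    }
}
```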

