Paul Haahr / Essays / Java Style / Methods and Statements

Methods and Statements

Each method should do one thing

There's no good reason to restrict methods from returning multiple related values when, say, computing both a sine and cosine, except that Java doesn't support that.

A sure sign that a method is designed improperly is that its functionality (as opposed to its implementation) can't be succinctly described as a single action or the computation of a single value. The name of the method should reflect this single action or value as closely as possible.

On the other hand, it may just be the description that's wrong. For example, a description of a method as ``get a slice of bread, put it in the toaster, push down the button, and wait for the bread to pop up'' sounds like too much is going on; changing the description to ``make toast'' makes it clear that there is a single abstract action going on. The first description tells how the method does its job, not what it does.

Smaller methods are easier to understand

It's probably self-evident, but several small methods, all with good names and documentation comments, are easier to understand than a single large method with the same functionality.

Smaller methods are also usually easier to debug than monolithic methods, because the correctness of a smaller piece of code can be verified more easily than that of a larger one. Further, method call boundaries are usually easy points at which to verify that behavior is as expected, by adding assertions or wrapper methods which check inputs and outputs.
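As a rough illustration of checking behavior at a method boundary, the sketch below pairs a small method with a wrapper that validates its inputs and output. The class and method names (Percentages, percentage, checkedPercentage) are invented for this example and do not come from any real code base.

```java
// Hypothetical example; the class and method names are not from the essay.
final class Percentages {
    private Percentages() {}

    /** Returns part as a whole-number percentage of whole, rounded down. */
    static int percentage(int part, int whole) {
        // Multiply as a long so that part * 100 cannot overflow an int.
        return (int) (part * 100L / whole);
    }

    /** Wrapper that verifies inputs and output at the call boundary. */
    static int checkedPercentage(int part, int whole) {
        if (whole <= 0 || part < 0 || part > whole)
            throw new IllegalArgumentException(
                "expected whole > 0 and 0 <= part <= whole, got "
                + part + " of " + whole);
        int result = percentage(part, whole);
        if (result < 0 || result > 100)
            throw new IllegalStateException("percentage out of range: " + result);
        return result;
    }
}
```

Because percentage is only a line long, the wrapper can state everything that is expected of it; doing the same for a method hundreds of lines long would be much harder.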

Using shorter methods also has the advantage that one is more likely to be able to reuse a smaller piece of functionality than a large one.

 
My preference is to keep methods to no more than five or ten statements. This is not a hard-and-fast rule, but methods which are fewer than ten lines can usually be understood at a glance, where longer ones will usually require active concentration.

I definitely don't hew strictly to this rule. One method in our compiler is over 1500 lines long, built around a 200-entry switch statement. In this case, I found no better way to structure the program.

Partition methods to have short parameter lists

Long parameter lists are often an indication of poor separation of concerns -- too much data is flowing between methods as separate values, rather than being encapsulated in objects or within a single method.

In general, having more parameters also increases the burden on callers and makes a method more specific to a given situation, making it unlikely that it can be reused. Again, this indicates that the wrong piece of functionality may have been selected to isolate as a separate method.

When you come across a method with too many parameters, it's useful to consider whether several of the parameters should be consolidated into a single object (especially if the same set of parameters is being used for more than one method) or whether there's an object which already provides accessors for getting at those values.
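One way to picture the consolidation is sketched below, assuming a hypothetical Bounds class that gathers four values which would otherwise travel together through several parameter lists. The names are illustrative only.

```java
// Hypothetical example; Bounds and its methods are illustrative names.
final class Bounds {
    private final int x;
    private final int y;
    private final int width;
    private final int height;

    Bounds(int x, int y, int width, int height) {
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }

    /** Replaces contains(int x, int y, int width, int height, int px, int py). */
    boolean contains(int px, int py) {
        return px >= x && px < x + width
            && py >= y && py < y + height;
    }

    /** Replaces area(int width, int height). */
    int area() {
        return width * height;
    }
}
```

A caller now writes something like new Bounds(0, 0, 10, 5).contains(9, 4), and any other method that needs the same four values can take a single Bounds argument.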

Declare variables when you're first ready to use them

C programmers usually declare several variables at the start of a function and then use them, for a variety of purposes, throughout the function; others declare each variable at the start of the block in which it is needed. Before C99, these were the only options that C provided. Java (following an innovation from C++) allows a declaration anywhere in a block, which introduces a local variable with scope extending to the end of the block. This allows programmers to declare variables when they're first used, with the type, name, and initial value all in one place. This is the right thing, because it allows a reader to focus on all the relevant properties of a local variable at the right time.

In general, if you find that you don't know what value to initialize a variable with, move the declaration later in the method until the point at which you do know its initial value.

A similar rule applies to for loops: if possible, declare and initialize the loop variable in the initialization clause of the for statement. (Note that variables declared in for statements are limited in scope to the for itself in Java, which differs from the original rules in C++, but matches ISO C++.)
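A small sketch of both rules follows. The WordStats class and its method are made up for illustration; each local is declared at the point where its initial value is known, and the loop variable lives only inside the for statement.

```java
// Hypothetical example; WordStats is an illustrative name.
final class WordStats {
    private WordStats() {}

    /** Returns the length of the longest whitespace-separated word. */
    static int longestWordLength(String sentence) {
        // Declared here because this is the first point where its value is known.
        String[] words = sentence.trim().split("\\s+");

        // An accumulator: it changes, but its meaning stays the same.
        int longest = 0;

        // i is declared in the for statement and is out of scope after the loop.
        for (int i = 0; i < words.length; i++) {
            // Declared inside the loop body, at first use.
            int length = words[i].length();
            if (length > longest)
                longest = length;
        }
        return longest;
    }
}
```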

Occasionally, the initial value of a local variable can't be known at the point it needs to be declared, because the value is computed in all of the branches of a conditional statement and used after the conditional. In that case, omit the initialization clause and put the declaration at the latest sensible point in the containing block. For example, this is a typical case where I don't use an initializer for a local variable:

    DependencyRecorder dependencies;
    if (options.getBooleanValue("dependencies"))
        dependencies = new DependencySet(method.getDefiningClass());
    else
        dependencies = NullDependencyRecorder.RECORDER;

In this case, it would have been fine to use the ternary conditional operator (?:), but I usually prefer if statements when the arms of the conditional are too long to fit on one or two lines. Another option is to extract the conditional statement into its own method and use a call to that method to initialize the local variable.
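The extract-a-method option might look roughly like the sketch below. The types are minimal stand-ins invented so that the example compiles on its own: the essay does not show the real DependencyRecorder, DependencySet, or NullDependencyRecorder, and the stand-in DependencySet takes no constructor argument. The recorderFor method and the Recorders class are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-ins invented for this sketch; the real types are not shown in the essay.
interface DependencyRecorder {
    void record(String name);
    int count();
}

final class DependencySet implements DependencyRecorder {
    private final List<String> names = new ArrayList<String>();

    public void record(String name) { names.add(name); }
    public int count() { return names.size(); }
}

final class NullDependencyRecorder implements DependencyRecorder {
    static final DependencyRecorder RECORDER = new NullDependencyRecorder();

    private NullDependencyRecorder() {}

    public void record(String name) { /* deliberately records nothing */ }
    public int count() { return 0; }
}

final class Recorders {
    private Recorders() {}

    /** Holds the conditional, so that callers can declare and initialize at once. */
    static DependencyRecorder recorderFor(boolean trackDependencies) {
        if (trackDependencies)
            return new DependencySet();
        else
            return NullDependencyRecorder.RECORDER;
    }
}
```

With a helper like this, the declaration at the call site collapses to a single initialized statement, for example DependencyRecorder dependencies = Recorders.recorderFor(true);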

When I initially wrote this document, there were fourteen cases in the code I had written where local variables were declared without initialization, and all were immediately followed by compound statements which initialized the variables in question: six were followed by if statements, five by switch, and three by try blocks.

Create new variables rather than reassigning old ones

Local variables are useful for providing names for intermediate results. The value of doing so is diminished if a local variable is used to hold several different values with different interpretations during its lifetime. (This does not apply to loop or accumulator variables, which, by their nature, are meant to change throughout their lifetimes, but their meaning should always be the same, relative to the current iteration of a loop.)

Java makes it very easy to introduce new local variables at almost any point in a method. Use this freedom to create locals with good names for all the intermediate results you need to hold on to. (If you find that you want to use the same name for multiple local variables at the same scope level in different phases of a method, that may be an indication that it's time to split the method.)
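To make the contrast concrete, here is a small sketch built on a hypothetical Pricing class. The comment shows the reassigning style, and the method body gives each intermediate result its own name.

```java
// Hypothetical example; Pricing and totalCents are illustrative names.
final class Pricing {
    private Pricing() {}

    // Reassigning style, for contrast (one name, two meanings):
    //     long amount = unitPriceCents * quantity;     // a subtotal here...
    //     amount -= amount * discountPercent / 100;    // ...a discounted total here
    //     return amount;

    /** Returns the discounted total in cents, rounding the discount down. */
    static long totalCents(long unitPriceCents, int quantity, int discountPercent) {
        long subtotal = unitPriceCents * quantity;
        long discount = subtotal * discountPercent / 100;
        long total = subtotal - discount;
        return total;
    }
}
```

Both versions compute the same result, but in the second each name can be read, logged, or inspected in a debugger without asking which phase of the method is executing.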

Don't modify parameter variables

This rule is a specific case of the previous rule.

Parameter variables are useful to maintain, with their original values, for the duration of a method. When modifying a parameter (even to ``clean it up'' for use in the method), one introduces a possible confusion between the original value and the modified version. In addition, the original values are often useful for debugging, printing informational messages, or checking results.

As Martin Fowler observes in Refactoring, people often confuse a modified local parameter variable with the value in the caller, which remains unchanged thanks to Java's call-by-value semantics. Since this confusion is so easy to avoid, there's no point in letting it occur.

In other words, just treat all parameters as if they were final. (Actually making them final seems like overkill to me, but I would have been quite happy if Java parameters were implicitly final.)
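A brief sketch of the idea, using a made-up Greeter class: the cleaned-up value gets its own local variable, so the original argument is still available for the error message.

```java
// Hypothetical example; Greeter and greet are illustrative names.
final class Greeter {
    private Greeter() {}

    /** Returns a greeting for the given name, ignoring surrounding whitespace. */
    static String greet(String name) {
        if (name == null)
            throw new IllegalArgumentException("name must not be null");

        // A new local holds the cleaned-up value; name itself is never reassigned.
        String trimmed = name.trim();
        if (trimmed.isEmpty())
            // The untouched parameter is still here to report exactly what was passed.
            throw new IllegalArgumentException("blank name: \"" + name + "\"");

        return "Hello, " + trimmed + "!";
    }
}
```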

Treat side-effects cautiously

Side-effects, by which I mean changes to the values stored in fields of objects or elements of arrays, are clearly intended to be used frequently in Java. However, the presence of side-effects can make it harder to reason about a program, because there is invisible state to the side of computations which changes. That means a reader needs to keep track of both the visible aspects of a program and the hidden values off to the side that may change.

In more technical terms, the presence of side-effects breaks ``referential transparency,'' the property that the same expression, given the same input values, always has the same value. Note that this corresponds closely to the mathematical notion of a function.
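The difference can be seen in a tiny sketch. The IdSource class is invented for this example: one method depends only on its argument, while the other reads and updates a hidden field.

```java
// Hypothetical example; IdSource is an illustrative name.
final class IdSource {
    private int next = 0;

    /** Referentially transparent: the result depends only on the argument. */
    static int square(int x) {
        return x * x;
    }

    /** Not referentially transparent: each call changes hidden state. */
    int nextId() {
        int id = next;
        next = next + 1;
        return id;
    }
}
```

Two calls to square(7) can always be replaced by a single value, whereas two calls to nextId() on the same object return different results even though the expression is textually identical.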

 

Restricting the use of side-effects can also make it easier to take a single-threaded computation and break its work among multiple threads, because any shared data structure which undergoes modification requires synchronization in both readers and writers. By leaving objects unchanged, worries about which version of an object is seen by which thread go away.

Unfortunately, multithreading is never as simple as one wants it to be, especially on modern architectures. See the recent work on the Java Memory Model before you believe too much of what I've written.

Typically, a programmer in Java who is trying to avoid side-effects will create new objects rather than modify old ones. Good rules of thumb are that non-compound statements should have one side-effect each and expressions should rarely have side-effects.
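A minimal sketch of the create-rather-than-modify style, assuming a hypothetical GridPoint class: all fields are final, and the operation that would otherwise mutate the object returns a new one instead.

```java
// Hypothetical example; GridPoint is an illustrative name.
final class GridPoint {
    private final int x;
    private final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int x() { return x; }
    int y() { return y; }

    /** Returns a new point; this point is left unchanged. */
    GridPoint translate(int dx, int dy) {
        return new GridPoint(x + dx, y + dy);
    }
}
```

Code holding a reference to the original point never sees it change, so there is no hidden state to track when reading the callers.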

Bertrand Meyer, in Object-Oriented Software Construction, argues that side-effects on visible state of an object (as opposed to side-effects on, say, a cache) should only occur in ``procedures'' (void methods) and never in ``functions'' (methods returning values). While I'm not as strict as Meyer -- I believe methods which return a value popped from a stack or read from an input stream are perfectly reasonable -- I think his style has strong merits.
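The two styles can be put side by side in a short sketch. The IntStack class below is hypothetical: top() and remove() follow the strict separation between functions and procedures, while pop() is the looser, combined form, written here in terms of the other two.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example; IntStack is an illustrative name.
final class IntStack {
    private final List<Integer> items = new ArrayList<Integer>();

    /** Procedure: changes the stack, returns nothing. */
    void push(int value) {
        items.add(value);
    }

    /** Function: returns the top value without changing the stack. */
    int top() {
        if (items.isEmpty())
            throw new IllegalStateException("stack is empty");
        return items.get(items.size() - 1);
    }

    /** Procedure: discards the top value, returns nothing. */
    void remove() {
        if (items.isEmpty())
            throw new IllegalStateException("stack is empty");
        items.remove(items.size() - 1);
    }

    /** Looser style: a value-returning method with a visible side-effect. */
    int pop() {
        int value = top();
        remove();
        return value;
    }

    boolean isEmpty() {
        return items.isEmpty();
    }
}
```

Calling top() twice in a row gives the same answer, which is the property the strict style protects; pop() gives that up in exchange for convenience.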

Programmers who've worked in functional languages for any period of time instinctively gravitate towards a low side-effect style.

