Friday, March 7, 2008

Naming Conventions Guide

General Naming Conventions

There are many common naming conventions for classes, functions, and objects. They usually fall into a few broad categories: c-style naming, camelCase, and CamelCase. C-style naming separates the words in a name with underscores: this_is_an_identifier. Camel case comes in two forms: camelCase, which begins with a lowercase letter and capitalizes the first letter of every subsequent word, and CamelCase, which capitalizes the first letter of every word, including the first.

One popular convention uses leading-capital CamelCase for the names of structs and classes, and ordinary camelCase for the names of functions and variables (although variables are sometimes written in c-style to make the visual separation between functions and variables clearer).
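As a rough illustration of that convention, the sketch below uses CamelCase for a struct, camelCase for functions, and c-style names for variables. All of the names (LineCounter, addLine, countNonEmptyLines, and so on) are made up for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Type names: leading-capital CamelCase.
struct LineCounter {
    std::size_t line_count = 0;  // variables: c-style, words joined by underscores

    // Function names: camelCase with a lowercase first letter.
    void addLine(const std::string& line_text) {
        if (!line_text.empty()) {
            ++line_count;
        }
    }
};

// Counts the non-empty strings in all_lines.
std::size_t countNonEmptyLines(const std::vector<std::string>& all_lines) {
    LineCounter counter;
    for (const std::string& current_line : all_lines) {
        counter.addLine(current_line);
    }
    return counter.line_count;
}
```

With this scheme, a name's shape alone tells you whether it refers to a type, a function, or a variable.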

It can be useful to use prefixes for certain types of data to remind you what they are: for instance, if you have a pointer, prefixing it with "p_" tells you that it's a pointer. If you see an assignment between a variable starting with "p_" and one that doesn't begin with "p_", then you immediately know that something fishy is going on. It can also be useful to use a prefix for global or static variables because each of these has a different behavior than a normal local variable. In the case of global variables, it is especially useful to use a prefix in order to prevent naming collisions with local variables (which can lead to confusion).
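Here is a small sketch of what such prefixes might look like. Only "p_" comes from the text above; the "g_" (global) and "s_" (static) prefixes, and every identifier in the code, are assumptions chosen for illustration.

```cpp
int g_error_count = 0;  // "g_" (assumed prefix) marks a global variable

// Adds one to the int that p_value points at.
// A null pointer is counted as an error instead of being dereferenced.
void incrementThrough(int* p_value) {
    if (p_value == nullptr) {
        ++g_error_count;
        return;
    }
    int value = *p_value;  // no prefix: a plain int copied out of the pointer
    *p_value = value + 1;
    // A line such as "value = p_value;" would pair a "p_" name with an
    // unprefixed one, which is the fishy pattern the prefix exposes.
    // (Here the compiler would also reject it.)
}

// Returns 1, 2, 3, ... on successive calls.
int nextId() {
    static int s_next_id = 0;  // "s_" (assumed prefix): keeps its value between calls
    return ++s_next_id;
}
```

Because g_error_count carries its prefix, a local variable named error_count inside some function cannot be mistaken for it.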




Length of identifiers

A fundamental element of all naming conventions is the set of rules governing identifier length (i.e., the number of characters allowed in an identifier). Some rules dictate a fixed numerical bound, while others offer less precise heuristics or guidelines.
Identifier length rules are routinely contested in practice and are the subject of much academic debate.
Some considerations:
-shorter identifiers may be preferred as more expedient, because they are easier to type
-extremely short identifiers (such as 'i' or 'j') are very hard to pick out with automated search and replace tools, because the same letters appear inside many other names
-longer identifiers may be preferred because short identifiers cannot encode enough information or appear too cryptic
-longer identifiers may be disfavored because of visual clutter




Hungarian Notation
Hungarian notation is best known as the practice of prefixing a variable's name with an abbreviation of its data type. The original idea behind Hungarian notation, however, was more general and useful: to create more abstract "types" that describe how the variable is used rather than how the variable is represented. This can be useful for keeping pointers and integers from intermixing, but it can also be a powerful technique for helping to separate concepts that are often used together, but that should not be mixed.
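To make the idea of usage-based "types" concrete, here is a minimal sketch in which row and column indices are both plain ints, and only the name prefix records which is which. The "row_" and "col_" prefixes and the function names are hypothetical, not part of any standard notation.

```cpp
#include <vector>

// Both indices are ints, so the compiler cannot tell a row from a column.
// The prefixes record how each value is meant to be used.
int cellAt(const std::vector<std::vector<int>>& grid, int row_index, int col_index) {
    return grid[row_index][col_index];
}

// Sums every cell in the row numbered row_target.
int sumRow(const std::vector<std::vector<int>>& grid, int row_target) {
    int sum = 0;
    int col_count = static_cast<int>(grid[row_target].size());
    for (int col_current = 0; col_current < col_count; ++col_current) {
        // cellAt(grid, col_current, row_target) would still compile, but a
        // "col_" name in the row slot makes the mistake easy to see.
        sum += cellAt(grid, row_target, col_current);
    }
    return sum;
}
```

The prefixes do not stop the compiler from accepting a swapped call; they only make the swap stand out to a human reader.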



Abbreviations
Abbreviations are dangerous: vowels are useful and can speed up code reading. Abbreviating can still help when a name is extremely long, because names that are too long can be as hard to read as names that are too short. When possible, use particular abbreviations consistently, and restrict yourself to a small number of them. Common abbreviations include "itr" for "iterator" and "ptr" for "pointer". Even names like i, j, and k are perfectly fine for loop counter variables (primarily because they are so common). Bad abbreviations include things like cmptRngFrmRng, which saves only a few letters at the cost of a great deal of readability. If you don't like typing long names, look into the auto-complete facilities of your text editor. You should rarely need to type out a full identifier. (In fact, you rarely want to do this: typos can be incredibly hard to spot.)
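The short sketch below shows the kinds of abbreviation described here as acceptable: a spelled-out function name, "itr" for an iterator, and "i" for a loop counter. The function names and the contrasting abbreviated name in the comment are invented for this example.

```cpp
#include <cstddef>
#include <vector>

// Spelled-out name: readable without a glossary. An abbreviated form such
// as "cntPstvVals" (hypothetical) would save only a few letters.
std::size_t countPositiveValues(const std::vector<int>& values) {
    std::size_t count = 0;
    // "itr" is a common abbreviation for "iterator".
    for (auto itr = values.begin(); itr != values.end(); ++itr) {
        if (*itr > 0) {
            ++count;
        }
    }
    return count;
}

// Sums at most the first `limit` elements of values.
int sumFirst(const std::vector<int>& values, std::size_t limit) {
    int sum = 0;
    // "i" is fine as a loop counter because the convention is so familiar.
    for (std::size_t i = 0; i < limit && i < values.size(); ++i) {
        sum += values[i];
    }
    return sum;
}
```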
