Hour 18: Editing C, C++, and Java Files: Automatic Indentation

Sams Teach Yourself Emacs in 24 Hours

Hour 18: Editing C, C++, and Java Files

Sections in this Hour:
Advanced C-Based Language Editing		File and Tag Browsing
Automatic Indentation		Summary
Navigating C Preprocessor Directives		Q&A
Viewing Code with Expanded Macros		Exercises

Automatic Indentation

One of the most powerful features of Emacs for editing C-based languages is its capability to indent code correctly when you press Tab. This can be the most distracting behavior for new Emacs users who are accustomed to other editors, but it is a huge time saver in the long run. You have complete control over indentation, so if the default style is unappealing or incorrect for your situation, you can adjust it to suit your needs.

To start with, load up a file in C, C++, Objective C, or Java. Start at the beginning of a function or other element and press Tab. Continue pressing Tab on each line afterward. You will find that not only does it not matter where the cursor is on the line, but the line chooses a location and stays there instead of continually adding more spaces before the text. Try changing the indentation of the line by deleting or adding spaces at the beginning of the line, and press Tab again. The line shifts back to its desired location.

Tip - The first concern with automatic indentation is how to insert a Tab character when Tab is behaving in a nontraditional fashion. You can insert a literal Tab at any time by pressing C-q Tab; C-q quotes the next keystroke and inserts it directly. If you are entering a string literal, you should always use the \t escape sequence instead of a literal Tab character.

I hope that you have now decided that this is the one true way of indenting and you're ready to move on to some serious code formatting. A common situation would be to wrap a block of code in a conditional, which requires indenting all lines that are wrapped. Although that is certainly simple to do with the Tab key, it is more appropriate to use the indent region command C-M-\ instead. The indent region command is quite handy with the specialized mark commands such as C-M-SPC (mark syntactic expression), C-M-h (mark function), or C-x h (mark whole buffer).

Caution - On some Intel-based PCs, Control-Alt-Spacebar is the default keybinding for power-saving mode. This can be annoying at best; on some systems, this causes a no-questions-asked reboot which is no fun at all.

There are additional bindings available for indenting blocks of code without setting the mark. For example, if you want to reindent the contents of an if block, you can place the cursor before the opening { and press C-M-q, which indents the expression following the cursor. You can also use C-c C-q to indent all the lines in the current function.

Now look at how you can handle the example of wrapping code in a conditional. In typical Emacs fashion, there are several ways to do it. To type this in logically, you'd probably start at the beginning of the code you need to conditionalize, and type


if(condition) {

You'd then want to go to the end of the code, and add in a } on a line by itself. This leaves all the old code indented incorrectly.

It would certainly be easy to use C-c C-q to reindent the entire function you are in, but that could take a long time for very long functions. It would also have been wise to leave a mark with C-SPC after typing the opening {, but you might have forgotten. You could always drop a mark, move back with the backward syntactic expression command C-M-b, and use the indent region command. Of course, if you moved back to the beginning, you could use the indent syntactic expression command C-M-q. As it turns out, the fastest way to do this without moving the cursor is to give the mark syntactic expression command a negative argument and then use indent region. As you learned earlier, C-u - and C-M-- are roughly equivalent ways of providing a negative argument. With the cursor just after the closing }, C-M-- C-M-SPC marks backward over the block you just enclosed. Now you can use indent region with C-M-\ and everything is as you want it.
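
The negative-argument trick can also be wrapped up as a single command. The following is only a sketch: the name my-indent-previous-block is hypothetical, and it assumes the cursor sits just after the closing } of the block you want reindented.

```elisp
;; Hypothetical helper; the name is an assumption, not part of Emacs.
(defun my-indent-previous-block ()
  "Reindent the balanced expression that ends just before point.
Roughly what marking backward over the block and then running
indent-region does by hand."
  (interactive)
  (let ((end (point))
        (start (save-excursion
                 (backward-sexp)   ; jump back over the { ... } block
                 (point))))
    (indent-region start end)))
```

You could try it with M-x my-indent-previous-block RET before deciding whether it deserves a key binding.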

Tip - If you want to force very large amounts of code to be indented by a known amount, the rigid indentation command can be much faster. Try C-x Tab on a region with and without a universal argument to see how this works.
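
The rigid indentation command is also available from Lisp as indent-rigidly. As a small sketch, the hypothetical command below shifts the region right by a fixed four columns; both the name and the amount are assumptions chosen for illustration.

```elisp
;; Hypothetical helper; the name and the four-column shift are assumptions.
(defun my-shift-region-right (start end)
  "Shift every line between START and END right by four columns."
  (interactive "r")
  (indent-rigidly start end 4))
```

A negative amount in place of 4 would shift the lines left instead.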

Indenting with Style

Despite the rigid indentation enforced while editing C-based language files, Emacs is not as tyrannical as one might think. If the default GNU editing style is either not to your liking or wrong due to corporate coding standards, you can change it to your heart's content.

First, you might want to try out some of the built-in styles. The set style command is available with C-c . or with the extended command M-x c-set-style. The most common styles are gnu, k&r, and, of course, java. Each of these styles controls not only how individual statements are indented, but also how braces, switch statements, and other elements are handled.

Chances are that your preferences are covered in the several styles available by default. If your corporate environment demands a stringent pattern, however, you might need to create your own indentation style. This requires some work in your .emacs initialization file. Please refer to Hour 24, "Installing Emacs Add-Ons," for details on writing Lisp if the upcoming section is unclear.

The easiest way to create your own style is to start from an existing style that is as close as possible to what you want. This becomes the parent indentation style and minimizes the amount of work you need to do. Next, note the elements you don't like for reference. Are the braces positioned incorrectly relative to the opening statement and the statements they enclose? Are your K&R function headers indented too much? Emacs wants to know.

Next, prepare to edit your configuration file found in ~/.emacs. The command you add will look something like this:


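(require 'cc-mode)
;; Template only: replace each placeholder before evaluating.
(c-add-style "my-style"
             '("parent style"                   ; name of the style to inherit from
               (variable . value)               ; any style variable and its value
               (c-offsets-alist .
                ((offset-name-1 . Offset)       ; syntactic symbol and its offset
                 (offset-name-2 . Offset)))))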

The text in the string parent style represents the style you found earlier while experimenting with the different indentation types. Everything not specified in the rest of the list is derived from this parent style, saving you the time needed to enter it into your .emacs file.

The elements in the list following the parent style form an alist, which is short for association list. Each element of an alist is a dotted pair, meaning that it is a pair of the form (KEY . DATA). KEY is some name used as reference, and DATA is the data associated with it. In a style, the key of each association is a variable name, and its data is the value that variable will take. The variable c-offsets-alist is itself an alist describing how the language should be indented. Its keys are symbols describing the syntactic context of the current line, and the data element of each association is how much to indent when that context is found.

Some common variables to set in your style are the following:

c-basic-offset--The smallest unit of indentation in your style

c-backslash-column--When adding \ characters to long macros, the desired column

c-offsets-alist--The alist of symbols representing a C-based language's syntax, and the offsets associated with each

Caution - All variables available in a style specification are also available as ordinary variables. This means you can use Customize on them, or set them directly in your .emacs file. Any such customization is overridden by the style you create.

The most important of these variables is most likely the c-offsets-alist. Each key for the C indent engine can be listed here, giving the ultimate control of the indentation engine. Some common syntactic elements you might want to change are the following:

statement--A generic C command statement

statement-cont--Continuation of a C statement that is longer than a single line

block-open--When a code block unrelated to a control statement is started

substatement-open--A code block below a control statement such as if or for

There are many more elements which you can access through Emacs's internal documentation for the variable c-offsets-alist. You can reference this by using the help command C-h v c-offsets-alist RET.

Tip - All variables listed in a style can be accessed via Emacs's built-in help using C-h v, and the names of all the useful ones start with c-, making it easy to find what you need.

The data element for each syntax type you specify can be a number, representing the number of characters of indentation desired. If you want this element to line up under the previous element, a 0 is appropriate. You can also specify data as the symbol + or -, which specify one increment or decrement of c-basic-offset. ++ and -- mean two increments or decrements of c-basic-offset, and * and / mean a positive or negative one half of c-basic-offset.

If you need some syntactic element to always start at the first column, such as macros, a relative offset isn't appropriate because you want a fixed column. Unfortunately, the method of bringing text to the first column requires putting in a very large negative number. This effectively backs up the indentation until it cannot get any smaller.
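
To make the offset notation concrete, here is a minimal sketch of a filled-in style. The parent k&r, the basic offset of 4, and every offset value are assumptions picked for illustration, not recommendations. The symbol cpp-macro is CC Mode's name for preprocessor lines, and -1000 is simply one example of a very large negative number.

```elisp
(require 'cc-mode)

;; Illustrative values only; adjust to your own coding standard.
(c-add-style "my-style"
             '("k&r"                          ; parent style to inherit from
               (c-basic-offset . 4)           ; one indentation step = 4 columns
               (c-offsets-alist
                (statement-cont    . ++)      ; continued lines: two steps
                (substatement-open . 0)       ; brace under if/for: no extra indent
                (block-open        . 0)       ; free-standing block: no extra indent
                (cpp-macro         . -1000)))))  ; preprocessor lines: first column
```

After evaluating this, C-c . my-style RET in a C buffer should let you try the result on real code.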

If you are feeling exceptionally brave after finishing this book, you can also set the data element of a syntactic element to a function. How such a function is called, and what it should return, is described in Emacs's documentation for the variable c-offsets-alist. Such a function can provide nearly unlimited control over all permutations of a given syntactic element. The default data elements for comment indentation are functions, which is how such great comment indentation is made available.
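
As a rough sketch of the function form, the hypothetical function below receives the syntactic element and returns a number of columns. Its name, the style name my-lineup-demo, and the gnu parent are all assumptions. It does nothing the ++ symbol could not do, but it shows where your own logic would go.

```elisp
(require 'cc-mode)

;; Hypothetical indentation function; the name is an assumption.
(defun my-statement-cont-offset (_langelem)
  "Indent a continued statement by twice `c-basic-offset'.
The argument describes the syntactic element and is ignored here."
  (* 2 c-basic-offset))

;; Hypothetical style that uses the function as an offset.
(c-add-style "my-lineup-demo"
             '("gnu"
               (c-offsets-alist
                (statement-cont . my-statement-cont-offset))))
```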

Now that you have defined your own indentation style, you should apply it to all files that you edit so that you do not have to use the set style command C-c . every time. To do this, set the variable c-default-style to the string my-style in your .emacs file. The code would look like this:


(setq c-default-style "my-style")
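
If you would rather keep a different style for one language, c-default-style also accepts an alist keyed by major mode. The sketch below assumes that a style named my-style has already been defined with c-add-style; the choice of java for Java buffers is only an example.

```elisp
;; Assumes "my-style" was defined earlier with c-add-style.
(setq c-default-style
      '((java-mode . "java")        ; Java buffers keep the built-in java style
        (other     . "my-style")))  ; every other C-based mode uses my-style
```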

Proactive Editing

The C-based language editing mode provides some additional editing features that most other language modes lack. These include

Auto new line

Hungry delete

Each of these can be toggled easily, and their behavior can be modified slightly. Both increase the amount of work Emacs does for you in certain situations. Auto new line automatically adds new lines in certain situations, and hungry delete performs extra white space deletions.

Auto new line mode is toggled with the key sequence C-c C-a. When active, the major-mode string changes to include /a. Thus, if you are editing C, it reads C/a; if you are editing Java, it says Java/a. If it reads /ha instead, auto new line mode is on at the same time as hungry delete mode.

When active, auto new line changes the behavior of the brace keys, semicolon, and comma. Whenever one of these is entered, Emacs adds a new line after it when appropriate. You can change when this happens for semicolons and commas through the variable c-hanging-semi&comma-criteria, which holds a list of functions. To do this you must be prepared to write a function that analyzes the current location and returns 'stop to prevent a new line from being added, any other non-nil value to cause a new line to appear, or nil to leave the decision to the next function in the list. Please see Hour 24 for additional details on how to accomplish this.
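
A criteria function can be quite small. The sketch below is an assumption-laden example rather than a recommended setting: the function name is hypothetical, and the hook-based installation assumes that the variable already holds a list in each C-based buffer by the time the hook runs.

```elisp
(require 'cc-mode)

;; Hypothetical criteria function; the name is an assumption.
(defun my-no-newline-before-text ()
  "Return `stop' if text follows point on this line, else nil."
  (if (looking-at "[ \t]*$")
      nil        ; rest of line is blank: leave the decision to other functions
    'stop))      ; text follows: do not insert a new line

(add-hook 'c-mode-common-hook
          (lambda ()
            ;; Put the new function at the front of this buffer's list.
            (add-to-list 'c-hanging-semi&comma-criteria
                         #'my-no-newline-before-text)))
```

With this in place, typing a semicolon in the middle of an existing line should no longer split the line, while a semicolon at the end of a line behaves as before.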

Hungry delete takes the opposite approach. Where auto new line adds extra new lines, hungry delete removes as much white space as possible. To activate hungry delete, you can toggle it with C-c C-d, and a /h or /ha should be added to the mode description. To use hungry delete when it is active, all you have to do is press the Backspace key, and hungry delete removes as much white space as possible. White space deleted includes spaces, tab characters, and new lines. Using hungry delete could prove distracting to new users, but it can prove valuable in some situations where there is lots of pesky white space.

Much of what hungry delete does can be replicated through use of a few other built-in commands. For completeness, you can also use the function just-one-space, which is bound to M-SPC, and delete-blank-lines, which is bound to C-x C-o. Unlike these functions, however, hungry delete removes white space of both types, horizontal white space and new lines, in one step.

These two modes complement each other well, so you can toggle both of them at the same time with the C-c C-t command. When both are active, the mode name will have the string /ha appended.

If you would like these modes to be automatically activated whenever you enter a file containing a C-based language, you will need to update your .emacs file. To modify what happens when a C language file is loaded, write a function that turns on these modes. The functions to use are c-toggle-auto-state, c-toggle-hungry-state, and c-toggle-auto-hungry-state. Each of these functions takes a single numeric argument. To force the mode on, a 1 is used, and to force it off, a -1 is used; otherwise the mode is toggled.

You can do all this with a command in your .emacs file that looks like this:


;; c-mode-common-hook runs for every C-based language mode, not only C
(add-hook 'c-mode-common-hook (lambda () (c-toggle-auto-hungry-state 1)))

Two packages are available that perform tasks similar to hungry delete, but for any language mode. One is called greedy-delete.el, and the other is tinyeat.el. Please see those files for details on their use.

Comment Acceleration

Emacs has several advanced commands that are useful when working with comments. You might have already noticed that comments in C are indented quite cleverly and wonder what more there could be. Emacs can manage the following comment basics for you:

Text filling

Creating new comments

Removing comments

When working with large amounts of text in a comment, the built-in fill command seems a good choice, but there is actually a special fill comment command in the C menu that rearranges all the text in a comment so that it neatly fills the space available. This is very much like the fill paragraph command discussed earlier in Hour 12, "Visible Editing Utilities," but is designed especially for comments. If you do not have access to the menubar, you can invoke this function using the extended command c-fill-paragraph, like this: M-x c-fill-paragraph RET.
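
If you use the fill comment command often, you could give it a key of its own. This is only a sketch: the key C-c f is an arbitrary choice, and your version of Emacs may already bind c-fill-paragraph to a key in C-based modes, in which case this is unnecessary.

```elisp
(require 'cc-mode)

;; C-c f is an arbitrary, hypothetical key choice.
;; c-mode-base-map is the keymap shared by the C-based modes.
(define-key c-mode-base-map (kbd "C-c f") #'c-fill-paragraph)
```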

Emacs is also expert at adding new comments. There are two methods for doing this. The first is the indent-for-comment command which is bound to M-;. This is specially designed for dropping a new comment on the right side of the code. Not only does it move the cursor to a specified column and insert comment characters, it also takes an existing comment and moves it to the correct location. Try it out in some code to get the feel for what it does. Notice that if you use this command on a line with a lone closing brace, the comment appears directly after the closing brace. If you use this command on a line containing code, it moves out to some column regardless of the code line length.

When indenting comments for code, the location where Emacs places the comment is called the comment column. You can modify this by moving the cursor to the column you want, and using the set comment column command bound to C-x ;. The comment column is similar in nature to the fill column, but is specifically for your comments.

You can set the comment column in your .emacs file with a command like this:


(setq-default comment-column 70)

Here 70 is the column number for your comments.
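
If you want a different comment column only in C-based languages, one option is to set it from a mode hook. The value 48 below is an arbitrary example; setting comment-column this way affects only the buffer in which the hook runs.

```elisp
;; 48 is an illustrative value, not a recommendation.
(add-hook 'c-mode-common-hook
          (lambda ()
            (setq comment-column 48)))
```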

Tip - Emacs defaults to a fill column of 70. Classically, printers, terminals, news media, and other things UNIX default to an 80-column area for fixed width fonts. Although Emacs lets you do whatever you want, it is advisable for readability purposes to stick within these bounds.

Emacs also has some comment commands for dealing with regions of code. The comment region command works in all language modes and has some clever properties. Commenting out a region of C code might seem unnecessary when preprocessor directives can do the job, but Emacs makes it easy: mark the region you need commented out and use the comment region command, bound to C-c C-c. In C, every line gets surrounded with its own /* and */ characters. In C++ and Java, each line is started with the // comment start.

If you want to remove the comment characters from your code in the future, you are in luck. Instead of having to remove all those characters yourself, you can pass the universal argument (with C-u) to the comment region command with C-u C-c C-c and have all those comment characters removed.
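
Both directions can be folded into one command with the standard function comment-or-uncomment-region. The sketch below applies it to the current line; the command name is hypothetical and no key binding is suggested.

```elisp
;; Hypothetical helper; the name is an assumption.
(defun my-toggle-comment-on-line ()
  "Comment out the current line, or uncomment it if already commented."
  (interactive)
  (comment-or-uncomment-region (line-beginning-position)
                               (line-end-position)))
```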
