Sams Teach Yourself Emacs in 24 Hours

Hour 9: Regular Expressions

Regular Expression Searches

Now that you have learned how to match text with regular expressions, look at how the regular expression search mechanisms in Emacs actually work.

There are two functions for regular expression searches in Emacs: an incremental one and a nonincremental one. The incremental one is bound to C-M-s (isearch-forward-regexp), and the nonincremental one is bound to C-M-s RET.

There is nothing special about these two functions except that they use regular expressions in the search string. There is, however, one feature of incremental regular expression searches that you must be aware of. When you search incrementally, Emacs moves forward to a location that matches your search. (You know this behavior from ordinary incremental searches.) In a regular expression incremental search, however, Emacs can suddenly move back if an earlier location in the buffer matches the expression as typed so far.

Moving Backward in a Regular Expression Incremental Search

This task shows you how a regular expression incremental search can match some text that is prior to the current match. Follow these steps:

1. Go to the beginning of the buffer and press C-M-s (isearch-forward-regexp). This starts a regular expression incremental search. Now type text. Emacs moves forward to the first match of this word (see Figure 9.4).

Figure 9.4
Searching for text makes Emacs move to this location.

2. Continue the regular expression by typing \|. Emacs moves back to the point where the search started, because whatever you type after the \| might match text closer to that starting point (see Figure 9.5).

Figure 9.5
Appending \| to the regular expression text makes Emacs move backward, because the alternative you are about to type might match anywhere from the start of the search onward.

3. Finally type ot. Emacs moves to the first text that matches the regular expression text\|ot (see Figure 9.6). This location is in fact before the match for text, which Emacs moved to before you typed the alternation operator.

Figure 9.6
Typing ot makes Emacs search forward for the first occurrence of text or ot. This happens to be before the first location Emacs found in Figure 9.4.

If you press C-s while you are typing the regular expression, Emacs moves to the next location that matches the expression typed so far, just as it does in ordinary incremental searches. This has the side effect, however, that the point where you press C-s becomes the new start of the search. This affects the matches found if you later type \| (that is, insert an alternative).
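The backward jump described in this task can also be reproduced outside an interactive search. The following is a minimal sketch, not a description of how isearch is implemented: the helper my-first-match-position and the sample string are made up for illustration, and the sketch simply searches a temporary buffer from its beginning with each version of the expression.

```elisp
;; Hypothetical helper: where does the first match of REGEXP begin in TEXT?
(defun my-first-match-position (regexp text)
  "Return the position where the first match of REGEXP begins in TEXT, or nil."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (when (re-search-forward regexp nil t)
      (match-beginning 0))))

;; Made-up sample in which "ot" (inside "Another") occurs before "text".
(defvar my-sample "Another line of text follows.")

;; Expression as typed in step 1.
(my-first-match-position "text" my-sample)

;; Expression as typed in step 3; in Lisp strings the backslash is doubled.
(my-first-match-position "text\\|ot" my-sample)
```

Because both searches start at the same place, the second call reports an earlier position than the first, which is why Emacs appears to move backward when the alternative is completed.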

An Alternative to Word Search

Hour 7 describes how you can search for words only, making Emacs ignore special symbols such as commas, exclamation marks, and so on. Word search also treats any number of spaces, tabs, and line breaks as a single white space. This has the advantage of enabling you to search for marked-up text that you have a copy of on paper. (That is, you might not know where the line breaks fall in the electronic version of your document.)

Unfortunately, word search has the drawback that it is not incremental. You can search incrementally with a regular expression search instead if you put \< and \> around the words you are searching for and \W between them.
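As an illustration, the phrase regular expression could be typed at the C-M-s prompt as \<regular\W+expression\>. The sketch below builds the same kind of expression in Emacs Lisp. The function name my-words-regexp is hypothetical, and using \W+ rather than a single \W is an assumption made here so that a run of several non-word characters (for example a comma followed by a line break) is also accepted.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-words-regexp (words)
  "Return a regexp matching WORDS as whole words separated by non-word characters."
  (concat "\\<"
          ;; regexp-quote protects any special characters inside a word.
          (mapconcat #'regexp-quote words "\\W+")
          "\\>"))

;; Builds the Lisp string for \<regular\W+expression\>.
(my-words-regexp '("regular" "expression"))
```

The resulting string can be passed to functions such as re-search-forward; when typing interactively, you enter the single-backslash form directly.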

Yet another alternative is to set the variable search-whitespace-regexp to a regular expression that matches whatever you consider to be white space.

You can do this by inserting the following into your .emacs file:


(setq search-whitespace-regexp "[ \t\r\n]+")

With the preceding in your .emacs file, you get an incremental search that treats any number of spaces, tabs, and line breaks as a single white space. Thus you can use C-M-s (isearch-forward-regexp) as an ordinary search, with two exceptions: You need to escape the special regular expression symbols, and you do not need to care how the space you see between the words is actually represented in your text.
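The two exceptions can be sketched in Emacs Lisp as follows. This is only an illustration of the idea, not how Emacs applies search-whitespace-regexp internally: the helper my-literal-phrase-regexp is made up, while regexp-quote and split-string are standard functions. It escapes the special symbols in each word of a phrase and joins the words with the same white space expression used in the setting above.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-literal-phrase-regexp (phrase)
  "Return a regexp matching PHRASE literally, with flexible white space."
  (mapconcat #'regexp-quote                        ; escape special symbols
             (split-string phrase "[ \t\r\n]+" t)  ; split on white space, drop empties
             "[ \t\r\n]+"))                        ; any run of white space between words

;; The "$" is escaped; the space may match a line break or tab in the buffer.
(my-literal-phrase-regexp "cost (in $)")
```

In an interactive search you do the escaping by hand, typing a backslash before symbols such as . * [ and $ when you mean them literally.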

Tip - If you become so fond of regular expression search that you use it more often than ordinary search, you can make it more accessible by swapping the bindings of C-s and C-M-s and, likewise, of C-r and C-M-r. Do this by inserting the following into your .emacs file:

;;; Swap the bindings of C-s and C-M-s, and of C-r and C-M-r
(global-set-key [(control s)] 'isearch-forward-regexp)
(global-set-key [(control meta s)] 'isearch-forward)
(global-set-key [(control r)] 'isearch-backward-regexp)
(global-set-key [(control meta r)] 'isearch-backward)

