The MPW Shell contains a full-strength, high-speed text editor
with scripting capabilities. It's nothing to write love letters with,
because it's targeted at the ASCII format of compiler source files,
but it provides the power to automate complex and repetitive tasks
in ASCII text. The key to the system lies in a few editing-related
commands, together with its regular expressions and selection
expressions.
In the MPW Shell, any search command can take one of two kinds of arguments. The
first is a plain string, which matches exactly its contents and nothing else, using a
simple character-by-character match. The other is a regular expression, which is a
pattern that can be recognized by a finite state machine. You can't parse programming
languages with regular expressions, but you can use them to recognize many patterns,
including wildcards, repeating sequences, and sets of characters. Regular expressions
are bracketed with either slashes or backslashes, for searching forward or backward
respectively. So, for instance, the regular expression \wombat\ would search
backward from the current location for the string "wombat".
There are about 20 special constructs within regular expressions, all of which are
cryptically described when you execute the command line "Help Patterns" within the
MPW Shell. I'll mention some of the more useful ones here. The wildcard characters
are the question mark (?) and the equivalence symbol (≈, Option-X). The question
mark matches any one character except the end of a line, while the equivalence symbol
matches any number of such characters. For instance, /w?mb≈t/ would match
"wombat" as well as "wambiklort" and "wymbt", but not "wafkambiliot", nor "wkmb"
at the end of a line. Restricted sets of symbols can be given in brackets; for instance,
you can search for alphanumeric characters with the pattern [a-zA-Z0-9]. The
reverse of a set can be specified with the "not" symbol (¬, Option-L); for instance,
/[¬a-z]/ finds any character except a lowercase letter. The start of a line can be
specified with the bullet symbol (•, Option-8) and the end of a line with the infinity
symbol (∞, Option-5).
These keyboard shortcuts are for American QWERTY keyboards. Other
keyboards have different layouts. For instance, on a direct neural interface
keyboard, think "blue wildebeest" and raise your right ear to type the bullet
symbol.
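For readers more at home in a modern regex dialect, the wildcard, set, and line-anchor constructs described above map fairly directly onto Python's re module. What follows is a minimal sketch and not part of MPW: the translate and matches functions and the mapping table are hypothetical names introduced here, they cover only the constructs mentioned so far, and they ignore MPW's escaping rules and case-sensitivity settings.

```python
import re

# Assumed mapping from MPW constructs (outside brackets) to Python syntax.
OUTSIDE = {
    "?": ".",    # any one character except end of line
    "≈": ".*",   # any number of such characters
    "•": "^",    # start of line
    "∞": "$",    # end of line
    "+": "+",    # one or more repetitions (same in both dialects)
    "*": "*",    # zero or more repetitions (same in both dialects)
}


def translate(mpw_pattern: str) -> str:
    """Translate a small subset of MPW pattern syntax into a Python regex."""
    out = []
    in_set = False
    for ch in mpw_pattern:
        if in_set:
            # Inside [...], only the "not" symbol needs translating.
            out.append("^" if ch == "¬" else ch)
            if ch == "]":
                in_set = False
        elif ch == "[":
            in_set = True
            out.append(ch)
        elif ch in OUTSIDE:
            out.append(OUTSIDE[ch])
        else:
            # Everything else is treated as a literal character.
            out.append(re.escape(ch))
    return "".join(out)


def matches(mpw_pattern: str, text: str) -> bool:
    """Report whether the translated pattern matches anywhere in text."""
    return re.search(translate(mpw_pattern), text, re.MULTILINE) is not None


if __name__ == "__main__":
    for word in ["wombat", "wambiklort", "wymbt", "wafkambiliot", "wkmb\n"]:
        print(repr(word), matches("w?mb≈t", word))
```

Python's dot, like the MPW question mark, stops at the end of a line by default, which is why the "wkmb" case fails in both dialects.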
Repeating patterns can be specified in three ways. Following any pattern with a plus
sign (+) means one or more instances of that pattern; for instance, the regular
expression /[0-9]+/ would match any sequence of digits. An optional repeating
pattern can be similarly specified with an asterisk (*), which means zero or more
repetitions. The rarely seen double angle brackets can be used to specify exactly how
many repetitions of a pattern are allowed. They're typed as Option-backslash («) and
Option-Shift-backslash (») and enclose a single number to mean exactly that many
repetitions, or two numbers separated by a comma to specify a minimum and
maximum number of repetitions, or a single number followed by a comma to mean at
least that many repetitions. For instance, the pattern /[a-zA-Z]«3,7»/ would find
all strings composed of alphabetical characters and from three to seven letters long.
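The three repetition forms have close counterparts in Python's re syntax, where braces play the role of the double angle brackets. This is only an illustrative sketch; translate_repeat is a hypothetical helper that assumes the brackets appear solely as repetition counts.

```python
import re


def translate_repeat(mpw_pattern: str) -> str:
    """Rewrite MPW «n», «n,» and «n,m» as Python {n}, {n,} and {n,m}."""
    return mpw_pattern.replace("«", "{").replace("»", "}")


if __name__ == "__main__":
    letters = re.compile(translate_repeat("[a-zA-Z]«3,7»"))
    # fullmatch requires the whole string to be three to seven letters.
    print(bool(letters.fullmatch("wombat")))
    print(bool(letters.fullmatch("ox")))
    # The plus sign means one or more repetitions in both dialects.
    print(re.findall("[0-9]+", "418 errors in 2 files"))
```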
There are a number of ways of "escaping" special characters when you want to look for
something that has special meaning within regular expressions, such as a question
mark or plus sign. You can escape any character with the lowercase delta
(∂, Option-D), or use single or double quotes to escape strings. To find
the string "wombat+", for instance, you'd need to escape the plus sign:
/wombat∂+/.
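The same pitfall exists in other regex dialects. As a rough analogue in Python (an assumption for illustration, not MPW behavior), a backslash stands in for the escape character, and re.escape escapes a whole string at once, much as quoting does.

```python
import re


def find_literal(needle: str, haystack: str) -> bool:
    """Search for needle as plain text, escaping any special characters."""
    return re.search(re.escape(needle), haystack) is not None


if __name__ == "__main__":
    # Unescaped, the plus sign means "one or more t", so this matches "wombattt".
    print(re.search("wombat+", "wombattt").group())
    # Escaped, only a literal plus sign will do.
    print(find_literal("wombat+", "wombattt"))
    print(find_literal("wombat+", "one wombat+ please"))
```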
Finally, one of the most useful constructs consists of a tagged regular expression. This
allows you to associate a number between 0 and 9 with a pattern that's matched,
referring to it later with the "registered" symbol (®, Option-R) followed by a digit.
This is very handy when you're doing replacements. For instance, you can replace any
angle-bracketed string with a parenthesized string with the following command, which
would turn "<wombat>" into "(wombat)":
Replace /<([¬<>]*)®1>/ (®1)
This searches for any number of characters (except angle brackets) that are between
angle brackets, assigns them the number 1, and then replaces the angle brackets with
parentheses. Note that the syntax of tagged patterns requires the pattern to be
parenthesized.
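Tagged expressions correspond to what most modern regex dialects call capture groups. A minimal Python sketch of the same replacement follows; the parenthesize function is a hypothetical name, and the backreference \1 plays the role of the registered symbol followed by 1.

```python
import re


def parenthesize(text: str) -> str:
    """Replace each <...> with (...), keeping the text between the brackets."""
    # The parentheses in the pattern capture group 1; \1 refers back to it.
    return re.sub(r"<([^<>]*)>", r"(\1)", text)


if __name__ == "__main__":
    print(parenthesize("<wombat>"))
    print(parenthesize("a <b> c <d>"))
```

Unlike the MPW syntax, Python numbers its groups implicitly by position, so no explicit tag follows the closing parenthesis.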
Many editing commands (such as Replace) can take selection expressions as well as
regular expressions. Selection expressions provide more ways to select text than the
string matching provided by regular expressions. Common selection expressions
include the following:
The above expressions require no special delimiters (they're not directional like
regular expressions). Regular expressions are actually a kind of selection expression
and are delimited by slash or backslash characters as usual.
Some character-skipping variants of these options are also provided, such as the
position that's one character after the selection, denoted by following a selection
expression with an uppercase delta (∆, Option-J). These are useful in dealing
with context; for instance, you may want to select a string when it's followed by
another character, but not include the following character in the selection. (An
example is given later in the Subword script.) Text emitted by a program like a table
generator may be in a known format, such as a columnar arrangement, in which case
skipping a certain number of characters will take you to the selection you need.
Again, the MPW Shell will give you a terse summary of selection expressions when
you execute the command line "Help Selections". I'm not going to list all the minor
variants here, but feel free to while away the hours in rapturous contemplation of
their mysteries on your own.
The most common editing commands are two that you probably use already: Find and
Replace. Dialogs that stand in for these commands are built into the MPW Shell and
accessible from the Find menu. You can give any selection expression as a search
pattern in either of these dialogs by clicking the Selection Expression radio button
instead of the default Literal button. The same commands are the basis of most editing
scripts. As tools, Find and Replace take a selection expression as their primary
argument. Don't confuse Find and Search! The Search command puts out its results as
text, while Find actually changes the selection. In addition, Search takes a pattern --
that is, a regular expression -- while Find takes any selection expression. For
example, to go to the start of a file in a script, you could give the command "Find •",
but not "Search •".
Find is the basic navigation command in most editing scripts. For instance, you can
simulate the Select All command in the Edit menu like so:
Find •:∞ # select from start to end of target
The commands File and Open, along with the variables Target and Active, determine the
files your scripts will work on. "File" is actually an alias for the real command name,
Target. The File command opens a file and makes it the target window -- the window
behind the frontmost window. The target window is an important notion in MPW. It
exists so that you can use the Worksheet window to type commands that affect another
window; since the Worksheet would be in front, the window being affected would need
to be behind the Worksheet. During scripting, you may prefer to use the Open
command, which opens a file and makes it the frontmost window. The target window is
referred to as {Target} in scripts, while the frontmost window is called {Active}.
Editing commands work on the target window if you don't specify a window explicitly.
The Line command may also be used for navigation: it selects the numbered line in the
target window and then brings that window to the front. You probably know this
command already if you use compilers in the MPW Shell, since they put out error
messages in this form:
File "gwork.c"; Line 418 # Syntax error
Executing this command takes you to the line in your code where the error was
detected.
The Position command returns the current position in the target window, as a line
number, a character range, or both. The position could be saved to a variable for later
use as follows, using the backquote mechanism to execute a command and insert its
output inline:
Set SavedLineNumber `Position -l`
There are dozens of commands pertaining to text editing in the MPW scripting
language. Help on all of them is available in the MPW Shell. The usual Macintosh
text-editing menu commands are available in the MPW scripting language, including
New, Open, Close, Save, Revert, Print, and the standard Edit menu commands.
StreamEdit is a standalone editing tool that's rich and strange enough to deserve its own
column. It's a structured search and replacement language based on the UNIX®
command sed.
Some simpler standalone editing tools are provided. Sort has a rich function set and can
be used for many text-editing tasks. Canon takes a file of search and replace strings
and applies them to a file. It's used to automate terminology changes, such as the work
that was done to make the Mac OS API use fewer acronyms and abbreviations when the
new Inside Macintosh books were written. Translate, like the UNIX command tr, maps
characters onto other characters.
Text indentation can be handled with four tools: Adjust, Align, Entab, and Format.
Adjust shifts a line to the right or left by a specified number of spaces. Align sets the
margin of a range of selected lines to the margin of the first selected line. Entab
converts runs of spaces to tabs, and Format sets the column width used for tabs in a
text document, as well as other settings like font and size. (These settings are saved in
a resource in the file, which many ASCII text editors can recognize.)
Text-editing scripts often create temporary files, split single files into multiple files,
and perform other file-related tasks. MPW provides commands to help you manage
files. It has commands corresponding to almost all Finder operations, such as
Duplicate, Move, Delete, and NewFolder. There are also some specialized file
commands: FileDiv splits a file into multiple files based on a byte or line count or on
embedded form feed characters inserted during a previous editing pass; Catenate does
the opposite, joining files together.
A text-editing script often takes search and substitution text as parameters on the
command line. A few commands related to parameters are worth a quick mention here.
Echo is handy for concatenating parameters with other text. Quote is similar to Echo
but adds quote marks as needed to preserve the word breaks in its parameters. MPW
scripting requires quotes around any string that is meant to be a single parameter but
contains spaces (which would break the string into multiple parameters). Echo puts
out its arguments in a way that allows them to be broken up, while Quote preserves the
original word breaks by inserting quotes.
Echo "Richard Loves Pat"
Richard Loves Pat
Quote "Bill Loves Everyone"
'Bill Loves Everyone'
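The distinction between the two commands can be sketched in Python with the standard shlex module. This is only an analogy: echo and quote here are hypothetical functions, and shlex follows POSIX shell quoting rules, which are not identical to MPW's.

```python
import shlex


def echo(args: list[str]) -> str:
    """Join arguments with spaces, losing the original word breaks."""
    return " ".join(args)


def quote(args: list[str]) -> str:
    """Join arguments with spaces, quoting any that would otherwise split."""
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    print(echo(["Richard Loves Pat"]))
    print(quote(["Bill Loves Everyone"]))
    # Re-parsing shows the difference: one word survives, the other splits.
    print(shlex.split(echo(["Richard Loves Pat"])))
    print(shlex.split(quote(["Bill Loves Everyone"])))
```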
Here's a script I've found useful for some years. It's called Subword and it replaces a
word by another string everywhere it occurs in the target window.
Set Sep "[¬a-zA-Z_0-9]" # word separators
Find • "{Target}" # start at top of file
Replace -c ∞ ∂
"∆/{Sep}{1}{Sep}/!1:∆/{Sep}/" ∂
"{2}" "{Target}"
The selection in this Replace command is probably about as clear as the U.S. tax code,
so allow me to explain. The ∆ means one character before the selection. The !1
means one character past the selection. The colon denotes everything between the
selections (inclusively). So this pattern says, in a nutshell, select the pattern in the
first parameter ({1}) when it's bracketed by separators, but exclude the separators.
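The same idea, replacing a word only where it stands between separators and leaving the separators alone, can be sketched in Python using lookaround assertions. This is an illustrative analogue rather than a translation of the script: subword and WORD_CHARS are hypothetical names, and unlike the Replace command above, the lookarounds also accept a word at the very start or end of the text, where no separator character exists.

```python
import re

# Characters that may appear inside a word; anything else is a separator.
WORD_CHARS = "a-zA-Z_0-9"


def subword(text: str, word: str, replacement: str) -> str:
    """Replace word with replacement wherever it appears as a whole word."""
    pattern = re.compile(
        rf"(?<![{WORD_CHARS}]){re.escape(word)}(?![{WORD_CHARS}])"
    )
    # A function replacement keeps backslashes in replacement literal.
    return pattern.sub(lambda match: replacement, text)


if __name__ == "__main__":
    print(subword("count = count + counter", "count", "total"))
```

The lookarounds check the neighboring characters without including them in the match, which serves the same purpose as excluding the separators from the selection.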
Normally I don't use this script directly. I incorporate it into other scripts as a
utility. The bulk of the work of converting between similar languages like Pascal and C
can be done by an editing script, for example. Subword can be used to convert
keywords, as could Canon. I use another script which is essentially Subword without
the separators for changing symbols like equality operators.
Scripts to preconvert between Pascal and C can be found on this issue's CD. They don't
generate compiler-ready text, but I've found that they facilitate a manual conversion
at the rate of hundreds of lines per hour, allowing source bases in the thousands of
lines to be accurately translated in a day or three. So the next time you're faced with a
dull text-processing task, look over the tools MPW gives you, and see whether you can
save yourself a few days of tedious manual labor!
TIM MARONEY recently changed his Apple badge color from green to white: he's gone
from contract programming to a technical leadership role developing user interface
software. Tim entertains himself in a variety of ways, such as straining his surgically
altered eyeballs on the small print of obscure footnotes and collectible trading card
games, and contorting his limbs in yogic asanas. He designed the iron crystal that now
resides at the core of the earth and contributed significant ideas to the original (now
obsolete) implementation of Planck-scale gravitational phenomena in the universe.
Thanks to Dave Evans, Scott Fraser, Arno Gourdol, and Alex McKale for reviewing this
column.