ComputorEdge

Introduction to Regular Expressions (RegEx) in AutoHotkey


By Jack Dunning

Regular Expressions (commonly called RegEx or RegExp) in AutoHotkey are not a beginning-level script writing topic, and there certainly is nothing regular about Regular Expressions. I've spent a number of months exploring the programming tool and have developed a healthy respect for its flexibility and power. Many (including myself) have avoided using RegEx due to its enigmatic code, which at times appears almost incomprehensible. It's not like normal program code with If-Then-Else statements and Loops. Writing a RegEx is not merely a matter of following a logical sequence. It often requires a non-linear look at the problem. I've found that what helps me most is the analogy I picture in my brain pan. That image gives me a basis for what a RegEx is trying to do. ("Try" is a good word when describing RegExs. Whereas the usual programming either works or doesn't work, RegEx "tries" to find pattern matches. If none are found, it moves on.)

Wikipedia describes a Regular Expression as "a sequence of characters that forms a search pattern, mainly for use in pattern matching with strings, or string matching, i.e. 'find and replace'-like operations." I would describe RegEx as a data mining machine. RegEx is like a train rolling down a track of computer characters looking for patterns which match a specific set of given parameters. If it finds characters which match the pattern set, it grabs them and puts them on board the train.

As the RegEx train runs down the line, it continues picking up characters as long as they fit the written instruction set. Some groups of characters may be saved for later reuse (backreferences). At times RegEx may look back at previous characters for validation or forward to coming data for confirmation (look-behind and look-ahead assertions; see Chapter Twelve of A Beginner's Guide to Using Regular Expressions in AutoHotkey). While a particular RegEx may be forgiving in what it will accept on board, if the pattern does not completely match the given set of criteria, the entire group (including all previously collected characters) is kicked off the train and RegEx continues rolling along looking for another possible set of matching characters. This continues until it either hits the end of the line or finds a complete solution to its data schedule. Then RegEx stops. The RegEx data mining machine can be started up again by placing it in a Loop which restarts the same search from a point just beyond its current solution.
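The restart-in-a-Loop idea can be sketched as follows. This is only an illustration, assuming AutoHotkey v1.1 syntax; the sample text, the literal needle "the", and the variable names (StartPos, Match, Report) are made up for the example. It relies on the optional third and fourth parameters of RegExMatch(), which receive the matched text and set the starting position.

```autohotkey
; Sketch (AutoHotkey v1.1): find every occurrence of a needle, not just the first.
Haystack := "the quick brown fox jumped over the lazy dog"
Report := ""
StartPos := 1
; RegExMatch() returns 0 when no further match exists, which ends the loop.
while (FoundPos := RegExMatch(Haystack, "the", Match, StartPos))
{
    Report .= Match . " found at position " . FoundPos . "`n"
    ; Restart the search just beyond the end of the current match.
    StartPos := FoundPos + StrLen(Match)
}
MsgBox, % Report
```

A needle that can match an empty string would need extra care here, since StartPos would never advance.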

This data mining train is the image I visualize when working with a RegEx. The key to understanding RegEx is knowing what the conductor is trying to do when it interprets the special symbols in a RegEx set of instructions to bring the right character passengers on board. It took me a while to comprehend that the primary purposes of the AutoHotkey RegEx functions are data extraction, with RegExMatch(), and data correction, with RegExReplace(), as discussed in Chapter Three of A Beginner's Guide to Using Regular Expressions in AutoHotkey.
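To make the two roles concrete, here is a small sketch, again assuming AutoHotkey v1.1 syntax. The invoice text and the variable names (Number, Fixed) are invented for illustration and do not come from the book.

```autohotkey
; Sketch (AutoHotkey v1.1): extraction versus correction.
Haystack := "Invoice 2048 is overdue"
; Extraction: RegExMatch() stores the matched text in the variable Number.
FoundPos := RegExMatch(Haystack, "[0-9]+", Number)
; Correction: RegExReplace() returns a new string with the match replaced.
Fixed := RegExReplace(Haystack, "overdue", "paid")
MsgBox, % "Extracted " . Number . " at position " . FoundPos . "`n" . Fixed
```

RegExReplace() does not change Haystack itself; the corrected text lives in the returned string.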

Practical Uses for RegEx in AutoHotkey

Maybe the most important question is, "If Regular Expressions can be so confusing, why bother?" Often when doing simple text searches or replacements it's quicker and easier to use functions built into a scripting language. RegEx may be adding needless complication. However, a RegEx might do with one expression what takes several lines of code when using those other functions. It may take slightly longer to complete (a few more microseconds), but the added flexibility could make the seemingly impossible a reality. RegEx has more power and flexibility than a standard search and/or replace.

For example, IP addresses are many and varied, although they all conform to the same pattern. Each IP address consists of four numbers (each one to three digits long and between zero and 255) separated by dots. With the proper RegEx, the engine can search through a document pulling out only the IP addresses. Then those extracted addresses can be used to find where each IP is located. AutoHotkey with RegEx can create a Web IP lookup app that finds the locations of extracted IP addresses throughout the world. (This example and many of the following examples using RegEx in AutoHotkey are discussed in the e-book A Beginner's Guide to Using Regular Expressions in AutoHotkey.)
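A rough sketch of the extraction step might look like this, assuming AutoHotkey v1.1 syntax. The sample text and variable names are hypothetical, and the pattern is deliberately simple: it accepts any four groups of one to three digits, so it does not reject numbers above 255. It is not the expression used in the e-book.

```autohotkey
; Sketch (AutoHotkey v1.1): pull dotted-quad addresses out of a block of text.
Haystack := "Hits from 192.168.1.10 and 8.8.8.8, then 10.0.0.255 again."
; Four groups of 1 to 3 digits separated by literal dots, bounded by word edges.
Needle := "\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"
IPList := ""
StartPos := 1
while (FoundPos := RegExMatch(Haystack, Needle, IP, StartPos))
{
    IPList .= IP . "`n"
    StartPos := FoundPos + StrLen(IP)   ; continue after this address
}
MsgBox, % IPList
```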

Another use for a RegEx may be to find duplicate words in a document. This can be done with other functions, but it would take a few lines of code with conditionals (If-Then), whereas only one RegEx line is needed. How about swapping the first and last words in selected text?
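Both tasks can be sketched with RegExReplace() and backreferences. This assumes AutoHotkey v1.1 syntax; the patterns and sample strings are illustrative guesses rather than the expressions from the e-book. In the replacement text, $1, $2, and $3 stand for the parenthesized groups in the needle.

```autohotkey
; Sketch (AutoHotkey v1.1): one-line fixes using backreferences.
; Collapse a doubled word: \1 must repeat whatever (\w+) captured.
Doubled := "Paris in the the spring"
Deduped := RegExReplace(Doubled, "\b(\w+)\s+\1\b", "$1")

; Swap the first and last words, keeping the middle untouched.
Phrase := "one two three"
Swapped := RegExReplace(Phrase, "^(\w+)(.*?)(\w+)$", "$3$2$1")

MsgBox, % Deduped . "`n" . Swapped
```

The swap pattern assumes the selected text starts and ends with a word character; leading or trailing punctuation would need a looser expression.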

Maybe you want to strip all of the HTML code out of a Web page leaving only the text? Or, possibly you need to extract a list of all of the Web links found in a Web page? Regular Expressions are the best way to ensure that a properly formatted, valid e-mail address is entered into a data field.

Pulling the numbers out of alphanumeric data is relatively simple with RegEx. Or, maybe a key symbol (escape character) needs to be inserted in front of (or behind) each member of a group of special characters.
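Two of these jobs reduce to deleting whatever does not belong. The sketch below assumes AutoHotkey v1.1 syntax and uses made-up sample strings. Leaving out the replacement parameter of RegExReplace() deletes each match. The tag-stripping pattern is naive and would stumble on scripts, comments, or a ">" inside an attribute.

```autohotkey
; Sketch (AutoHotkey v1.1): remove everything that is not wanted.
; \D matches any character that is not a digit, so only digits survive.
PartCode := "Part A17-B203"
DigitsOnly := RegExReplace(PartCode, "\D")

; Naive tag stripper: delete anything between < and the next >.
Html := "<p>Hello <b>world</b></p>"
TextOnly := RegExReplace(Html, "<[^>]+>")

MsgBox, % DigitsOnly . "`n" . TextOnly
```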

If it's a pattern you need to locate (and possibly manipulate) in your haystack of data, then RegEx may be your best bet for finding that needle. This may be all the incentive you need to explore the mysteries of Regular Expressions.

The Mechanics of RegEx in AutoHotkey


Critical to using a RegEx is understanding how it works. RegEx is a system for finding matches within strings (text) which may be file names, variables, or the contents of a file. (Such a search string is called the "Haystack" in the online documentation for the AutoHotkey RegEx functions, while the search expression is called the "NeedleRegEx", as in "needle in a haystack.")

Knowing what a RegEx will actually do depends on how well you understand what it is trying to do. A RegEx starts at the beginning of a string and looks at each character one by one until it finds a match for the entire expression. If it finds a match for the NeedleRegEx, it stops looking; otherwise it continues until it reaches the end of the input string (Haystack). In AutoHotkey the function used for matching a RegEx is RegExMatch(), which returns the numeric location of the first character of the first occurrence of a match. (A numeric location is found by counting the characters from the beginning of the Haystack up to and including the first character of the match, so the first character of the Haystack is position 1.)

For example, in its simplest form the NeedleRegEx might be a lowercase a (or any other letter, number, or character). The RegEx engine will search the Haystack looking for an a. If found, it stops and returns the location of the letter:

FoundPos := RegExMatch(Haystack, "a")
FoundPos is the location of the first occurrence and Haystack is the input string. Note that the RegEx itself (a, the needle we want to find) appears within double quotes. If Haystack is "the quick brown fox jumped over the lazy dog", the a in "lazy" is found at position number 38 (the 38th character in the string, including spaces), so FoundPos is 38. If there is no a in Haystack, the needle is not matched and RegExMatch() returns 0 (zero) in FoundPos.
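A complete script for trying this out might look like the sketch below, assuming AutoHotkey v1.1 syntax. The second needle, "7", is an arbitrary choice of a character that does not occur in the sentence, and MissingPos is an invented variable name.

```autohotkey
; Sketch (AutoHotkey v1.1): the simplest possible needle.
Haystack := "the quick brown fox jumped over the lazy dog"
FoundPos := RegExMatch(Haystack, "a")     ; 38, the a in "lazy"
MissingPos := RegExMatch(Haystack, "7")   ; 0, since there is no 7 in Haystack
MsgBox, % "a is at " . FoundPos . ", 7 is at " . MissingPos
```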

To make the RegEx slightly more complicated we add another character to our RegEx:

FoundPos := RegExMatch(Haystack, "ab")
Now our needle in the Haystack is the ab letter combination. RegEx will again look for the letter "a" until it finds a match. Only then will it look at the next character for a match of the letter "b". If there is no following "b", it drops everything and continues looking for the next "a" again. For example, if Haystack is "Abby has always been absent from the abbey", then FoundPos is 22.

What? That FoundPos coincides with the "ab" in "absent", not the first "ab" in "Abby." This brings us to an important concept—RegEx is case sensitive. If you want to find a capital letter, it better be capitalized in the RegEx. The word "Abby" in the haystack is skipped as a match because the "A" is uppercase while the needle is "a" lowercase.

Note: There is an option to make the RegEx case insensitive, but that will be left for another chapter. That's the problem with RegEx. There are so many possibilities and options that it's easy to get confused.

As RegEx moves through the Haystack it stops at each letter "a", then checks for a letter "b" immediately following it, but none are found until reaching the word "absent" starting at position 22. Having found a complete match, RegExMatch() stops.
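For readers who want to see the difference right away, the case-insensitive option mentioned in the note is written as i) at the start of the needle. The sketch below assumes AutoHotkey v1.1 syntax, and the variable names are invented for illustration.

```autohotkey
; Sketch (AutoHotkey v1.1): default case-sensitive search versus the i) option.
Haystack := "Abby has always been absent from the abbey"
CaseSensitivePos := RegExMatch(Haystack, "ab")       ; 22, the "ab" in "absent"
CaseInsensitivePos := RegExMatch(Haystack, "i)ab")   ; 1, the "Ab" in "Abby"
MsgBox, % CaseSensitivePos . " versus " . CaseInsensitivePos
```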

This is the essence of how RegEx works. If more characters are added to the expression's NeedleRegEx, then more is required to find a match. However, in the problem of validating numbers (for example in the Calorie Count app originally discussed in the book AutoHotkey Applications) the digits can be any numbers, but no letters.

Using Ranges in RegEx Matches

The simple way to match any number in the RegEx is to give a range of options. This is done by enclosing all the optional characters within square brackets […]. For example, placing all the vowels within square brackets makes each one a possible match:

FoundPos := RegExMatch(Haystack, "c[aeiou]t")
This function would return FoundPos for "cat", "cot", or "cut" (or any other c-vowel-t sequence, such as the "cit" in "city"), whichever one is found first. Proceeding through the Haystack, the RegEx engine stops at each occurrence of the letter "c", then tries to match the next character with either "a", "e", "i", "o", or "u", but no other character. If one of those options is not found, the search continues looking for another "c" character. If found, the vowels are checked again. If there is a match, the third character is checked to see if it is the "t" character. If yes, the RegEx engine stops searching and returns the location of the "c" character. If no, it continues moving down the Haystack until it either finds a complete match or reaches the end of the line.
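A short script shows which candidate wins when several are present. It assumes AutoHotkey v1.1 syntax; the sample sentence and the variable name Word are made up for the example.

```autohotkey
; Sketch (AutoHotkey v1.1): the first c-vowel-t sequence wins.
Haystack := "The cot sat next to the cat"
FoundPos := RegExMatch(Haystack, "c[aeiou]t", Word)
; Word holds the matched text, so it shows which alternative was found.
MsgBox, % Word . " found at position " . FoundPos   ; cot found at position 5
```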

In our situation we want to use the numeric digits [0123456789]. (The order of the digits inside of the square brackets doesn't matter.) If we wanted to match two digits in a row then [0123456789][0123456789] would do the job. The problem is that we don't know how many digits in a row we need to match. It could be one, two, three or more—at least theoretically. At those times when you don't know how many characters will occur in a row, rather than repeating the range for each matching character, adding the plus + sign after the range (or character) will do the job:

FoundPos := RegExMatch(Haystack, "[0123456789]+")
This RegEx search function will match one or more digits in a row until a non-digit is encountered—returning the location of the first digit in FoundPos.

Tip: Ranges of numbers or letters can be shortened by using a hyphen. For example, [0-9] is the same as [0123456789]. [A-Z] is the same as all capital letters while [a-z] is all lowercase letters. All letters and digits can be represented by [a-zA-Z0-9]. To shorten the expression for the numeric digit range even more use \d in place of [0-9]. Our shortened function becomes:

FoundPos := RegExMatch(Haystack, "\d+")
This will match one or more numeric digits in a row.
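The three spellings of the digit range can be compared side by side. The sketch assumes AutoHotkey v1.1 syntax; the sample text and variable names are invented, and the optional third parameter is used to capture the digits themselves.

```autohotkey
; Sketch (AutoHotkey v1.1): three equivalent ways to match a run of digits.
Haystack := "Lunch: 650 calories"
PosLong := RegExMatch(Haystack, "[0123456789]+")
PosRange := RegExMatch(Haystack, "[0-9]+")
PosShort := RegExMatch(Haystack, "\d+", Digits)   ; Digits receives "650"
; All three positions are 8, the location of the 6.
MsgBox, % PosLong . " " . PosRange . " " . PosShort . " -> " . Digits
```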

There are many more symbols and operators used in Regular Expressions. For an overview see this AutoHotkey RegEx Quick Reference. This is a simplified introduction. To truly understand how to use Regular Expressions there are numerous online tutorials, but there is no substitute for doing it yourself.

For a more detailed example of how AutoHotkey Regular Expressions can solve difficult search-and-replace problems, see "A Perfect Place to Use an AutoHotkey Regular Expression (RegEx in Text Replacement)."

A Beginner's Guide to Using Regular Expressions in AutoHotkey: Exploring the Mysteries of RegEx

This Beginner's Guide to Using Regular Expressions in AutoHotkey is not a beginning level AutoHotkey book, but an introduction to using Regular Expressions in AutoHotkey (or most other programming languages). To get the most from this book you should already have a basic understanding of AutoHotkey (or another programming language). Regular Expressions (RegEx) are a powerful way to search and alter documents without the limitations of most of the standard matching functions. At first, the use of RegEx can be confusing and mysterious. This book clears up the confusion with easy analogies for understanding how RegEx works and examples of practical AutoHotkey applications. "Regular Expressions in AutoHotkey" will take you to the next level in AutoHotkey scripting while adding more flexibility and power to your Windows apps. (This book is also available at Amazon.com.)


For More Information

If you're interested in testing AutoHotkey to see if it might be right for you, then go to "Installing AutoHotkey and Writing Your First Script." This page shows you how to get up and running with AutoHotkey, plus it offers links to other articles on how to use AutoHotkey.


To see more of the many possible applications for AutoHotkey check out "Free AutoHotkey Scripts and Apps for Learning."

If you want more information in either the Amazon Kindle format, EPUB format for use on the iPad and other types of tablet computers (or on your PC), or PDF for printing on notebook size paper, then check out the following e-books by Jack Dunning:

See how to get this e-book FREE, AutoHotkey Tricks You Ought to Do with Windows!


Now available in e-book format, Jack's A Beginner's Guide to AutoHotkey, Absolutely the Best Free Windows Utility Software Ever!: Create Power Tools for Windows XP, Windows Vista, Windows 7 and Windows 8.

Building Power Tools for Windows XP, Windows Vista, Windows 7 and Windows 8, AutoHotkey is the most powerful, flexible, free Windows utility software available. Anyone can instantly add more of the functions that they want in all of their Windows programs, whether installed on their computer or while working on the Web. AutoHotkey has a universality not found in any other Windows utility—free or paid.

Now in its second edition (October 2013), Jack takes you through his learning experience as he explores writing simple AutoHotkey scripts for adding repetitive text in any program or on the Web, running programs with special hotkeys or gadgets, manipulating the size and screen location of windows, making any window always-on-top, copying and moving files, and much more. Each chapter builds on the previous chapters. (The second edition now includes a chapter index of the AutoHotkey commands used in the book, plus Internet links directly to each command on the official AutoHotkey Web site.)

Also available at Amazon.com for the Kindle and Kindle software.

*                    *                    *

Jack's latest AutoHotkey book, which is composed of updated, reorganized and indexed chapters from many of his sample applications, is now available at Amazon for Kindle hardware (or free software) users. The book is organized and broken up into parts by topic. The book is not for the complete beginner since it builds on the information in A Beginner's Guide to AutoHotkey. However, if a person is reasonably computer literate, they could go directly to this book for ideas and techniques without the first book.

Jack shows how to build real world AutoHotkey applications. The AutoHotkey commands used are included in a special index to the chapters in which they appear. "Even I can't remember everything I wrote."

Also available at Amazon.com for the Kindle and Kindle software.

To get more detailed information about AutoHotkey and see a List of AutoHotkey commands visit the AutoHotkey Web site.

Some More AutoHotkey Uses

AutoHotkey is a scripting language which can make almost everything easier on Windows computers. It can be a simple one-line script in a text file which enters your e-mail address after only typing a couple of characters (e.g. "m@" when typed becomes "myemailaddress@mymailserver.com"). There are some power apps which can make your computer life much easier. For example:

Autocorrect over 5,000 commonly misspelled words in any Windows program or on the Web.

Set a reminder for a later meeting.

Use QuickLinks to replace the missing Windows 8 Start Menu (or just to make life easier in any version of Windows).
