Regex - Regular Expressions - Part One

Regular expressions can search and manipulate text in a string variable, document or file. Unlike the common search and replace functions found in word processors and search engines, regular expressions are sophisticated text processing instructions used to perform smarter searches and replacements of text. They are ideal for programmers who need to detect errors, validate forms, or find all occurrences of a series of words and replace them under a given condition.

I am starting to make use of regular expressions (or regex) in the Zero One plugin, engine and the X-Producer as a feature, and as a tool for my own use when working on my software.

Here is a video of regular expressions being used in the X-Producer, Sports Fiction's game editor.

This is not new technology: it has been around for over 6 decades, and it has many applications in software development.

To introduce regex without going into much detail, we will first look at some basic examples of searches and matches.
All of the following regular expressions start and end with forward slashes, as is the case in many regex tools. The slashes may sometimes be unnecessary because the tool adds them for you behind the scenes.

In the text: Print User.Name

Regex: /Print User.Name/

Matches: Print User.Name

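Reason: All the characters in the expression are literal except the dot, which matches any single character and here lines up with the dot in the text. The text is otherwise identical, so the whole text matches.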

In the text: Print User.Name

Regex: /P.... User.Name/

Matches: Print User.Name

Reason: All the characters in the expression are literal, except for the dots, each of which matches any single character (a newline/return is excluded unless the tool is set to let the dot match new lines).

In the text: Post User.Name

Regex: /P.... User.Name/

Matches:

Reason: Nothing is returned. The expression expects a capital P followed by any 4 characters (5 characters in all) and then a space, which makes 6 characters before the word User. Post is only a 4 character word followed by a space, so the text is one character short.
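
The three examples above can be reproduced in code. What follows is only a minimal sketch in Python, which is my choice for illustration and not necessarily what the plugin or the X-Producer use. Python's re module takes the pattern without the surrounding slashes, and the helper name first_match is hypothetical.

```python
import re

def first_match(pattern, text):
    """Return the first matched substring, or None when nothing matches."""
    found = re.search(pattern, text)
    return found.group(0) if found else None

# Literal characters plus one dot; the dot accepts any character,
# which includes the real dot in the text.
print(first_match(r"Print User.Name", "Print User.Name"))  # Print User.Name

# The four dots after P stand in for the letters 'rint'.
print(first_match(r"P.... User.Name", "Print User.Name"))  # Print User.Name

# 'Post' is one character short, so the space in the pattern
# ends up compared against the 'U' and the match fails.
print(first_match(r"P.... User.Name", "Post User.Name"))   # None
```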

In the text: Print User.Name

Regex: /[a-zA-Z]/

In global mode, matches: P r i n t U s e r N a m e (13 separate matches, one per letter)

In normal mode, matches: P

Reason: The square brackets represent a single occurrence of any alphabetical character from a-z, in lower or uppercase. In global mode every letter in the text is matched, one at a time, while the space and the dot are skipped. Otherwise only the first occurrence of a character from a-z is returned.
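
As a rough illustration of the difference between the two modes, here is a short Python sketch. Python has no global switch, so I am assuming that re.search stands in for normal mode and re.findall for global mode; the variable names are my own.

```python
import re

text = "Print User.Name"
letter = re.compile(r"[a-zA-Z]")

# Normal mode: stop at the first letter found.
first = letter.search(text).group(0)

# Global mode: collect every letter; the space and the dot are skipped.
all_letters = letter.findall(text)

print(first)             # P
print(all_letters)       # ['P', 'r', 'i', 'n', 't', 'U', 's', 'e', 'r', 'N', 'a', 'm', 'e']
print(len(all_letters))  # 13
```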

So far
We have seen a sneak preview of regex. It is not quite as useful as it gets, but it is simple enough for most to grasp, and it sets us up for the more complex and sophisticated introduction to regular expressions that follows. Regex is a great way to search for matches in your strings using the plugin, or in your asset files or X-Queries using the X-Producer.

As your knowledge of regular expressions increases, so will the number of solutions you have available for finding patterns of text, validating the matches and replacing the matches where necessary, whether in string variables, opened documents or files saved on the system.

On your first encounter with the regular expression syntax, it may seem like overkill to take the time to write an instruction which might have been handled by a simple search and replace command. There are, however, many text operations which a normal search and replace tool will not be able to handle: patterns relative to adjacent content which you as a human being can easily recognise, like the following:

'Tom won the bingo' features 3 words that happen to be 3 characters long

'Word.exe, Word.txt, Powerpoint.exe' features two filenames classified as .exe (executable programs)

'mario64@nintendo.com, mario64atnintendo.com' contains one valid email address, the one with the @ sign in between mario64 and nintendo.com

HTML files mostly consist of XML syntax which contain the following XML pattern somewhere before XML pattern

'Green is my favourite color' is a sentence which contains the US variation of the word colour

These patterns are easily recognisable by the human eye, but how could you perform a search and replace with such criteria on a computer? You could resort to programming an application dedicated to the purpose, but that is the extreme solution. A more efficient solution is to use regular expressions.

Regex can be tested online using one of a number of regular expression development tools. The one that I am using is at www.regexr.com/, which offers a global mode option, some other cool settings and a pattern cheatsheet, to name a few of its features.

I will now provide an early example of using regex in your search operations. By the end of this section we will have demonstrated a number of simple regular expressions which, using repetition, consistently match the 3 words in the first sentence above (Tom won the bingo) that contain 3 characters followed by a space character.

Repetition

In the text: Tom won the bingo

Regex: /Tom won the/

Matches: Tom won the

Reason: We have simply requested a literal match of all the characters between the forward slashes. These slashes may not be required in certain regex input fields or function parameters; the documentation supplied with the given tool should say so.

In the text: Tom won the bingo

Regex: /./

Matches: Tom won the bingo

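Reason: We have used the wildcard (.), which matches any single character. In global mode every character in the text is matched, one at a time, so between them the matches cover everything. Now what we need is to limit the match to 3 letter words.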

One way to limit the wildcard to 3 characters is to introduce a repetition. Repetitions indicate how many times a certain pattern can or must occur. The repetition syntax we will introduce at this stage will be a number enclosed by curly braces, after the pattern like so:

pattern{1} for 1 repetition
pattern{2} for 2
pattern{3} for 3 repetitions and so on

So to match any 3 character pattern followed by a space, we declare the pattern as the wildcard repeated 3 times, followed by a space.

In the text: Tom won the bingo

Regex: /.{3} /

Matches: Tom won the

Reason: The wildcard will take 3 characters. After the wildcard and its repetition parameter, we have a space; therefore a space is expected after the sequence of 3 characters. We have a match of all of the 3 letter words, including the spaces but not the word bingo, which happens to contain 5 letters.
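
To see the repetition at work, here is a small Python sketch, again assuming that re.findall plays the role of global mode. It also checks that .{3} behaves the same as writing the wildcard out three times by hand.

```python
import re

text = "Tom won the bingo"

# Wildcard repeated 3 times, followed by a literal space.
with_braces = re.findall(r".{3} ", text)

# The same request with the three wildcards written out in full.
written_out = re.findall(r"... ", text)

print(with_braces)                 # ['Tom ', 'won ', 'the ']
print(with_braces == written_out)  # True
```

The word bingo never appears in the results, because no run of 3 characters inside it is followed by a space.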

In some programming languages and systems the syntax will differ. For example, in the search and replace tool of older versions of the Visual Studio IDE, repetition is written like so:

pattern^1 for 1 repetition
pattern^2 for 2 repetitions
pattern^3 for 3 repetitions

Now continuing with the traditional syntax. /.{3} / matches 3 characters followed by a space. This method is not perfect because the wildcard will also match non-word characters and control characters, as the following examples show:

In the text: Tom ### the bingo

Regex: /.{3} /

Matches: Tom ### the

Reason: We have used the wildcard (.) to match everything, including the non-word characters, in this instance the hashes (###).

In the text: Tom ### the $$$

Regex: /.{3} /

Matches: Tom ### the

Reason: We have used the wildcard (.) to match everything, including the hashes (###), but not the dollar signs ($$$), because the dollar signs are not followed by a space.

In the text: A B # C D E

Regex: /.{3} /

Matches: A B # C

Reason: We have used the wildcard (.) to match everything, including the spaces and the hash. The first match is A, space, B, followed by a space. The second match is #, space, C, followed by a space. In each case the first three characters qualify as any wildcard character, and the fourth has to be a space because of the literal space character at the end of our pattern. D E is not matched because the text ends after the E, with no space character to follow it.
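
It can help to see exactly where each match starts and ends. The following Python sketch lists the matches with their positions; the helper name spans is hypothetical, and re.finditer is my stand-in for global mode.

```python
import re

def spans(pattern, text):
    """List each match as (start index, end index, matched text)."""
    return [(m.start(), m.end(), m.group(0)) for m in re.finditer(pattern, text)]

# Two matches of 4 characters each; 'D E' is left over.
print(spans(r".{3} ", "A B # C D E"))
# [(0, 4, 'A B '), (4, 8, '# C ')]

# The hashes qualify, the dollar signs do not (no trailing space).
print(spans(r".{3} ", "Tom ### the $$$"))
# [(0, 4, 'Tom '), (4, 8, '### '), (8, 12, 'the ')]
```

Each match begins where the previous one ended, which is why the characters are consumed in groups of four.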

Repetitions in Global mode

These examples are based on global matching, which is similar to the 'Find All' tool in a text editor in that it finds all matches. If this mode were turned off, the following would occur:

In the text: Tom won the bingo

Regex: /./

Matches: T

Reason: The wildcard (.) matches any single character, but only the first match is returned, which is the single character T. The engine will not search for the other matches.

In the text: A B # C D E

Regex: /.{3} /

Matches: A B

Reason: We have used the wildcard (.) to match everything, including the spaces and the hash. It did not match '# C ' because the search was terminated after the first qualified match, 'A B '

Global expressions tend to be suffixed with a lowercase 'g', written as follows: /expression/g

But for learning purposes, we are omitting the g from our patterns to make them look less cluttered. Global mode tends to be controlled by a checkbox in the regex-enabled tool or by its configuration file.
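
Not every language uses the g suffix. In Python, for instance, there is no global flag at all; as far as I can tell the closest equivalent is choosing between re.search (first match only) and re.findall (all matches). A minimal sketch, with match_once as a hypothetical helper name:

```python
import re

def match_once(pattern, text):
    """Global mode off: return only the first match, or None."""
    found = re.search(pattern, text)
    return found.group(0) if found else None

# A single wildcard returns a single character.
print(repr(match_once(r".", "Tom won the bingo")))  # 'T'

# The search stops after the first qualified match.
print(repr(match_once(r".{3} ", "A B # C D E")))    # 'A B '

# Global mode on: every match is collected.
print(re.findall(r".{3} ", "A B # C D E"))          # ['A B ', '# C ']
```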

The following two expressions perform the same action with global mode turned on.

In the text: Object_0001 & Object_0002

Regex: /Object_.{4}/

Matches: Object_0001 Object_0002

Reason: We have requested that there needs to be the string 'Object' with a capital 'O'; and this string must precede an underscore (_), and the underscore must be followed by 4 characters; in this instance the text contains 4 digits. The match will not pick up the ampersand (&) and its adjacent spaces, since these were not a part of the Regex pattern.

We have matched both Object_0001 and Object_0002 because global mode is switched on.

In the text: Object_0001 & Object_0002

Regex: /Object_.{4}/g

Matches: Object_0001 Object_0002

Reason: We have requested all occurrences because of the '/g' at the end. In addition, we have requested that there needs to be the string 'Object' with a capital 'O'; and this string must precede an underscore (_), and the underscore must be followed by 4 characters.

With global mode turned off in the tool, only the latter of these two examples, the one with the /g suffix, will match both occurrences.
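
Here is the Object_ example as a short Python sketch, under the same assumption that re.findall stands in for global mode and re.search for normal mode. The last line uses a made-up text, 'Object_abcd', to show that the wildcard is not restricted to digits.

```python
import re

text = "Object_0001 & Object_0002"
pattern = re.compile(r"Object_.{4}")

# Global mode: both occurrences, without the ampersand or its spaces.
print(pattern.findall(text))          # ['Object_0001', 'Object_0002']

# Normal mode: only the first occurrence.
print(pattern.search(text).group(0))  # Object_0001

# The wildcard accepts any 4 characters, so letters qualify as well.
print(pattern.findall("Object_abcd")) # ['Object_abcd']
```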

But how would we ensure that such a pattern ends with a series of 4 digits? And how can we limit characters to word characters? These are some of the things we will discuss in the next Regex update.

Next
In the next tutorial I will go into more depth, introducing more of the advanced Regex functions with practical examples. Until then, take care.
