Lua - Pattern Matching



Pattern matching is a useful feature for searching and manipulating strings. Lua provides built-in support for it through string library functions such as string.find, string.match, string.gmatch and string.gsub, which give a concise and efficient way to process a string based on a specified pattern.

Lua patterns are built from a small set of special characters and rules. They are not as extensive as the regular expressions available in languages like Java or C++ (for example, there is no alternation operator), but they are sufficient for most text processing tasks. Pattern matching is lightweight and, being part of the standard string library, needs no extra library to be included.

Pattern Syntax

Special Characters

Following is the list of special characters which can be used to create powerful patterns in Lua.

  • . (dot) − To match any single character.

  • %a − To match any letter.

  • %c − To match any control character.

  • %d − To match any digit.

  • %g − To match any printable character except space.

  • %l − To match any lowercase letter.

  • %p − To match any punctuation character.

  • %s − To match any space character (space, tab, newline, etc.).

  • %u − To match any uppercase letter.

  • %w − To match any alphanumeric character (letters or digits).

  • %x − To match any hexadecimal digit.

  • %z − To match the character with representation 0 (Lua 5.1 only; from Lua 5.2 onward a pattern can contain "\0" directly).

  • %x (where x here stands for any non-alphanumeric character, not the hexadecimal class above) − To represent the character x itself. Mainly used to escape special characters. For example, %. matches a literal dot, %+ matches a literal plus sign, and so on.
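The classes above can be tried out with string.match and string.find. The following is a minimal sketch; the sample string and variable names are made up for illustration.

```lua
-- Illustrative input; not part of the original example.
local sample = "Room 42, Floor 7."

local first_word  = string.match(sample, "%a+")  -- letters only: "Room"
local first_num   = string.match(sample, "%d+")  -- digits only: "42"
local first_punct = string.match(sample, "%p")   -- first punctuation: ","
local first_three = string.match(sample, "...")  -- any three characters: "Roo"
local dot_pos     = string.find(sample, "%.")    -- escaped dot is literal: 17

print(first_word, first_num, first_punct, first_three, dot_pos)
```

Note that an unescaped dot would match at position 1, because it accepts any character.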

Set

A [set] defines a custom character class that matches any single character from the listed characters or ranges. For example, to match a lowercase letter we can use [a-z], and for a digit we can use [0-9]. Placing ^ at the beginning of the set negates it. For example, [^0-9] matches any character that is not a digit.
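As a short sketch of how sets behave, the snippet below applies a range, a negated set and a gsub call to a made-up input string.

```lua
-- Illustrative input; the string and variable names are assumptions.
local code = "id=ab12-XY"

local lower      = string.match(code, "[a-z]+")    -- run of lowercase letters: "id"
local non_digits = string.match(code, "[^0-9]+")   -- run of non-digits: "id=ab"
local upper_run  = string.match(code, "[A-Z]+")    -- run of uppercase letters: "XY"
local digits     = string.gsub(code, "[^0-9]", "") -- drop every non-digit: "12"

print(lower, non_digits, upper_run, digits)
```

string.gsub also returns the number of substitutions as a second value; assigning to a single variable keeps only the resulting string.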

Quantifiers

Quantifiers specify how many times the preceding character or character class may occur. In Lua, a quantifier applies to a single character class only, not to a parenthesized group.

  • * − To match zero or more occurrences.

  • + − To match one or more occurrences.

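  • - − To match zero or more occurrences in a lazy (non-greedy) way, taking the shortest possible match. The * and + quantifiers, by contrast, take the longest possible match.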

  • ? − To match zero or one occurrence.
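The difference between the quantifiers is easiest to see side by side. This is a minimal sketch with made-up input strings.

```lua
-- Illustrative input; not from the original example.
local html = "<b>bold</b> and <i>italic</i>"

local greedy = string.match(html, "<.*>")  -- longest match: the whole string
local lazy   = string.match(html, "<.->")  -- shortest match: "<b>"

local with_u    = string.match("colour", "colou?r")  -- optional u present: "colour"
local without_u = string.match("color", "colou?r")   -- optional u absent: "color"

local one_or_more  = string.match("aaa bbb", "a+")  -- "aaa"
local zero_or_more = string.match("bbb", "a*")      -- empty match at position 1: ""

print(greedy, lazy, with_u, without_u, one_or_more, "[" .. zero_or_more .. "]")
```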

Anchor

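Anchors tie a pattern to a position within the text instead of matching a character.

  • ^ − To match the beginning of the string. It acts as an anchor only when it is the first character of the pattern.

  • $ − To match the end of the string. It acts as an anchor only when it is the last character of the pattern.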
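Using both anchors together checks the whole string rather than a part of it. The helper below is a hypothetical function written for this sketch, not something provided by the string library.

```lua
-- is_all_digits is a made-up helper name for illustration.
local function is_all_digits(s)
  -- ^ and $ force the digits to span the entire string
  return string.find(s, "^%d+$") ~= nil
end

print(is_all_digits("2024"))   -- true
print(is_all_digits("20a24"))  -- false
print(is_all_digits(""))       -- false, because + needs at least one digit
```

Without the anchors, "%d+" would also succeed on "20a24", since it only needs to find digits somewhere in the string.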

Capture

Parentheses () are used to capture parts of the matched string. Each captured part is returned as a separate value by functions such as string.match.
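As a brief sketch of captures, the snippet below extracts fields with string.match and reuses captures in a string.gsub replacement through %1 and %2. The input strings and variable names are made up for illustration.

```lua
-- Two captures: a key and a value separated by "=" with optional spaces
local key, value = string.match("name = Lua", "(%w+)%s*=%s*(%w+)")
print(key, value)  -- name    Lua

-- Three captures; the dash is escaped so it is read as a literal character
local year, month, day = string.match("2024-05-17", "(%d+)%-(%d+)%-(%d+)")
print(year, month, day)  -- 2024    05    17

-- %1 and %2 in the replacement refer to the first and second capture
local swapped = string.gsub("hello world", "(%w+) (%w+)", "%2 %1")
print(swapped)  -- world hello
```

Captured values are always strings, so year here is "2024"; tonumber can convert it when a number is needed.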

Example - Using Patterns

main.lua

local text = "First 123 then 456."

-- Search any sequence of digits
local start, finish = string.find(text, "%d+")
print("Digit sequence:", string.sub(text, start, finish)) -- prints Digit sequence: 123

-- Search a pattern at the beginning of the string
local start, finish = string.find(text, "^First")
if start then
  print("Sentence starts with 'First'") -- prints Sentence starts with 'First'
end

-- Search a pattern at the end of the string
-- $ is left unescaped so it acts as the end anchor; %$ would match a literal dollar sign
local start, finish = string.find(text, "%d+$")
if not start then
  print("Sentence does not end with digits.") -- prints Sentence does not end with digits.
end

Output

When we run the above program, we get the following output:

Digit sequence: 123
Sentence starts with 'First'
Sentence does not end with digits.