In the event that an RE could match more than one substring of a given
string, the RE matches the one starting earliest in the string.
If the RE could match more than one substring starting at that point,
- its choice is determined by its preference>:
- either the longest substring, or the shortest.
+ either the longest possible match or the shortest possible match will
+ be taken, depending on whether the RE is greedy> or
+ non-greedy>.
- Most atoms, and all constraints, have no preference.
- A parenthesized RE has the same preference (possibly none) as the RE.
- A quantified atom with quantifier
- {>m>}>
- or
- {>m>}?>
- has the same preference (possibly none) as the atom itself.
- A quantified atom with other normal quantifiers (including
- {>m>,>n>}>
- with m> equal to n>)
- prefers longest match.
- A quantified atom with other non-greedy quantifiers (including
- {>m>,>n>}?>
- with m> equal to n>)
- prefers shortest match.
- A branch has the same preference as the first quantified atom in it
- which has a preference.
- An RE consisting of two or more branches connected by the
- |> operator prefers longest match.
+ Whether an RE is greedy or not is determined by the following rules:
+
+
+ Most atoms, and all constraints, have no greediness attribute (because
+ they cannot match variable amounts of text anyway).
+
+
+
+ Adding parentheses around an RE does not change its greediness.
+
+
+
+ A quantified atom with a fixed-repetition quantifier
+ ({>m>}>
+ or
+ {>m>}?>)
+ has the same greediness (possibly none) as the atom itself.
+
+
+
+ A quantified atom with other normal quantifiers (including
+ {>m>,>n>}>
+ with m> equal to n>)
+ is greedy (prefers longest match).
+
+
+
+ A quantified atom with a non-greedy quantifier (including
+ {>m>,>n>}?>
+ with m> equal to n>)
+ is non-greedy (prefers shortest match).
+
+
+
+ A branch — that is, an RE that has no top-level
+ |> operator — has the same greediness as the first
+ quantified atom in it that has a greediness attribute.
+
+
+
+ An RE consisting of two or more branches connected by the
+ |> operator is always greedy.
+
+
+
+
+
+ The above rules associate greediness attributes not only with individual
+ quantified atoms, but with branches and entire REs that contain quantified
+ atoms. What that means is that the matching is done in such a way that
+ the branch, or whole RE, matches the longest or shortest possible
+ substring as a whole>. Once the length of the entire match
+ is determined, the part of it that matches any particular subexpression
+ is determined on the basis of the greediness attribute of that
+ subexpression, with subexpressions starting earlier in the RE taking
+ priority over ones starting later.
+
+
+ An example of what this means:
+
+SELECT SUBSTRING('XY1234Z', 'Y*([0-9]{1,3})');
+Result: 123
+SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})');
+Result: 1
+
+ In the first case, the RE as a whole is greedy because Y*>
+ is greedy. It can match beginning at the Y>, and it matches
+ the longest possible string starting there, i.e., Y123>.
+ The output is the parenthesized part of that, or 123>.
+ In the second case, the RE as a whole is non-greedy because Y*?>
+ is non-greedy. It can match beginning at the Y>, and it matches
+ the shortest possible string starting there, i.e., Y1>.
+ The subexpression [0-9]{1,3}> is greedy but it cannot change
+ the decision as to the overall match length; so it is forced to match
+ just 1>.
- Subject to the constraints imposed by the rules for matching the whole RE,
- subexpressions also match the longest or shortest possible substrings,
- based on their preferences,
- with subexpressions starting earlier in the RE taking priority over
- ones starting later.
- Note that outer subexpressions thus take priority over
- their component subexpressions.
+ In short, when an RE contains both greedy and non-greedy subexpressions,
+ the total match length is either as long as possible or as short as
+ possible, according to the attribute assigned to the whole RE. The
+ attributes assigned to the subexpressions only affect how much of that
+ match they are allowed to eat> relative to each other.
The quantifiers {1,1}> and {1,1}?>
- can be used to force longest and shortest preference, respectively,
+ can be used to force greediness or non-greediness, respectively,
on a subexpression or a whole RE.