Newline is certainly not a digit, nor a word character, so it is
sensible that it should match these complemented character classes.
Previously, \D and \W acted that way by default, but in
newline-sensitive mode ('n' or 'p' flag) they did not match newlines.
This behavior was previously forced because explicit complemented
character classes don't match newlines in newline-sensitive mode;
but as of the previous commit that implementation constraint no
longer exists. It seems useful to change this because the primary
real-world use for newline-sensitive mode seems to be to match the
default behavior of other regex engines such as Perl and JavaScript
... and their default behavior is that these match newlines.
The old behavior can be kept by writing an explicit complemented
character class, i.e. [^[:digit:]] or [^[:word:]]. (This means
that \D and \W are not exactly equivalent to those strings, but
they weren't anyway.)
Discussion: https://postgr.es/m/3220564.1613859619@sss.pgh.pa.us
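The behavior change described above can be checked with a few queries. This is a minimal sketch, not part of the commit: it assumes PostgreSQL 14 or later with standard_conforming_strings turned on, and it uses regexp_match() with the 'n' flag instead of the test_regex() function from the regression tests. The sample strings are borrowed from those tests.

```sql
-- Sketch only: assumes PostgreSQL 14+ and standard_conforming_strings = on.

-- \D in newline-sensitive mode ('n'): per the commit message, the match
-- should now run across the newline and stop at the first digit.
SELECT regexp_match(E'abc\ndef345', '\D+', 'n');

-- An explicit complemented class in the same mode still stops at the
-- newline, which is how \D behaved before this change.
SELECT regexp_match(E'abc\ndef345', '[^[:digit:]]+', 'n');

-- The same comparison for \W and [^[:word:]].
SELECT regexp_match(E'***\n@@@___', '\W+', 'n');
SELECT regexp_match(E'***\n@@@___', '[^[:word:]]+', 'n');
```

Without the 'n' flag, all four patterns should match across the newline, since newline-sensitive mode is what keeps complemented bracket expressions from matching a newline.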
|
\d
- [[:digit:]]
+ matches any digit, like
+ [[:digit:]]
|
\s
- [[:space:]]
+ matches any whitespace character, like
+ [[:space:]]
|
\w
- [[:word:]]
+ matches any word character, like
+ [[:word:]]
|
\D
- [^[:digit:]]
+ matches any non-digit, like
+ [^[:digit:]]
|
\S
- [^[:space:]]
+ matches any non-whitespace character, like
+ [^[:space:]]
|
\W
- [^[:word:]]
+ matches any non-word character, like
+ [^[:word:]]
If newline-sensitive matching is specified, .
and bracket expressions using ^
will never match the newline character
- (so that matches will never cross newlines unless the RE
- explicitly arranges it)
+ (so that matches will not cross lines unless the RE
+ explicitly includes a newline)
and ^ and $
will match the empty string after and before a newline
respectively, in addition to matching at beginning and end of string
respectively.
But the ARE escapes \A and \Z
continue to match beginning or end of string only.
+ Also, the character class shorthands \D
+ and \W will match a newline regardless of this mode.
+ (Before PostgreSQL 14, they did not match
+ newlines when in newline-sensitive mode.
+ Write [^[:digit:]]
+ or [^[:word:]] to get the old behavior.)
\fB^\fR
will never match the newline character
(so that matches will never cross newlines unless the RE
-explicitly arranges it)
+explicitly includes a newline)
and
\fB^\fR
and
and
\fB\eZ\fR
continue to match beginning or end of string \fIonly\fR.
+Also, the character class shorthands
+\fB\eD\fR
+and
+\fB\eW\fR
+will match a newline regardless of this mode.
.PP
If partial newline-sensitive matching is specified,
this affects \fB.\fR
/* build arcs for char class; this may cause color splitting */
subcolorcvec(v, cv, cstate, cstate);
-
- /* in NLSTOP mode, ensure newline is not part of the result set */
- if (v->cflags & REG_NLSTOP)
- newarc(v->nfa, PLAIN, v->nlcolor, cstate, cstate);
NOERR();
/* clean up any subcolors in the arc set */
NOERR();
bracket(v, left, right);
+
+ /* in NLSTOP mode, ensure newline is not part of the result set */
if (v->cflags & REG_NLSTOP)
newarc(v->nfa, PLAIN, v->nlcolor, left, right);
NOERR();
test_regex
-------------------------------
{0,REG_UNONPOSIX,REG_ULOCALE}
- {abc}
+ {"abc +
+ def"}
(2 rows)
select * from test_regex('[\D]+', E'abc\ndef345', 'LPE');
test_regex
----------------------------------------
{0,REG_UBBS,REG_UNONPOSIX,REG_ULOCALE}
- {abc}
+ {"abc +
+ def"}
(2 rows)
select * from test_regex('\w+', E'abc_012\ndef', 'LP');
test_regex
-------------------------------
{0,REG_UNONPOSIX,REG_ULOCALE}
- {***}
+ {"*** +
+ @@@"}
(2 rows)
select * from test_regex('[\W]+', E'***\n@@@___', 'LPE');
test_regex
----------------------------------------
{0,REG_UBBS,REG_UNONPOSIX,REG_ULOCALE}
- {***}
+ {"*** +
+ @@@"}
(2 rows)
-- doing 13 "escapes"