blueyi's notes

Follow Excellence,Success will chase you!

Regular expression matching rules in Vim

Source: :help regexp
The motivating scenario: when writing summaries, I tend to list items one by one, each starting with a number. For example:

  1. first
  2. second test 1.first
  3. third

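Out of habit I always type a space after the number and its dot, and then Markdown rendering goes wrong: it treats the leading number as an ordered-list marker, which is clearly not wanted here. So the extra space has to go, and a regex seemed like the natural tool. In many regex flavours this looks simple: use a zero-width look-behind for the leading number, match the dot and the whitespace that follow it, and replace them with a single dot. The regex is (?<=^\d+)\.\s+ (this needs an engine that allows variable-width look-behind, such as .NET), but it cannot be used in Vim. The full substitute command in Vim is :%s/\(^\d\+\)\@<=\.\s\+/\./g. A brief explanation:
In Vim, \+ means the same as + in ordinary regex, \(\) is used for grouping, and \@<= is the zero-width positive look-behind assertion. Unlike the usual syntax, where the asserted expression comes after the = sign, Vim's \@= is written after the atom it applies to, and the = only says that the preceding group must match. For example, foo\(bar\)\@= matches the foo in foobar, while foo\(bar\)\@! matches every foo that is not followed by bar. \@= is equivalent to \&, and \& needs no parentheses: \(foo\)\@= is equivalent to foo\&.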
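
For comparison outside Vim, the same cleanup can be sketched in Python. This is only an analogue, not the Vim command: Python's `re` module rejects variable-width look-behind such as `(?<=^\d+)`, so the sketch captures the number and puts it back instead. The function name `strip_space_after_number` is made up for illustration, and `[ \t]+` stands in for `\s+` because Vim's `\s` matches only space and tab.

```python
import re

# Leading digits, a dot, then spaces/tabs. The digits are captured and
# re-inserted, since Python's re does not allow variable-width look-behind.
LEADING_NUMBER = re.compile(r"^(\d+)\.[ \t]+", flags=re.MULTILINE)


def strip_space_after_number(text: str) -> str:
    """Remove the whitespace that follows a leading 'N.' on each line."""
    return LEADING_NUMBER.sub(r"\1.", text)


sample = "1. first\n2. second test 1. first\n3. third"
cleaned = strip_space_after_number(sample)
print(cleaned)
```

Only the number at the start of each line is touched; the "1. first" in the middle of the second line keeps its space, which is what the ^ anchor in the Vim pattern achieves as well.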

Details follow:

Regex pattern modes in Vim

Some characters in the pattern are taken literally. They match with the same character in the text. When preceded with a backslash however, these characters get a special meaning.
Other characters have a special meaning without a backslash. They need to be preceded with a backslash to match literally.
If a character is taken literally or not depends on the ‘magic’ option and the items mentioned next.
Use of “\m” makes the pattern after it be interpreted as if ‘magic’ is set, ignoring the actual value of the ‘magic’ option.
Use of “\M” makes the pattern after it be interpreted as if ‘nomagic’ is used.
Use of “\v” means that in the pattern after it all ASCII characters except ‘0’-‘9’, ‘a’-‘z’, ‘A’-‘Z’ and ‘_’ have a special meaning. “very magic”
Use of “\V” means that in the pattern after it only the backslash has a special meaning. “very nomagic”

after:    \v       \m       \M       \V       matches ~
                 'magic'  'nomagic'
          $        $        $        \$       matches end-of-line
          .        .        \.       \.       matches any character
          *        *        \*       \*       any number of the previous atom
          ()       \(\)     \(\)     \(\)     grouping into an atom
          |        \|       \|       \|       separating alternatives
          \a       \a       \a       \a       alphabetic character
          \\       \\       \\       \\       literal backslash
          \.       \.       .        .        literal dot
          \{       {        {        {        literal '{'
          a        a        a        a        literal 'a'

{only Vim supports \m, \M, \v and \V}
It is recommended to always keep the ‘magic’ option at the default setting, which is ‘magic’. This avoids portability problems. To make a pattern immune to the ‘magic’ option being set or not, put “\m” or “\M” at the start of the pattern.

Overview of matching rules in Vim

          'magic'  'nomagic'  matches of the preceding atom ~
|/star|   *        \*         0 or more    as many as possible
|/\+|     \+       \+         1 or more    as many as possible (*)
|/\=|     \=       \=         0 or 1       as many as possible (*)
|/\?|     \?       \?         0 or 1       as many as possible (*)

|/\{|     \{n,m}   \{n,m}     n to m       as many as possible (*)
          \{n}     \{n}       n            exactly (*)
          \{n,}    \{n,}      at least n   as many as possible (*)
          \{,m}    \{,m}      0 to m       as many as possible (*)
          \{}      \{}        0 or more    as many as possible (same as *) (*)

|/\{-|    \{-n,m}  \{-n,m}    n to m       as few as possible (*)
          \{-n}    \{-n}      n            exactly (*)
          \{-n,}   \{-n,}     at least n   as few as possible (*)
          \{-,m}   \{-,m}     0 to m       as few as possible (*)
          \{-}     \{-}       0 or more    as few as possible (*)

|/\@>|    \@>      \@>        1, like matching a whole pattern (*)
|/\@=|    \@=      \@=        nothing, requires a match |/zero-width| (*)
|/\@!|    \@!      \@!        nothing, requires NO match |/zero-width| (*)
|/\@<=|   \@<=     \@<=       nothing, requires a match behind |/zero-width| (*)
|/\@<!|   \@<!     \@<!       nothing, requires NO match behind |/zero-width| (*)
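
The difference between "as many as possible" and "as few as possible" is easiest to see side by side. The sketch below uses Python's `re` as a stand-in, on the assumption that Vim's `\{-}` behaves like Python's lazy `*?` and `\{-n,m}` like `{n,m}?`; the sample strings are invented for illustration.

```python
import re

text = "<b>one</b><b>two</b>"

# Greedy, like Vim's  <.*>     (as many as possible)
greedy = re.search(r"<.*>", text).group()

# Lazy, like Vim's    <.\{-}>  (as few as possible)
lazy = re.search(r"<.*?>", text).group()

# Bounded counts, like Vim's  a\{2,3}  versus  a\{-2,3}
bounded_greedy = re.search(r"a{2,3}", "aaaa").group()
bounded_lazy = re.search(r"a{2,3}?", "aaaa").group()

print(greedy, lazy, bounded_greedy, bounded_lazy)
```

The greedy form swallows both tags up to the last ">", while the lazy form stops at the first one.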

Overview of ordinary atoms

          'magic'  'nomagic'  matches ~
|/^|      ^        ^          start-of-line (at start of pattern) |/zero-width|
|/\^|     \^       \^         literal '^'
|/\_^|    \_^      \_^        start-of-line (used anywhere) |/zero-width|
|/$|      $        $          end-of-line (at end of pattern) |/zero-width|
|/\$|     \$       \$         literal '$'
|/\_$|    \_$      \_$        end-of-line (used anywhere) |/zero-width|
|/.|      .        \.         any single character (not an end-of-line)
|/\_.|    \_.      \_.        any single character or end-of-line
|/\<|     \<       \<         beginning of a word |/zero-width|
|/\>|     \>       \>         end of a word |/zero-width|
|/\zs|    \zs      \zs        anything, sets start of match
|/\ze|    \ze      \ze        anything, sets end of match
|/\%^|    \%^      \%^        beginning of file |/zero-width|
|/\%$|    \%$      \%$        end of file |/zero-width|
|/\%V|    \%V      \%V        inside Visual area |/zero-width|
|/\%#|    \%#      \%#        cursor position |/zero-width|
|/\%'m|   \%'m     \%'m       mark m position |/zero-width|
|/\%l|    \%23l    \%23l      in line 23 |/zero-width|
|/\%c|    \%23c    \%23c      in column 23 |/zero-width|
|/\%v|    \%23v    \%23v      in virtual column 23 |/zero-width|

Character classes {not in Vi}: /character-classes

|/\i|     \i       \i         identifier character (see 'isident' option)
|/\I|     \I       \I         like "\i", but excluding digits
|/\k|     \k       \k         keyword character (see 'iskeyword' option)
|/\K|     \K       \K         like "\k", but excluding digits
|/\f|     \f       \f         file name character (see 'isfname' option)
|/\F|     \F       \F         like "\f", but excluding digits
|/\p|     \p       \p         printable character (see 'isprint' option)
|/\P|     \P       \P         like "\p", but excluding digits
|/\s|     \s       \s         whitespace character: <Space> and <Tab>
|/\S|     \S       \S         non-whitespace character; opposite of \s
|/\d|     \d       \d         digit: [0-9]
|/\D|     \D       \D         non-digit: [^0-9]
|/\x|     \x       \x         hex digit: [0-9A-Fa-f]
|/\X|     \X       \X         non-hex digit: [^0-9A-Fa-f]
|/\o|     \o       \o         octal digit: [0-7]
|/\O|     \O       \O         non-octal digit: [^0-7]
|/\w|     \w       \w         word character: [0-9A-Za-z_]
|/\W|     \W       \W         non-word character: [^0-9A-Za-z_]
|/\h|     \h       \h         head of word character: [A-Za-z_]
|/\H|     \H       \H         non-head of word character: [^A-Za-z_]
|/\a|     \a       \a         alphabetic character: [A-Za-z]
|/\A|     \A       \A         non-alphabetic character: [^A-Za-z]
|/\l|     \l       \l         lowercase character: [a-z]
|/\L|     \L       \L         non-lowercase character: [^a-z]
|/\u|     \u       \u         uppercase character: [A-Z]
|/\U|     \U       \U         non-uppercase character: [^A-Z]
|/\_|     \_x      \_x        where x is any of the characters above: character
                              class with end-of-line included
(end of character classes)

|/\e|     \e       \e         <Esc>
|/\t|     \t       \t         <Tab>
|/\r|     \r       \r         <CR>
|/\b|     \b       \b         <BS>
|/\n|     \n       \n         end-of-line
|/~|      ~        \~         last given substitute string
|/\1|     \1       \1         same string as matched by first \(\) {not in Vi}
|/\2|     \2       \2         Like "\1", but uses second \(\)
   ...
|/\9|     \9       \9         Like "\1", but uses ninth \(\)
                                                                *E68*
|/\z1|    \z1      \z1        only for syntax highlighting, see |:syn-ext-match|
   ...
|/\z1|    \z9      \z9        only for syntax highlighting, see |:syn-ext-match|

          x        x          a character with no special meaning matches itself

|/[]|     []       \[]        any character specified inside the []
|/\%[]|   \%[]     \%[]       a sequence of optionally matched atoms

|/\c|     \c       \c         ignore case, do not use the 'ignorecase' option
|/\C|     \C       \C         match case, do not use the 'ignorecase' option
|/\Z|     \Z       \Z         ignore differences in Unicode "combining characters".
                              Useful when searching voweled Hebrew or Arabic text.

|/\m|     \m       \m         'magic' on for the following chars in the pattern
|/\M|     \M       \M         'magic' off for the following chars in the pattern
|/\v|     \v       \v         the following chars in the pattern are "very magic"
|/\V|     \V       \V         the following chars in the pattern are "very nomagic"
|/\%#=|   \%#=1    \%#=1      select regexp engine |/zero-width|

|/\%d|    \%d      \%d        match specified decimal character (eg \%d123)
|/\%x|    \%x      \%x        match specified hex character (eg \%x2a)
|/\%o|    \%o      \%o        match specified octal character (eg \%o040)
|/\%u|    \%u      \%u        match specified multibyte character (eg \%u20ac)
|/\%U|    \%U      \%U        match specified large multibyte character (eg \%U12345678)
|/\%C|    \%C      \%C        match any composing characters

Example

\<\I\i*         or
\<\h\w*
\<[a-zA-Z_][a-zA-Z0-9_]*
                        An identifier (e.g., in a C program).

\(\.$\|\. \)            A period followed by <EOL> or a space.

[.!?][])"']*\($\|[ ]\)  A search pattern that finds the end of a sentence,
                        with almost the same definition as the ")" command.

cat\Z                   Both "cat" and "càt" ("a" followed by 0x0300)
                        Does not match "càt" (character 0x00e0), even
                        though it may look the same.

Zero-width assertion examples in Vim

\@= Matches the preceding atom with zero width. {not in Vi}
Like "(?=pattern)" in Perl.
Example             matches ~
foo\(bar\)\@=       "foo" in "foobar"
foo\(bar\)\@=foo    nothing
*/zero-width*
When using "\@=" (or "^", "$", "\<", "\>") no characters are included
in the match. These items are only used to check if a match can be
made. This can be tricky, because a match with following items will
be done in the same position. The last example above will not match
"foobarfoo", because it tries to match "foo" in the same position where
"bar" matched.

Note that using "\&" works the same as using "\@=": "foo\&.." is the
same as "\(foo\)\@=..".  But using "\&" is easier, you don't need the
braces.
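
The two examples for "\@=" can be reproduced with Python's look-ahead `(?=...)`, which the help text compares it to via Perl. This is a sketch in Python's `re`, not in Vim's engine; the variable names are only illustrative.

```python
import re

# Vim: foo\(bar\)\@=      Python: foo(?=bar)
m = re.search(r"foo(?=bar)", "foobar")
lookahead_match = m.group()   # only "foo" is part of the match
lookahead_span = m.span()     # "bar" was checked but not consumed

# Vim: foo\(bar\)\@=foo   Python: foo(?=bar)foo
# The second "foo" is tried at the position where "bar" starts, so it fails.
no_match = re.search(r"foo(?=bar)foo", "foobarfoo")

print(lookahead_match, lookahead_span, no_match)
```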


                        */\@!*

\@! Matches with zero width if the preceding atom does NOT match at the
current position. |/zero-width| {not in Vi}
Like "(?!pattern)" in Perl.
Example                 matches ~
foo\(bar\)\@!           any "foo" not followed by "bar"
a.\{-}p\@!              "a", "ap", "app", "appp", etc. not immediately
                        followed by a "p"
if \(\(then\)\@!.\)*$   "if " not followed by "then"

Using "\@!" is tricky, because there are many places where a pattern
does not match.  "a.*p\@!" will match from an "a" to the end of the
line, because ".*" can match all characters in the line and the "p"
doesn't match at the end of the line.  "a.\{-}p\@!" will match any
"a", "ap", "app", etc. that isn't followed by a "p", because the "."
can match a "p" and "p\@!" doesn't match after that.

You can't use "\@!" to look for a non-match before the matching
position: "\(foo\)\@!bar" will match "bar" in "foobar", because at the
position where "bar" matches, "foo" does not match.  To avoid matching
"foobar" you could use "\(foo\)\@!...bar", but that doesn't match a
bar at the start of a line.  Use "\(foo\)\@<!bar".
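
Both pitfalls described above also show up in other engines. The sketch below replays them with Python's `re`, using `(?!...)` for "\@!" and `(?<!...)` for "\@<!"; the test strings "apple" and "bar none" are invented for illustration, and Python's look-behind is fixed-width, unlike Vim's.

```python
import re

# Vim: a.*p\@!      greedy ".*" runs to the end, where no "p" follows
greedy = re.search(r"a.*(?!p)", "apple").group()

# Vim: a.\{-}p\@!   lazy version stops at the first spot not followed by "p"
lazy = re.search(r"a.*?(?!p)", "apple").group()

# Vim: \(foo\)\@!bar    the look-ahead is tested at "bar" itself, so it matches
ahead = re.search(r"(?!foo)bar", "foobar")

# Vim: \(foo\)\@<!bar   the look-behind rejects a "bar" preceded by "foo" ...
behind_in_foobar = re.search(r"(?<!foo)bar", "foobar")
# ... and still accepts a "bar" at the very start of the line
behind_at_line_start = re.search(r"(?<!foo)bar", "bar none")

print(greedy, lazy, ahead.start(), behind_in_foobar, behind_at_line_start.start())
```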

Useful example: to find "foo" in a line that does not contain "bar":
    /^\%(.*bar\)\@!.*\zsfoo

This pattern first checks that there is not a single position in the
line where "bar" matches. If ".*bar" matches somewhere the "\@!" will
reject the pattern. When there is no match any "foo" will be found.
The "\zs" is to have the match start just before "foo".
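
A rough Python counterpart of this pattern is sketched below. Python has no "\zs", so "foo" is captured in a group and its offset is read from there. The helper name `find_foo` and the sample lines are assumptions made for the illustration.

```python
import re

# Vim: ^\%(.*bar\)\@!.*\zsfoo
# The look-ahead at the start of the line rejects any line containing "bar";
# the capture group plays the role of \zs.
FOO_WITHOUT_BAR = re.compile(r"^(?!.*bar).*(foo)")


def find_foo(line: str):
    """Return the offset of the last 'foo', or None if the line has 'bar' or no 'foo'."""
    m = FOO_WITHOUT_BAR.search(line)
    return m.start(1) if m else None


for line in ["a foo here", "foo then bar", "bar then foo", "nothing"]:
    print(repr(line), find_foo(line))
```

Because ".*" is greedy, the offset returned is that of the last "foo" on the line.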

*/\@<=*

\@<= Matches with zero width if the preceding atom matches just before what
follows. |/zero-width| {not in Vi}
Like "(?<=pattern)" in Perl, but Vim allows non-fixed-width patterns.
Example                 matches ~
\(an\_s\+\)\@<=file     "file" after "an" and white space or an
                        end-of-line
For speed it's often much better to avoid this multi. Try using "\zs"
instead |/\zs|. To match the same as the above example:
    an\_s\+\zsfile
At least set a limit for the look-behind, see below.

"\@<=" and "\@<!" check for matches just before what follows.
Theoretically these matches could start anywhere before this position.
But to limit the time needed, only the line where what follows matches
is searched, and one line before that (if there is one).  This should
be sufficient to match most things and not be too slow.

In the old regexp engine the part of the pattern after "\@<=" and
"\@<!" are checked for a match first, thus things like "\1" don't work
to reference \(\) inside the preceding atom.  It does work the other
way around:
Bad example            matches ~
\%#=1\1\@<=,\([a-z]\+\)        ",abc" in "abc,abc"

However, the new regexp engine works differently, it is better to not
rely on this behavior, do not use \@<= if it can be avoided:
Example                matches ~
\([a-z]\+\)\zs,\1        ",abc" in "abc,abc"
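
The recommended "\zs" form translates naturally to engines without "\zs" by capturing the part that should count as the match. A minimal Python sketch of that idea follows; treating group 2 as "the match" is a convention chosen here, not something Python does on its own.

```python
import re

# Vim: \([a-z]\+\)\zs,\1
# Group 1 is the word; group 2 stands in for what Vim reports after \zs.
m = re.search(r"([a-z]+)(,\1)", "abc,abc")
word = m.group(1)
after_zs = m.group(2)

print(word, after_zs)
```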

\@123<=
Like "\@<=" but only look back 123 bytes. This avoids trying lots
of matches that are known to fail and make executing the pattern very
slow. Example, check if there is a "<" just before "span":
    /<\@1<=span
This will try matching "<" only one byte before "span", which is the
only place that works anyway.
After crossing a line boundary, the limit is relative to the end of
the line. Thus the characters at the start of the line with the match
are not counted (this is just to keep it simple).
The number zero is the same as no limit.

*/\@<!*

\@<! Matches with zero width if the preceding atom does NOT match just
before what follows. Thus this matches if there is no position in the
current or previous line where the atom matches such that it ends just
before what follows. |/zero-width| {not in Vi}
Like "(?<!pattern)" in Perl, but Vim allows non-fixed-width patterns.
The match with the preceding atom is made to end just before the match
with what follows, thus an atom that ends in ".*" will work.
Warning: This can be slow (because many positions need to be checked
for a match). Use a limit if you can, see below.
Example                 matches ~
\(foo\)\@<!bar          any "bar" that's not in "foobar"
\(\/\/.*\)\@<!in        "in" which is not after "//"

\@123<!
Like "\@<!" but only look back 123 bytes. This avoids trying lots of
matches that are known to fail and make executing the pattern very
slow.

*/\@>*

\@> Matches the preceding atom like matching a whole pattern. {not in Vi}
Like "(?>pattern)" in Perl.
Example         matches ~
\(a*\)\@>a      nothing (the "a*" takes all the "a"'s, there can't be
                another one following)

This matches the preceding atom as if it was a pattern by itself.  If
it doesn't match, there is no retry with shorter sub-matches or
anything.  Observe this difference: "a*b" and "a*ab" both match
"aaab", but in the second case the "a*" matches only the first two
"a"s.  "\(a*\)\@>ab" will not match "aaab", because the "a*" matches
the "aaa" (as many "a"s as possible), thus the "ab" can't match.
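
The "aaab" comparison can be tried in Python as well. Python 3.11 and later accept `(?>...)` directly; to stay runnable on older versions, the sketch below emulates the atomic group with a look-ahead plus backreference, `(?=(a*))\1`, which is a substitute technique and not how Vim implements "\@>".

```python
import re

# Ordinary backtracking: "a*" gives back one "a" so that "ab" can match.
backtracking = re.search(r"a*ab", "aaab")

# Emulation of Vim's \(a*\)\@>ab: the look-ahead captures the longest run
# of "a"s and is never re-entered, and \1 then consumes exactly that run,
# so no "a" is left over for the following "ab".
atomic = re.search(r"(?=(a*))\1ab", "aaab")

print(backtracking.group(), atomic)
```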
