Non-matching Regexps
I have used String.split(String regex) a lot to split Strings into String[] arrays. The elements of the array are built around the matching portions of the original string, and the matched characters themselves end up in no element at all. This becomes a problem when some of the matched characters actually belong to the content. If, for example, the requirement is to split at a semicolon only when an uppercase character follows, a regexp like ;[A-Z] consumes that uppercase character as well, so it is missing from the next array element. To avoid this, the uppercase character has to be checked without being consumed, which is what a zero-width lookahead does (an ordinary non-capturing group (?:...) would still consume it).
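To make the problem concrete, here is a minimal Java sketch of the naive approach. The class name, method name and sample input are made up for illustration. Because ;[A-Z] consumes the uppercase letter, that letter is lost:

```java
import java.util.Arrays;

// Illustrates the problem: a plain regexp consumes every character it matches,
// so String.split drops those characters from the resulting array.
class ConsumingSplitDemo {

    // Splits at a semicolon followed by an uppercase letter.
    // Both the ';' and the uppercase letter are part of the match,
    // so both disappear from the result.
    static String[] splitConsuming(String input) {
        return input.split(";[A-Z]");
    }

    public static void main(String[] args) {
        String input = "alpha;Beta;gamma;Delta"; // made-up sample input
        System.out.println(Arrays.toString(splitConsuming(input)));
        // prints [alpha, eta;gamma, elta]
    }
}
```

"Beta" and "Delta" come back as "eta" and "elta", because their first letters were part of the separator match.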
The literature describes these as zero-length, non-capturing groups, namely positive or negative lookaheads and lookbehinds. In the case above (split at a semicolon, but only when followed by an uppercase character), the lookahead syntax is needed; and because the uppercase character (which must not be included in the match) can be expressed as a positive condition, a positive lookahead is the right choice.
The syntax for this is:
Y(?=X)   Y is the core regexp to split around, and X is the regexp that must follow it (the assertion); X is checked but not consumed
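A minimal sketch of the positive lookahead applied to the semicolon example, with Y = ";" and X = "[A-Z]". The class name, method name and input are again made up; only the ';' is consumed, so the uppercase letters survive:

```java
import java.util.Arrays;

// Splits at a semicolon only when an uppercase letter follows.
// The lookahead (?=[A-Z]) is zero-width: it checks the next character
// without consuming it, so the uppercase letter stays in the element.
class LookaheadSplitDemo {

    static String[] splitBeforeUppercase(String input) {
        // Y = ";"     consumed, acts as the separator
        // X = "[A-Z]" asserted only, not consumed
        return input.split(";(?=[A-Z])");
    }

    public static void main(String[] args) {
        String input = "alpha;Beta;gamma;Delta"; // made-up sample input
        System.out.println(Arrays.toString(splitBeforeUppercase(input)));
        // prints [alpha, Beta;gamma, Delta]
    }
}
```

The assertion is evaluated at the position right after each ';', so ";g" is not a split point and "Beta;gamma" stays together.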
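Expressing a negated condition inside a positive lookahead causes problems: Y(?=[^A-Z]), for example, still requires some character after Y, so it never matches when Y is at the end of the input. Because of this, a separate syntax, the negative lookahead, has to be used:
Y(?!X)   regexp Y not followed by regexp X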
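The following sketch (hypothetical class and method names, made-up input) contrasts the negative lookahead Y(?!X) with a positive lookahead around a negated character class, Y(?=[^A-Z]). The two only differ when the semicolon is the last character of the input:

```java
import java.util.Arrays;

// Contrasts a negative lookahead Y(?!X) with a positive lookahead
// around a negated character class, Y(?=[^X]).
class NegativeLookaheadDemo {

    // Split at ';' unless an uppercase letter follows.
    // (?![A-Z]) also succeeds at the end of the input.
    static String[] splitNegative(String input) {
        return input.split(";(?![A-Z])");
    }

    // Looks similar, but [^A-Z] must match an actual character,
    // so a ';' at the very end of the input is never a split point.
    static String[] splitNegatedClass(String input) {
        return input.split(";(?=[^A-Z])");
    }

    public static void main(String[] args) {
        String input = "a;B;c;"; // made-up sample input
        System.out.println(Arrays.toString(splitNegative(input)));     // [a;B, c]
        System.out.println(Arrays.toString(splitNegatedClass(input))); // [a;B, c;]
    }
}
```

With the negative lookahead the final ';' is a split point; the empty string after it does not show up because split(String) discards trailing empty strings. With the negated class, the final ';' stays attached to "c;".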
For further reading, the following page provides excellent information:
http://www.regular-expressions.info/lookaround.html
This leads to regexps like:
"[\\s/(:](?=\\w)|\"(?=[A-Z])|\\-(?!\\s)";
(backslashes and the inner double quote are escaped because this is a Java string literal)
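Applied to a made-up input, the combined regexp behaves as sketched below. The class name, method name and input string are hypothetical; the regexp literal is the one from above:

```java
// Applies the combined split regexp from the text to a made-up input.
class LookaroundTokenizer {

    // Same Java literal as above. As a plain regexp it reads:
    //   [\s/(:](?=\w)   whitespace, '/', '(' or ':' followed by a word character
    //   "(?=[A-Z])      a double quote followed by an uppercase letter
    //   \-(?!\s)        a hyphen NOT followed by whitespace
    // Each alternative consumes exactly one character; the lookaheads consume none.
    static final String SPLIT_REGEX = "[\\s/(:](?=\\w)|\"(?=[A-Z])|\\-(?!\\s)";

    static String[] tokenize(String input) {
        return input.split(SPLIT_REGEX);
    }

    public static void main(String[] args) {
        String input = "path/to:file \"Name\" x-ray - end"; // made-up sample input
        for (String token : tokenize(input)) {
            // Brackets make leftover spaces and quotes visible.
            System.out.println("[" + token + "]");
        }
    }
}
```

This prints [path], [to], [file ], [Name"], [x], [ray -] and [end] on separate lines. The space before the opening quote and the closing quote are not split points, because neither is followed by a word character or an uppercase letter, and "ray -" stays together because that hyphen is followed by whitespace.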
