Java regular expressions by example
iChochy 2021-06-14 06:06:53

Regular expressions (abbreviated RegExp or regex) are implemented in many programming languages. They let you quickly search, match, find, and replace text within strings.

Simple example

Match URL

/*
* File:RegExp.java
* User:iChochy
* URL:https://ichochy.com
* Copyright (c) 2020
* Date:2020/09/07 18:11:07
*/
package com.ichochy.example;

import java.util.regex.Pattern;

public class RegExp {
    public static void main(String[] args) {
        String input = "https://ichochy.com";
        // Regular expression: .+ matches one or more characters, \\. matches a literal dot
        String regex = "https://.+\\.com";
        boolean flag = Pattern.matches(regex, input);
        System.out.println(flag); // The whole input matches, so this prints: true
    }
}

The matches method

Matcher.matches attempts to match the entire input against the pattern. It returns true only when the whole string matches.

/*
* File:RegExp.java
* User:iChochy
* URL:https://ichochy.com
* Copyright (c) 2020
* Date:2020/09/07 18:11:07
*/
package com.ichochy.example;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp {
    public static void main(String[] args) {
        String input = "https://ichochy.com";
        // Regular expression: .+ matches one or more characters, \\. matches a literal dot
        String regex = "https://.+\\.com";
        Pattern pattern = Pattern.compile(regex);   // Compile the expression
        Matcher matcher = pattern.matcher(input);   // Create a matcher for the input
        System.out.println(matcher.matches());      // The whole input matches: true
    }
}
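One detail about patterns of this kind: in a regular expression, an unescaped dot matches any character, so a pattern written as https://.+.com accepts more than it appears to. The sketch below compares the two spellings; the class name DotEscapeDemo, the helper fullMatch, and the input "https://ichochy-com" are made up for illustration.

```java
import java.util.regex.Pattern;

class DotEscapeDemo {
    // Hypothetical helper: true when the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String loose = "https://.+.com";     // the last '.' matches any character
        String strict = "https://.+\\.com";  // "\\." matches a literal dot only

        System.out.println(fullMatch(loose, "https://ichochy.com"));   // true
        System.out.println(fullMatch(loose, "https://ichochy-com"));   // true
        System.out.println(fullMatch(strict, "https://ichochy.com"));  // true
        System.out.println(fullMatch(strict, "https://ichochy-com"));  // false
    }
}
```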

The find method

Matcher.find scans the input for the next subsequence that matches the pattern and returns true if one is found.

/*
* File:RegExp.java
* User:iChochy
* URL:https://ichochy.com
* Copyright (c) 2020
* Date:2020/09/07 18:11:07
*/
package com.ichochy.example;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp {
    public static void main(String[] args) {
        String input = "My website is: https://ichochy.com, you know?";
        // Regular expression: .+ matches one or more characters, \\. matches a literal dot
        String regex = "https://.+\\.com";
        Pattern pattern = Pattern.compile(regex);   // Compile the expression
        Matcher matcher = pattern.matcher(input);   // Create a matcher for the input
        System.out.println(matcher.find());     // A match is found: true
        System.out.println(matcher.matches());  // The whole input does not match: false
        System.out.println(matcher.find());     // Searching again finds nothing: false
        matcher.reset();                        // Reset the matcher
        System.out.println(matcher.find());     // After the reset, find succeeds: true
    }
}

Calling find several times gives different results. The Javadoc for Matcher.find explains why:

This method starts at the beginning of this matcher's region, or, if a previous invocation of the method was successful and the matcher has not since been reset, at the first character not matched by the previous match.

After a successful find, if the Matcher has not been reset (Matcher.reset()), the next search starts right after the previous match. That is why the second call finds nothing and returns false.
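The same behavior is what makes it possible to walk through every match in a string: each call to find continues where the previous match ended. A minimal sketch, assuming a hypothetical helper findAll and an invented second URL (https://example.com) in the input:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllDemo {
    // Hypothetical helper: collects every non-overlapping match, left to right.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {          // each call resumes after the previous match
            matches.add(matcher.group()); // text of the current match
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "See https://ichochy.com and https://example.com for details";
        System.out.println(findAll("https://\\w+\\.com", input));
        // prints: [https://ichochy.com, https://example.com]
    }
}
```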

Groups

Parentheses in a regular expression define groups. matcher.group(int group) returns the text matched by the group with the given number.

/*
* File:RegExp.java
* User:iChochy
* URL:https://ichochy.com
* Copyright (c) 2020
* Date:2020/09/07 18:11:07
*/
package com.ichochy.example;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp {
    public static void main(String[] args) {
        String input = "My website is: https://ichochy.com, you know?";
        String regex = "(https://)(.+)(\\.com)"; // Expression with three groups
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(input);
        if (matcher.find()) { // A match was found
            // The whole match: https://ichochy.com
            System.out.println(matcher.group());
            // groupCount() returns the number of groups
            for (int i = 1; i <= matcher.groupCount(); i++) {
                // Text matched by each group; note that group numbers start at 1
                System.out.println(matcher.group(i));
            }
        }
    }
}
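Group numbers can also be used in replacement strings, where $n stands for the text matched by group n. The following sketch replaces each URL with its middle group only; the class name GroupReplaceDemo and the helper hostOnly are hypothetical, and \w+ is used in place of .+ to keep the match from running past the host name.

```java
import java.util.regex.Pattern;

class GroupReplaceDemo {
    private static final Pattern URL = Pattern.compile("(https://)(\\w+)(\\.com)");

    // Hypothetical helper: replaces each matched URL with the text of group 2.
    static String hostOnly(String input) {
        return URL.matcher(input).replaceAll("$2");
    }

    public static void main(String[] args) {
        System.out.println(hostOnly("My website is: https://ichochy.com, you know?"));
        // prints: My website is: ichochy, you know?
    }
}
```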

Regular expression rules

Characters

Construct   Matches
x           The character x
\\          The backslash character
\0n         The character with octal value 0n (0 <= n <= 7)
\0nn        The character with octal value 0nn (0 <= n <= 7)
\0mnn       The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh        The character with hexadecimal value 0xhh
\uhhhh      The character with hexadecimal value 0xhhhh
\t          The tab character ('\u0009')
\n          The newline (line feed) character ('\u000A')
\r          The carriage-return character ('\u000D')
\f          The form-feed character ('\u000C')
\a          The alert (bell) character ('\u0007')
\e          The escape character ('\u001B')
\cx         The control character corresponding to x
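A few of these escapes in use. This is a small sketch, not an exhaustive check; the class name EscapeDemo and the helper matches are illustrative. Note that each backslash in the regex is doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Hypothetical helper: true when the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("\\x41", "A"));     // true: hexadecimal 41 is 'A'
        System.out.println(matches("\\0101", "A"));    // true: octal 101 is 'A'
        System.out.println(matches("a\\tb", "a\tb"));  // true: the regex escape matches a tab
        System.out.println(matches("\\\\", "\\"));     // true: matches a single backslash
    }
}
```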

Character classes

Construct       Matches
[abc]           a, b, or c (simple class)
[^abc]          Any character except a, b, or c (negation)
[a-zA-Z]        a through z or A through Z, inclusive (range)
[a-d[m-p]]      a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]    d, e, or f (intersection)
[a-z&&[^bc]]    a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]   a through z, and not m through p: [a-lq-z] (subtraction)
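The union, intersection, and subtraction forms are the least familiar ones, so here is a short sketch that tries each against single characters. The class name CharClassDemo and the helper matches are hypothetical.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Hypothetical helper: true when the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[a-d[m-p]]", "n"));     // true: n is in m-p (union)
        System.out.println(matches("[a-d[m-p]]", "f"));     // false: f is in neither range
        System.out.println(matches("[a-z&&[def]]", "e"));   // true: e is in both sets (intersection)
        System.out.println(matches("[a-z&&[def]]", "a"));   // false: a is not in [def]
        System.out.println(matches("[a-z&&[^bc]]", "b"));   // false: b is subtracted
        System.out.println(matches("[a-z&&[^m-p]]", "q"));  // true: q is outside m-p
    }
}
```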

Predefined character classes

Construct   Matches
.           Any character (may or may not match line terminators)
\d          A digit: [0-9]
\D          A non-digit: [^0-9]
\s          A whitespace character: [ \t\n\x0B\f\r]
\S          A non-whitespace character: [^\s]
\w          A word character: [a-zA-Z_0-9]
\W          A non-word character: [^\w]
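Predefined classes are handy with String.replaceAll and String.split, both of which take a regular expression. A minimal sketch with hypothetical helper names (digitsOnly, words) and made-up inputs:

```java
import java.util.Arrays;

class PredefinedDemo {
    // Removes every non-digit character (\D) from the input.
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // Splits the input on runs of whitespace (\s+) after trimming both ends.
    static String[] words(String s) {
        return s.trim().split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("Date: 2020/09/07"));
        // prints: 20200907
        System.out.println(Arrays.toString(words("  regular \t expressions\nin Java ")));
        // prints: [regular, expressions, in, Java]
    }
}
```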

POSIX character classes (US-ASCII only)

Construct    Matches
\p{Lower}    A lowercase alphabetic character: [a-z]
\p{Upper}    An uppercase alphabetic character: [A-Z]
\p{ASCII}    All ASCII: [\x00-\x7F]
\p{Alpha}    An alphabetic character: [\p{Lower}\p{Upper}]
\p{Digit}    A decimal digit: [0-9]
\p{Alnum}    An alphanumeric character: [\p{Alpha}\p{Digit}]
\p{Punct}    Punctuation: one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph}    A visible character: [\p{Alnum}\p{Punct}]
\p{Print}    A printable character: [\p{Graph}\x20]
\p{Blank}    A space or a tab: [ \t]
\p{Cntrl}    A control character: [\x00-\x1F\x7F]
\p{XDigit}   A hexadecimal digit: [0-9a-fA-F]
\p{Space}    A whitespace character: [ \t\n\x0B\f\r]
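As a quick illustration of two of these classes, the sketch below strips ASCII punctuation and checks for hexadecimal strings. The class name PosixDemo and both helpers are hypothetical.

```java
class PosixDemo {
    // Removes every ASCII punctuation character (\p{Punct}).
    static String stripPunct(String s) {
        return s.replaceAll("\\p{Punct}", "");
    }

    // True when the input is one or more hexadecimal digits (\p{XDigit}).
    static boolean isHex(String s) {
        return s.matches("\\p{XDigit}+");
    }

    public static void main(String[] args) {
        System.out.println(stripPunct("Hello, world! (regex)")); // prints: Hello world regex
        System.out.println(isHex("CAFEbabe"));                   // true
        System.out.println(isHex("xyz"));                        // false
    }
}
```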

java.lang.Character classes (simple java character type)

Construct            Matches
\p{javaLowerCase}    Equivalent to java.lang.Character.isLowerCase()
\p{javaUpperCase}    Equivalent to java.lang.Character.isUpperCase()
\p{javaWhitespace}   Equivalent to java.lang.Character.isWhitespace()
\p{javaMirrored}     Equivalent to java.lang.Character.isMirrored()

Classes for Unicode blocks and categories

Construct            Matches
\p{InGreek}          A character in the Greek block (simple block)
\p{Lu}               An uppercase letter (simple category)
\p{Sc}               A currency symbol
\P{InGreek}          Any character except one in the Greek block (negation)
[\p{L}&&[^\p{Lu}]]   Any letter except an uppercase letter (subtraction)
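A short sketch of the block and category classes, using the Greek small letter alpha (written as the escape \u03b1 in the string literal) as sample input. The class name UnicodeDemo and the helper matches are hypothetical.

```java
import java.util.regex.Pattern;

class UnicodeDemo {
    // Hypothetical helper: true when the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String alpha = "\u03b1"; // Greek small letter alpha
        System.out.println(matches("\\p{InGreek}", alpha));          // true: inside the Greek block
        System.out.println(matches("\\P{InGreek}", alpha));          // false: negated block
        System.out.println(matches("\\p{Lu}", "A"));                 // true: uppercase letter
        System.out.println(matches("\\p{Sc}", "$"));                 // true: currency symbol
        System.out.println(matches("[\\p{L}&&[^\\p{Lu}]]", "a"));    // true: a letter, not uppercase
        System.out.println(matches("[\\p{L}&&[^\\p{Lu}]]", "A"));    // false: uppercase is subtracted
    }
}
```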

Boundary matchers

Construct   Matches
^           The beginning of a line
$           The end of a line
\b          A word boundary
\B          A non-word boundary
\A          The beginning of the input
\G          The end of the previous match
\Z          The end of the input but for the final terminator, if any
\z          The end of the input
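Two of these matchers are easy to confuse, so a small sketch may help: \b for whole-word searches, and the difference between \Z and \z when the input ends with a line terminator. The class name BoundaryDemo and its helpers are hypothetical; Pattern.quote is used so the searched word is treated literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Counts occurrences of word that stand alone, using \b on both sides.
    static int countWord(String word, String text) {
        Matcher m = Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // True when the regex matches somewhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(countWord("cat", "cat concat cat."));  // 2: "concat" is skipped
        System.out.println(found("end\\Z", "the end\n"));         // true: final terminator is ignored
        System.out.println(found("end\\z", "the end\n"));         // false: input ends with a newline
    }
}
```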

Greedy quantifiers

Construct   Matches
X?          X, once or not at all
X*          X, zero or more times
X+          X, one or more times
X{n}        X, exactly n times
X{n,}       X, at least n times
X{n,m}      X, at least n but not more than m times

Reluctant quantifiers

Construct   Matches
X??         X, once or not at all
X*?         X, zero or more times
X+?         X, one or more times
X{n}?       X, exactly n times
X{n,}?      X, at least n times
X{n,m}?     X, at least n but not more than m times

Possessive quantifiers

Construct   Matches
X?+         X, once or not at all
X*+         X, zero or more times
X++         X, one or more times
X{n}+       X, exactly n times
X{n,}+      X, at least n times
X{n,m}+     X, at least n but not more than m times
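The three quantifier families read the same in the tables but behave differently: greedy quantifiers take as much as possible and give characters back if needed, reluctant ones take as little as possible, and possessive ones take as much as possible and never give anything back. A minimal sketch, assuming a hypothetical helper firstMatch and a made-up input string:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper: returns the first match, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        System.out.println(firstMatch(".*foo", input));   // xfooxxxxxxfoo (greedy, backtracks to the last foo)
        System.out.println(firstMatch(".*?foo", input));  // xfoo (reluctant, stops at the first foo)
        System.out.println(firstMatch(".*+foo", input));  // null (possessive, .*+ keeps everything)
    }
}
```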

Logical operators

Construct   Matches
XY          X followed by Y
X|Y         Either X or Y
(X)         X, as a capturing group

Back references

Construct   Matches
\n          Whatever the nth capturing group matched
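A common use of a back reference is spotting a word typed twice in a row: group 1 captures a word and \1 requires the same text again. The sketch below assumes a hypothetical class BackrefDemo and helper hasRepeatedWord.

```java
import java.util.regex.Pattern;

class BackrefDemo {
    // A word, whitespace, then the same word again (\1 refers to group 1).
    private static final Pattern REPEATED = Pattern.compile("\\b(\\w+)\\s+\\1\\b");

    // Hypothetical helper: true when the text contains a doubled word.
    static boolean hasRepeatedWord(String text) {
        return REPEATED.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(hasRepeatedWord("this is is a test")); // true
        System.out.println(hasRepeatedWord("this is a test"));    // false
    }
}
```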

Quotation

Construct   Matches
\           Nothing, but quotes the following character
\Q          Nothing, but quotes all characters until \E
\E          Nothing, but ends quoting started by \Q
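Quoting matters whenever text that should be matched literally contains metacharacters. The sketch below shows \Q...\E and the standard Pattern.quote method, which produces a quoted pattern for an arbitrary string; the class name QuoteDemo and the helper containsLiteral are hypothetical.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Hypothetical helper: searches for literal text, with metacharacters quoted.
    static boolean containsLiteral(String literal, String text) {
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }

    public static void main(String[] args) {
        // Unquoted, '+' is a quantifier, so the pattern does not match the text "1+1=2".
        System.out.println(Pattern.matches("1+1=2", "1+1=2"));          // false
        // Between \Q and \E every character is taken literally.
        System.out.println(Pattern.matches("\\Q1+1=2\\E", "1+1=2"));    // true
        System.out.println(containsLiteral("1+1=2", "so 1+1=2 holds")); // true
    }
}
```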

Special constructs (non-capturing)

Construct            Matches
(?:X)                X, as a non-capturing group
(?idmsux-idmsux)     Nothing, but turns match flags i d m s u x on - off
(?idmsux-idmsux:X)   X, as a non-capturing group with the given flags i d m s u x on - off
(?=X)                X, via zero-width positive lookahead
(?!X)                X, via zero-width negative lookahead
(?<=X)               X, via zero-width positive lookbehind
(?<!X)               X, via zero-width negative lookbehind
(?>X)                X, as an independent, non-capturing group
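Lookahead and lookbehind test what surrounds a match without including it in the result, and an inline flag such as (?i) changes matching for the rest of the pattern. A short sketch with a hypothetical helper findAll and a made-up input string:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // Hypothetical helper: collects every match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "cost $15, weight 20kg, fee $7";
        // Digits preceded by '$'; the '$' itself is not part of the match.
        System.out.println(findAll("(?<=\\$)\\d+", input)); // prints: [15, 7]
        // Digits followed by "kg"; the unit is not part of the match.
        System.out.println(findAll("\\d+(?=kg)", input));   // prints: [20]
        // (?i) turns on case-insensitive matching.
        System.out.println(Pattern.matches("(?i)java", "JAVA")); // true
        // (?:...) groups without capturing, so only one capturing group is counted.
        System.out.println(Pattern.compile("(?:https://)(\\w+)").matcher("").groupCount()); // 1
    }
}
```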

Summary

As you can see, these flexible rules let you design exactly the expression you need to match complex strings, so text can be processed quickly and conveniently.

Original article: https://ichochy.com/posts/20200828/

Please include a link to the original when reprinting. Thank you.