Grouping in Regex
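Grouping is a powerful feature of regular expressions that lets you treat part of a pattern as a single unit, which can simplify a complex pattern. For example, you can apply a quantifier to a group to match repeated sequences of characters, or use groups to pick out the parts of structured text such as phone numbers or email addresses.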
In regex, any subpattern enclosed in parentheses () is considered a group. For example, (xyz) creates a group that matches the exact sequence "xyz".
You can combine groups with other characters and metacharacters in a regex pattern. For example, the pattern (xyz)x finds the group xyz followed by x.
Notice that the final x is outside the parentheses, so it is not part of the group. The regex will match 'xyzx'.
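As a minimal sketch, the pattern (xyz)x can be tried in Python. The re module and the sample text are assumptions chosen for illustration; they are not part of the original example.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "xyzx xyz xyzy"

# (xyz) is the group; the trailing x sits outside the parentheses.
match = re.search(r"(xyz)x", text)

whole_match = match.group(0)  # text matched by the entire pattern
group_text = match.group(1)   # text matched by the group only

print(whole_match)  # xyzx
print(group_text)   # xyz
```

The whole match includes the trailing x, while the group holds only 'xyz'.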
A group can also contain the alternation metacharacter |. The pattern (yz|zy) searches for either the 'yz' or the 'zy' sequence in the text.
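The alternation group can be sketched in Python as follows. The sample strings and the second pattern x(yz|zy), which places a literal x before the group, are assumptions added for illustration.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "ayz bzy czz"

# With a single group in the pattern, findall returns the text
# matched by that group for every match found.
found = re.findall(r"(yz|zy)", text)
print(found)  # ['yz', 'zy']

# The group can be combined with other characters: x followed by
# either 'yz' or 'zy'. group(0) is the whole match.
prefixed = [m.group(0) for m in re.finditer(r"x(yz|zy)", "xyz xzy xzz")]
print(prefixed)  # ['xyz', 'xzy']
```

Neither 'czz' nor 'xzz' matches, because 'zz' is not one of the two alternatives.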
Capturing Groups and Backreferences
By default, a group in a regex pattern is a capturing group. The text matched inside the parentheses is remembered and can be referenced later using backreferences. This allows you to reuse captured portions within the same regex pattern or in a replacement string.
For example, the regex pattern (\d{2})-(\d{4}) matches a two-digit number followed by a hyphen and then a four-digit number. The parentheses around each part capture the matched digits and remember them for later use.
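A short Python sketch shows how the captured text might be read back after a match. The input string is a made-up example, and the variable names are hypothetical.

```python
import re

# Hypothetical input, chosen only for illustration.
text = "Code: 12-3456"

match = re.search(r"(\d{2})-(\d{4})", text)

first = match.group(1)   # text captured by the first group
second = match.group(2)  # text captured by the second group
both = match.groups()    # all captured groups as a tuple

print(first)   # 12
print(second)  # 3456
print(both)    # ('12', '3456')
```

Groups are numbered from 1 in the order of their opening parentheses; group 0 is the whole match.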
Backreferences let us reuse the captured text in the regular expression itself or in the replacement string. A backreference is written as a backslash followed by the number of the capturing group, and it stands for the text that the group actually matched.
For example, in the regular expression above, the backreference \1 refers to the first capturing group (the two-digit number), and the backreference \2 refers to the second capturing group (the four-digit number).
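The two uses of backreferences can be sketched in Python. The pattern (\d{2})-\1 and the sample strings are assumptions added for illustration; they are a variation on the pattern above, not part of the original text.

```python
import re

# Backreference inside the pattern: \1 must repeat exactly the text
# that the first group captured.
repeat = re.compile(r"(\d{2})-\1")
same = repeat.search("12-12") is not None       # both halves are identical
different = repeat.search("12-34") is not None  # second half differs

print(same)       # True
print(different)  # False

# Backreferences in the replacement string: swap the two captured parts.
swapped = re.sub(r"(\d{2})-(\d{4})", r"\2-\1", "12-3456")
print(swapped)  # 3456-12
```

Note that \1 matches the captured text itself, not "any two digits", which is why '12-34' fails. Some other regex flavors write replacement references differently, for example as $1.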
Non-Capturing Groups
If you don't need to capture the matched portion, you can create a non-capturing group by placing ?: immediately after the opening parenthesis: (?: ).
Because the regex engine does not have to store the text matched by a non-capturing group, such groups can make a regular expression slightly more efficient, and they keep the numbering of the remaining capturing groups simple. However, non-capturing groups cannot be referenced later in the regular expression or in replacement strings.
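The difference between the two kinds of group can be seen in a small Python sketch. The patterns and the sample text are assumptions chosen for illustration.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "xyzxyz-42"

# Capturing version: the repeated group takes number 1,
# so the digits end up in group 2.
capturing = re.search(r"(xyz)+-(\d+)", text)
capturing_groups = capturing.groups()

# Non-capturing version: (?:xyz) still groups the sequence for the
# + quantifier, but it is not stored, so the digits are group 1.
non_capturing = re.search(r"(?:xyz)+-(\d+)", text)
non_capturing_groups = non_capturing.groups()

print(capturing_groups)      # ('xyz', '42')
print(non_capturing_groups)  # ('42',)
```

Both patterns match the same text; only the set of remembered groups differs.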