Regular expressions are a powerful tool if you know how to use them. Here is a practical cheat sheet to bookmark.
Basic Characters
. \d \D \w \W \s \S \b
Quantifiers
* + ? {3} {2,5} {3,} *?
Groups
(abc) (?:abc) a|b \1
Lookahead/behind
(?=x) (?!x) (?<=x) (?<!x)
Examples
Email: [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
IP: \b(?:\d{1,3}\.){3}\d{1,3}\b
Date: \d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])
Phone CZ: (?:\+420)?\s?\d{3}\s?\d{3}\s?\d{3}
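As a rough sketch of how these patterns behave, the snippet below compiles them in Python with the literal dots and the plus sign escaped. The helper name `is_valid` and the sample values are made up for illustration. Note that the patterns check shape only: the IP pattern accepts octets above 255, and the date pattern accepts February 30.

```python
import re

# Patterns from the list above, with literal "." and "+" escaped.
EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
IP = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
DATE = re.compile(r"\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])")
PHONE_CZ = re.compile(r"(?:\+420)?\s?\d{3}\s?\d{3}\s?\d{3}")


def is_valid(pattern, value):
    """Hypothetical helper: True if the whole string matches the pattern."""
    return pattern.fullmatch(value) is not None


print(is_valid(EMAIL, "jan.novak@example.com"))   # True
print(is_valid(IP, "999.999.999.999"))            # True: shape only, no range check
print(is_valid(DATE, "2024-13-01"))               # False: month 13 is rejected
print(is_valid(PHONE_CZ, "+420 777 123 456"))     # True
```

Using `fullmatch` rather than `search` matters for validation, because `search` would accept any string that merely contains a matching substring.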
In Practice
grep -oP '\b\d{1,3}(\.\d{1,3}){3}\b' access.log
import re
emails = re.findall(r'[\w.+-]+@[\w-]+\.[\w.]+', text)
Advanced Techniques
Named groups ((?P<name>...) in Python) make a regex more readable and let you access captured groups by name. Non-capturing groups ((?:...)) group without capturing, which keeps group numbering clean and spares the engine from storing a capture. Lookahead ((?=...)) and lookbehind ((?<=...)) check context without including it in the result. This is useful for password validation: the password must contain a digit, but you do not want the digit itself to be the match.
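A minimal sketch of these three techniques in Python follows. The pattern names, the sample strings, and the password policy (at least 8 characters with a digit, a lowercase letter, and an uppercase letter) are assumptions chosen for illustration, not a recommended policy.

```python
import re

# Named groups: captures are read by name instead of by position.
DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
m = DATE.search("Released on 2024-03-15.")
print(m.group("year"), m["month"], m.groupdict())

# Lookahead: each (?=...) asserts a condition at the start of the string
# without consuming characters; the final .{8,} does the actual matching.
PASSWORD = re.compile(r"(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}")


def is_strong(password):
    """Hypothetical helper applying the assumed policy above."""
    return PASSWORD.fullmatch(password) is not None


# Lookbehind: match digits only when preceded by "$"; the "$" itself
# is not part of the result.
prices = re.findall(r"(?<=\$)\d+", "cost $42 and 17 units")
print(is_strong("Secret123"), is_strong("secret123"), prices)
```

Stacking several lookaheads at the same position is a common way to express "all of these conditions must hold" in a single pattern.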
Atomic groups and possessive quantifiers stop the engine from backtracking into text they have already matched, which helps protect against ReDoS (Regular Expression Denial of Service). Always test a regex on edge cases and on large inputs. Lazy quantifiers (*?, +?) match as few characters as possible, unlike the greedy variants, which match as many as possible. For complex parsing (HTML, JSON, programming languages), do not use regex; use a dedicated parser.
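The difference between greedy and lazy matching, and one way to sidestep backtracking trouble, can be sketched as follows. The sample strings and the names `RISKY` and `SAFE` are illustrative assumptions. The example rewrites a nested quantifier into an equivalent flat one rather than using atomic groups, since support for atomic groups and possessive quantifiers depends on the engine and version.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ grabs as much as possible, then backs off to the last ">".
greedy = re.findall(r"<.+>", html)
# Lazy: .+? stops at the first ">" it can.
lazy = re.findall(r"<.+?>", html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']

# Nested quantifiers such as (a+)+ can backtrack exponentially on
# near-miss input like "aaaa...b". Do not run RISKY on long strings.
RISKY = re.compile(r"^(a+)+$")
# Same set of accepted strings, but no nested repetition to backtrack over.
SAFE = re.compile(r"^a+$")

near_miss = "a" * 10_000 + "b"
print(SAFE.match(near_miss))  # None, returned quickly
```

Even this small HTML example shows why a dedicated parser is the better choice: the lazy pattern finds tags, but it knows nothing about nesting or attributes containing ">".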
Tip
Test your patterns on regex101.com. And if a regex grows beyond two lines, consider a parser instead.