
MzScheme provides built-in support for regular expression pattern matching on strings and input ports, built on Henry Spencer's package. Regular expressions are specified as strings, using the same pattern language as the Unix utility egrep. String-based regular expressions can be compiled into a regexp value for repeated matches. The internal size of a regexp value is limited to 32 kilobytes; this limit roughly corresponds to a source string with 32,000 literal characters or 5,000 special characters.
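As a minimal sketch of compiling a pattern once and reusing it, the following builds a regexp value with regexp and applies it to several strings. The pattern and the names digits, inputs, and results are illustrative assumptions, not taken from the manual.

```scheme
;; Compile the pattern once; `digits` is a hypothetical name for this sketch.
(define digits (regexp "[0-9]+"))

;; regexp? distinguishes a compiled regexp value from a source string.
(regexp? digits)   ; => #t
(regexp? "[0-9]+") ; => #f

;; Reuse the compiled value for several matches.
(define inputs (list "abc123" "no digits" "7 of 9"))
(define results (map (lambda (s) (regexp-match digits s)) inputs))
;; results => (("123") #f ("7"))
```

Passing the source string directly to regexp-match also works; compiling first is mainly worthwhile when the same pattern is matched many times.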

The pregexp.ss library of MzLib (see Chapter 22 in PLT MzLib: Libraries Manual) provides a similar -- but more powerful -- form of matching.

Regexp   ::= Pieces               Match Pieces
          |  Regexp|Regexp        Match either Regexp, try left first
Pieces   ::= Piece                Match Piece
          |  PiecePiece           Match first Piece followed by second Piece
Piece    ::= Atom*                Match Atom 0 or more times, longest possible
          |  Atom+                Match Atom 1 or more times, longest possible
          |  Atom?                Match Atom 0 or 1 times, longest possible
          |  Atom*?               Match Atom 0 or more times, shortest possible
          |  Atom+?               Match Atom 1 or more times, shortest possible
          |  Atom??               Match Atom 0 or 1 times, shortest possible
          |  Atom                 Match Atom exactly once
Atom     ::= (Regexp)             Match sub-expression Regexp
          |  [Range]              Match any character in Range
          |  [^Range]             Match any character not in Range
          |  .                    Match any character
          |  ^                    Match start of string
          |  $                    Match end of string
          |  Literal              Match a single literal character
Literal  ::= Any character except (, ), *, +, ?, [, ], ., ^, \, or |
          |  \Aliteral            Match Aliteral
Aliteral ::= Any character
Range    ::= ]                    Range contains ] only
          |  -                    Range contains - only
          |  ]Lrange              Range contains ] and everything in Lrange
          |  -Lrange              Range contains - and everything in Lrange
          |  Lrange-              Range contains - and everything in Lrange
          |  ]Lrange-             Range contains ], -, and everything in Lrange
          |  Lrange               Range contains everything in Lrange
Lrange   ::= Rliteral             Range contains a literal character
          |  Rliteral-Rliteral    Range contains ASCII range inclusive
          |  LrangeLrange         Range contains everything in both
Rliteral ::= Any character except ] or -

Figure 1: Grammar for regular expressions
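A few of the grammar's productions can be illustrated with short matches. This is a sketch only; the patterns and input strings are made up for illustration, and the results shown follow from the rules in Figure 1.

```scheme
;; Alternation tries the left branch first.
(regexp-match "a|ab" "ab")        ; => ("a")

;; * is longest-possible; *? and +? are shortest-possible.
(regexp-match "x*" "xxx")         ; => ("xxx")
(regexp-match "x*?" "xxx")        ; => ("")
(regexp-match "x+?" "xxx")        ; => ("x")

;; A leading ] in a range is a literal member of the range.
(regexp-match "[]a]+" "a]b")      ; => ("a]")

;; A trailing - in a range is literal; a-c is an inclusive ASCII range.
(regexp-match "[a-c-]+" "b-a-d")  ; => ("b-a-")

;; [^Range] matches any character not in Range.
(regexp-match "[^0-9]+" "abc123") ; => ("abc")

;; A backslash makes a special character literal. Inside a Scheme string
;; literal the backslash itself must be doubled.
(regexp-match "\\." "a.b")        ; => (".")
```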

The format of a regular expression is specified by the grammar in Figure 1. A few subtle points about the regexp language are worth noting:

(define r (regexp "(-[0-9]*)+"))  
(regexp-match r "a-12--345b") ; => '("-12--345" "-345") 
(regexp-match-positions r "a-12--345b") ; => '((1 . 9) (5 . 9)) 
(regexp-match "x+" "12345") ; => #f 
(regexp-replace "mi" "mi casa" "su") ; => "su casa" 
(define r2 (regexp "([Mm])i ([a-zA-Z]*)"))  
(define insert "\\1y \\2")  
(regexp-replace r2 "Mi Casa" insert) ; => "My Casa" 
(regexp-replace r2 "mi cerveza Mi Mi Mi" insert) ; => "my cerveza Mi Mi Mi" 
(regexp-replace* r2 "mi cerveza Mi Mi Mi" insert) ; => "my cerveza My Mi Mi"
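The pairs returned by regexp-match-positions can be passed straight to substring, since both use an inclusive start index and an exclusive end index. The following is a small sketch; the names text and span and the input string are hypothetical.

```scheme
;; `text` and `span` are hypothetical names for this sketch.
(define text "abc123def")

;; The first pair in the result describes the whole match.
(define span (car (regexp-match-positions "[0-9]+" text)))
;; span => (3 . 6): start index inclusive, end index exclusive

;; substring uses the same convention, so this recovers the matched text.
(substring text (car span) (cdr span)) ; => "123"
```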

Chapter 10 Regular Expressions
