Google

PLT MzScheme: Language Manual


Syntax and Macros

MzScheme supports the R5RS define-syntax, let-syntax, and letrec-syntax forms with syntax-rules, with minor pattern and template extensions described in section 12.1.

In addition to syntax-rules, MzScheme supports macros that perform arbitrary transformations on syntax. In particular, a transformer expression -- the right-hand side of a define-syntax, let-syntax, or letrec-syntax binding -- can be an arbitrary expression, and it is evaluated in a transformer environment. When the expression produces a procedure, it is associated as a syntax transformer to the identifier bound by define-syntax, let-syntax, or letrec-syntax. This more general, mostly hygienic macro system is based on syntax-case by Dybvig, Hieb, and Bruggeman (see ``Syntactic abstraction in Scheme'' in Lisp and Symbolic Computation, December 1993).

A transformer procedure consumes a syntax object and produces a new syntax object. A syntax object encodes S-expression structure, but also includes source-location information and lexical-binding information for each element within the S-expression. A syntax object is a first-class value, and it can exist at run-time. However, syntax objects are more typically used at syntax-expansion time -- which is the run-time of a transformer procedure.28

Unlike traditional defmacro systems, MzScheme keeps the top-level transformer environment separate from the normal top-level environment. The environments are separated because the expressions in the different environments are evaluated at different times (transformer expressions are evaluated at syntax-expansion time, while normal expressions are evaluated at run time). Separating each environment ensures that compilation and analysis tools can process programs properly. See section 12.3.3 for more information.

Also unlike traditional macro systems, a transformer procedure is invoked whenever its identifier is used in an expression position, not in application positions only. Even more generally, a transformer expression might not produce a procedure value, in which case the non-procedure is associated to its identifier as a generic expansion-time value. For example, a unit signature (see Chapter 35 in PLT MzLib: Libraries Manual) is associated to an identifier through an expansion-time value. See section 12.6 for more information on transformer applications and expansion-time values.

12.1  syntax-rules Extensions

MzScheme extends the pattern language for syntax-rules so that a pattern of the form

(... pattern

is equivalent to pattern where ... is treated like any other identifier. Similarly, a template of the form

(... template

is equivalent to template where ... is treated like any other identifier.

To mesh gracefully with modules, literal identifiers are compared with module-identifier=?, which is equivalent to the comparison behavior of R5RS in the absence of modules; see section 12.3.1 for more information on identifier syntax comparisons.

12.2  Syntax Objects

(read-syntax source-name-v [input-port offset-list]) is like read, except that it produces a syntax object with source-location information. The source-name-v is used as the source field of the syntax object; it can be an arbitrary value, but should generally be a string path for the source file. The offset-list argument is a list of three non-negative, exact integers; the first integer is the line offset for the source line of the returned syntax object, the second is the column offset (for data read on the first line), and the third is the position offset. The default value for offset-list is (list 0 0 0). See section 14.3 for more information about read and read-syntax, see section 11.2.3 for information about port locations, and see section 12.6.2 for information on the property attached to a syntax object by read-syntax.

The eval, compile, expand, expand-once, and expand-to-top-form procedures work on syntax objects. If one of these procedures is given a non-syntax S-expression, the S-expression is converted to a syntax object containing no source information.

The result of read-syntax is a syntax object with source-location information, but no lexical information. Syntax objects acquire lexical information during expansion, so that when a transformer is called, the provided syntax object has lexical information. In addition, the syntax object produced by expand, expand-once, or expand-to-top-form has lexical information that influences future expansion and compilation of the syntax object.

For example, if the following text is parsed by read-syntax,

(lambda (x) (+ x y)) 

the result is a syntax object that contains the S-expression structure '(lambda (x) (+ x y)), but also source information indicating that the first x is in column 10, etc. If expand is applied to the syntax object with a normal top-level environment, then the result will be a similar syntax object (with the source-location information intact), but the second x in the syntax object will have lexical information that ties it to the first x, and y in the syntax object will be annotated as a free variable. Even the syntax object's 'lambda will have lexical information tying it to the built-in lambda form.

Compilation (often as a prelude to interactive evaluation) strips away source and context information as it processes a syntax object. The compilation of a quote-syntax form is an exception:

(quote-syntax datum

The quote-syntax form produces a syntax object that preserves the source-location information for datum. It also encapsulates lexical-binding information accumulated by compilation in the quote-syntax expression's environment. A quote-syntax expression rarely appears in normal expressions; quote-syntax is more typically used within a transformer expression. Unlike quote, quote-syntax fails to compile (i.e., it loops forever) when datum is cyclic.

The syntax-object->datum procedure strips away location and lexical information from a syntax object to produce a plain S-expression. The datum->syntax-object procedure wraps syntax information onto an S-expression, copying the source-location information of a given syntax object and the lexical information of another syntax object. The syntax-e procedure unwraps only the immediate S-expression structure from a syntax object, leaving nested structure in place. These procedures are described in section 12.2.2.

Although procedures such as syntax-object->datum permit arbitrary manipulation of syntax objects, a syntax transformer is more likely to use the pattern-matching syntax-case and syntax forms, which are described in the following subsection.

12.2.1  Syntax Patterns

The syntax-case form pattern-matches and deconstructs a syntax object:

(syntax-case stx-expr (literal-identifier ...)  
  syntax-clause  
  ···)  
 
syntax-clause is one of 
  (pattern expr)  
  (pattern fender-expr expr

If stx-expr expression does not produce a syntax object value, it is converted to one using datum->syntax-object with the lexical context of the expression (see section 12.2.2). The syntax is then compared to the pattern in each syntax-clause until a match is found, and the result of the corresponding expr is the result of the syntax-case expression. If a syntax-clause contains a fender-expr, the clause matches only when both the pattern matches the syntax object and the fender-expr returns a true value. If no pattern matches, a ``bad syntax'' exn:syntax exception is raised.

A pattern is nearly the same as a syntax-rules pattern (see R5RS), with the ellipsis-escaping extension (see section 12.1). The difference is that the first identifier in pattern is not ignored, unlike the leading keyword in a syntax-rules pattern.

As in syntax-rules, a non-literal identifier in a pattern is bound to a corresponding part of the syntax object within the clause's expr and optional fender-expr. The identifier cannot be used directly, however; a use of the identifier in an expression position is a syntax error. Instead, the identifier can be used only in syntax expressions within the binding's scope.

A syntax expression has the form

(syntax template

where template is as in syntax-rules (extended, as usual, for escaped ellipses). The result of a syntax expression is a syntax object. Identifiers in the template that are bound by a syntax-case pattern are replaced with their bindings in the generated syntax object. A syntax expression that contains no pattern identifiers is equivalent to a quote-syntax expression.

The syntax-rules form can be expressed as a syntax-case form wrapped in lambda:

 (syntax-rules (literal-identifier ···) 
   ((ignored-identifier . pattern) template···=expands=> 
 (lambda (stx) 
   (syntax-case stx (literal-identifier ···) 
     ((generated-identifier . pattern) (syntax template)) 
     ···)) 

Note that implicit lambda of syntax-rules for the transformer procedure is made explicit with syntax-case. The define-syntax form supports define-style abbreviations for transformer procedures.

The following example shows one reason to use syntax-case instead of syntax-rules: custom error reporting.

(define-syntax (let1 stx) 
  (syntax-case stx () 
    [(_ id val body) 
     (begin 
       ;; If id is not an identifier, report an error in terms of let1 instead of let: 
       (unless (identifier? (syntax id)) 
         (raise-syntax-error #f "expected an identifier" stx (syntax id))) 
       (syntax (let ([id val]) body)))])) 
(let1 x 10 (add1 x)) ; => 11 
(let1 2 10 (add1 x)) ; => let1: expected an identifier at: 2 in: (let1 2 10 (add1 x)) 

Another reason to use syntax-case is to implement ``non-hygienic'' macros that introduce capturing identifiers:

(define-syntax (if-it stx) 
  (syntax-case stx () 
    [(src-if-it test then else) 
     (syntax-case (datum->syntax-object (syntax src-if-it) 'it) () 
       [it (syntax (let ([it test]) (if it then else)))])]))) 
(if-it (memq 'b '(a b c)) it 'nope) ; => '(b c) 

The nested syntax-case is used to bind the pattern variable it. The syntax for it is generated with datum->syntax-object using the context of src-if-it, which means that that the introduced variable has the same lexical context as if-it at the macro's use; in other words, it acts as if it existed in the input syntax, so it can bind uses of it in test.

The syntax-case* form is a generalization of syntax-case where the procedure for comparing literal-identifiers is determined by a comparison-proc-expr:

(syntax-case* stx-expr (literal-identifier ...) comparison-proc-expr  
  syntax-clause  
  ···

The result of comparison-proc-expr must be a procedure that accepts two arguments. The first argument is an identifier from stx-expr, and the second argument is an identifier from a syntax-clause pattern that is module-identifier=? to one of the literal-identifiers. A true result from the comparison procedure indicates that the first identifier matches the second.

12.2.1.1  Binding Pattern Variables

The with-syntax form is a let-like form for binding pattern variables:

(with-syntax ((pattern stx-expr)  
              ···)  
  expr

A vector of the patterns is matched against a vector of the stx-expr values, and all pattern identifiers are bound in expr. If the result of a stx-expr does not match its pattern, the exn:syntax exception is raised.

The if-it example can be written more simply using with-syntax:

(define-syntax (if-it stx) 
  (syntax-case stx () 
    [(src-if-it test then else) 
     (with-syntax ([it (datum->syntax-object (syntax src-if-it) 'it)]) 
       (syntax (let ([it test]) (if it then else))))])) 

Macros that expand to non-hygienic macros rarely work as intended. For example:

(define-syntax (cond-it stx) 
  (syntax-case stx () 
    [(_ (test body) . rest) 
     (syntax (if-it test body (cond-it . rest)))] 
    [(_) (syntax (void))])) 
(cond-it [(memq 'b '(a b c)) it] [#t 'nope]) ; => undefined variable it 

The problem is that cond-it introduces if-it (hygienically), so cond-it effectively introduces it (hygienically), which doesn't bind it in the source use of cond-it. In general, the solution is to avoid macros that expand to uses of non-hygienic macros.29

12.2.1.2  Quasiquoting Templates

The quasisyntax form is like syntax, except with a quasiquoting within the template:

(quasisyntax quasitemplate

A quasitemplate is the same as a template, except that unsyntax and unsyntax-splicing escape to an expression:

(unsyntax expr) 
(unsyntax-splicing expr

The expression must produce a syntax object (or syntax list) to be substituted in place of the unsyntax or unsyntax-splicing form within the quasiquoting template, just like unquote and unquote-splicing within quasiquote. (If the escaped expression does not generate a syntax object, it is converted to one in the same was as for the right-hand sides of with-syntax.) Nested quasisyntaxes introduce quasiquoting layers in the same way as nested quasiquotes.

Also analogous to quote and quasiquote, the reader converts #' to syntax, #` to quasisyntax, #, to unsyntax, and #,@ to unsyntax-splicing. See also section 14.3.

Example:

(with-syntax ([(v ...) (list 1 2 3)]) 
  #`(0 v ... #,(+ 2 2) #,@(list 5 6) 7)) ; => syntax for (0 1 2 3 4 5 6 7) 

12.2.1.3  Assigning Source Location

The syntax/loc form is like syntax, except that the immediate resulting syntax object takes its source-location information from a supplied syntax object:

(syntax/loc location-stx-expr template

Use syntax/loc instead of syntax whenever possible to help tools that report source locations. For example, the earlier if-it example should have been written with syntax/loc:

(define-syntax (if-it stx) 
  (syntax-case stx () 
    [(src-if-it test then else) 
     (with-syntax ([it (datum->syntax-object (syntax src-if-it) 'it)]) 
       (syntax/loc stx (let ([it test]) (if it then else))))])) 

The quasisyntax/loc form is the quasiquoting analogue of syntax/loc:

(quasisyntax/loc location-stx-expr template

12.2.2  Syntax Object Content

(syntax? v) returns #t if v is a syntax object, #f otherwise.

(syntax-source stx) returns the source for the syntax object stx, or #f if none is known. The source is represented by an arbitrary value (e.g., one passed to read-syntax), but it is typically a file path string. See also section 14.6.

(syntax-line stx) returns the line number (positive exact integer) for the start of the syntax object in its source, or #f if the line number or source is unknown. The result is #f if and only if (syntax-column stx) produces #f. See also section 11.2.3 and section 14.6.

(syntax-column stx) returns the column number (positive exact integer) for the start of the syntax object in its source, or #f if the source column is unknown. The result is #f if and only if (syntax-line stx) produces #f. See also section 11.2.3 and section 14.6.

(syntax-position stx) returns the character position (positive exact integer) for the start of the syntax object in its source, or #f if the source position is unknown. See also section 11.2.3 and section 14.6.

(syntax-span stx) returns the span (non-negative exact integer) in characters of the syntax object in its source, or #f if the span is unknown. See also section 14.6.

(syntax-original? stx) returns #t if stx has the property that read-syntax attaches to the syntax objects that it generates (see section 12.6.2), and if stx's lexical information does not indicate that the object was introduced by a syntax transformer (see section 12.3). The result is #f otherwise. This predicate can be used to distinguish syntax objects in an expanded expression that were directly present in the original expression, as opposed to syntax objects inserted by macros.

(syntax-source-module stx) returns a module path index or symbol (see section 12.6.4) for the module whose source contains stx, or #f if stx has no source module.

(syntax-e stx) unwraps the immediate S-expression structure from a syntax object, leaving nested syntax structure (if any) in place. The result of (syntax-e stx) is one of the following:

  • a symbol

  • a syntax pair (described below)

  • the empty list

  • a vector containing syntax objects

  • some other kind of datum, usually a number, boolean, or string

A syntax pair is a pair containing a syntax object as its first element, and either the empty list, a syntax pair, or a syntax object as its second element.

A syntax object that is the result of read-syntax reflects the use of dots (.) in the input by creating a syntax object for every pair of parentheses in the source, and by creating a pair-valued syntax object only for parentheses in the source. For example:

input read-syntax result
(a b) stx, where
 (syntax-e stx) is equivalent to (list a-stx b-stx)
 and (syntax-e a-stx) is equivalent to 'a
 and (syntax-e b-stx) is equivalent to 'b
(a . (b)) stx, where
 (syntax-e stx) is equivalent to (cons a-stx sb-stx)
 and (syntax-e a-stx) is equivalent to 'a
 and (syntax-e sb-stx) is equivalent to (list b-stx)
 and (syntax-e b-stx) is equivalent to 'b

(syntax->list stx) returns an immutable list of syntax objects or #f. The result is a list of syntax objects when (syntax-object->datum stx) would produce a list. In other words, syntax pairs in (syntax-e stx) are flattened.

(syntax-object->datum stx) returns an S-expression by stripping the syntactic information from stx. Graph structure is preserved by the conversion.

(datum->syntax-object ctxt-stx v [src-stx-or-list prop-stx]) converts the S-expression v to a syntax object, using syntax objects already in v in the result. Converted objects in v are given the lexical context information of ctxt-stx and the source-location information of src-stx-or-list; if the resulting syntax object has no properties, then it is given the properties of prop-stx. Any of ctxt-stx, src-stx-or-list, or prop-stx can be #f, in which case the resulting syntax has no lexical context, source information, and/or new properties. If src-stx-or-list is not #f or a syntax object, it must be a list of five elements:

  (list source-name-v line-k column-k position-k span-k

where source-name-v is an arbitrary value for the source name; line-k is a positive, exact integer for the source line, or #f; and column-k is a positive, exact integer for the source column, or #f; position-k is a positive, exact integer for the source position, or #f; and span-k is a non-negative, exact integer for the source span, or #f. The line-k and column-k values must both be numbers or both be #f, otherwise the exn:application;mismatch exception is raised. Graph structure is preserved by the conversion, but graph structure that is distributed among distinct syntax objects in v may be hidden from future applications of syntax-object->datum and syntax-graph? to the new syntax object.

(syntax-graph? stx) returns #t if stx might be preservably shared within a syntax object created by read-syntax or datum->syntax-object. In general, sharing detection is approximate -- datum->syntax-object can construct syntax objects with sharing that is hidden from syntax-graph? -- but syntax-graph? reliably returns #t for at least one syntax object in a cyclic structure. Meanwhile, deconstructing a syntax object with procedures such as syntax-e and comparing the results with eq? can also fail to detect sharing (even cycles), due to the way lexical information is lazily propagated; only syntax-object->datum reliably exposes sharing in a way that can be detected with eq?.

(identifier? v) returns #t if v is a syntax object and (syntax-e stx) produces a symbol.

(generate-temporaries stx-pair) returns a list of identifiers that are distinct from all other identifiers. The list contains as many identifiers as stx-pair contains elements. The stx-pair argument must be a syntax pair that can be flattened into a list. The elements of stx-pair can be anything, but string, symbol, and identifier elements will be embedded in the corresponding generated name (useful for debugging purposes). Generated identifiers can be used for definitions in a module top level, but section 14.6 describes some limitations with compiled modules.

12.3  Syntax and Lexical Scope

Hygienic macro expansion depends on information associated with each syntax object that records the lexical context of the site where the syntax object is introduced. This information includes the variables that are bound by lambda, let, letrec, etc., at the syntax object's introduction site, the required variables at the introduction site, and the macro expansion that introduces the object.

Based on this information, a particular identifier syntax object falls into one of three classifications:

  • lexical -- the identifier is bound by lambda, let, letrec, or some other form besides module or a top-level definition.

  • module-imported -- the identifier is bound through a require declaration or a top-level definition within module.

  • free -- the identifier is not bound (and therefore refers to a top-level variable, if the identifier is not within a module).

The identifier-binding procedure (described in section 12.3.2) reports an identifiers classification. Further information about a lexical identifier is available only in relative terms, such as whether two identifiers refer to the same binding (see bound-identifier=? in section 12.3.1). For module-imported identifiers, information about the module source is available.

In a freshly read syntax object, identifiers have no lexical information, so they are all classified as free. During expansion, some identifiers acquire lexical or module-import classifications. An identifier that becomes classified as lexical will remain so classified, though its binding might shift as expansion proceeds (i.e., as nested binding expressions are parsed, and as macro introductions are tracked). An identifier classified as module-imported might similarly shift to the lexical classification, but if it remains module-imported, its source-module designation will never change.

Lexical information is used to expand and parse syntax in a way that it obeys lexical and module scopes. In addition, an identifier's lexical information encompasses a second dimension, which distinguishes the environment of normal expressions from the environment of transformer expressions. The module bindings of each environment can be different, so an identifier may be classified differently depending on whether it is ultimately used in a normal expression or in a transformer expression. See section 12.3.3 and section 12.3.4 for more information on the two environments.

12.3.1  Syntax Object Comparisons

(bound-identifier=? a-id-stx b-id-stx) returns #t if the identifier a-id-stx would bind b-id-stx (or vice-versa) if the identifiers were substituted in a suitable expression context, #f otherwise.

(free-identifier=? a-id-stx b-id-stx) returns #t if a-id-stx and b-id-stx access the same lexical, module, or top-level binding and return the same result for syntax-e, #f otherwise.

(module-identifier=? a-id-stx b-id-stx) returns #t if a-id-stx and b-id-stx access the same lexical, module, or top-level binding in the normal environment. Due to renaming in require and provide, the identifiers may return distinct results with syntax-e.

(module-transformer-identifier=? a-id-stx b-id-stx) returns #t if a-id-stx and b-id-stx access the same lexical, module, or top-level binding in the identifiers' transformer environments (see section 12.3.3).

(check-duplicate-identifier id-stx-list) compares each identifier in id-stx-list with every other identifier in the list with bound-identifier=?. If any comparison returns #t, one of the duplicate identifiers is returned (the first one in id-stx-list that is a duplicate), otherwise the result is #f.

12.3.2  Syntax Object Bindings

(identifier-binding id-stx) returns one of three kinds of values, depending on the binding of id-stx in its normal environment:

  • The result is 'lexical if id-stx is bound in its context to anything other than a top-level variable or a module variable.

  • The result is a list of four items when id-stx is bound in its context to a module-defined variable: (list source-mod source-id nominal-source-mod nominal-source-id).

    • source-mod is a module path index or symbol (see section 12.6.4) that indicates the defining module.

    • source-id is a symbol for the variable's name at its definition site in the source module (as opposed to the local name returned by syntax-object->datum).

    • nominal-source-mod is a module path index or symbol (see section 12.6.4) that indicates the module required into the context of id-stx to provide its binding. It can be different from source-mod due to a re-export in nominal-source-mod of some imported identifier.

    • nominal-source-id is a symbol for the variable's name as exported by nominal-source-mod. It can be different from source-id due to a renaming provide, even if source-mod and nominal-source-mod are the same.

  • The result is #f if id-stx is not bound (or bound only to a top-level variable) in its lexical context.

(identifier-transformer-binding id-stx) is like identifier-binding, except that the reported information is for the identifier's bindings in the transformer environment (see section 12.3.3), instead of the normal environment. If the result is 'lexical for either of identifier-binding or identifier-transformer-binding, then the result is always 'lexical for both.

(identifier-binding-export-position id-stx) returns either #f or an exact non-negative integer. It returns an integer only when identifier-binding returns a list, when id-stx represents an imported binding, and when the source module assigns internal positions for its definitions. This function is intended for use by mzc.

(identifier-transformer-binding-export-position id-stx) is like identifier-binding-export-position, except that the reported information is for the transformer environment. This function is intended for use by mzc.

12.3.3  Transformer Environments

The top-level environment for transformer expressions is separate from the normal top-level environment. Consequently, top-level definitions are not available for use in top-level transformer definitions. For example, the following program does not work:

(define count 0) 
(define-syntax (let1 stx) 
  (syntax-case stx () 
    [(_ x v b) 
     (begin 
       (set! count (add1 count)) ; DOESN'T WORK 
       (syntax (let ([x v]) b)))])) 
(let1 x 2 (add1 x)) 

The variable count is bound in the normal top-level environment, but it is not bound in the transformer environment, so the attempt to expand (let1 x 2 (add1 x)) will result in an undefined-variable error.

The initial namespace created by the stand-alone MzScheme application imports all of MzScheme's built-in syntax, procedures, and constants into the transformer environment.30 To extend this environment, a programmer must place definitions into a module, and then use require-for-syntax to import the definitions into the top-level transformer environment.

Like a top-level definition, a top-level require expression imports into the normal environment, and the imported bindings are not made visible in the transformer environment. A top-level require-for-syntax imports into the transformer environment without affecting the normal environment. The require and require-for-syntax forms create separate instantiations of any module that is imported into both environments, in keeping with the separation of the environments.

When a lexical variable is introduced by a form other than module or a top-level definition, it extends the environment for both normal and transformer expressions within its scope, but the binding is only accessible by expressions resolved in the proper environment (i.e., the one in which it was introduced). In particular, a transformer expression in a let-syntax or letrec-syntax expression cannot access identifiers bound by enclosing forms, and an identifier bound in a transformer expression should not appear as an expression in the result of the transformer. Such out-of-context uses of a variable are flagged as syntax errors when attempting to resolve the identifier.

A let-syntax or letrec-syntax expression can never usefully appear as a transformer expression, because MzScheme provides no mechanism for importing into the meta-transformer environment that would be used by meta-transformer expressions to operate on transformer expressions. In other words, an expression of the form

(let-syntax ([identifier (let-syntax ([identifier expr])  
                                 body-expr)])  
  ...

is always illegal, assuming that let-syntax is bound in both the normal and transformer environments to the let-syntax of mzscheme. No syntax (not even function application) is bound in expr's environment. This restriction in the mzscheme language is of little consequence, however, since for-syntax exports allow the definition of syntax applicable to the above body-expr.

12.3.4  Module Environments

In the same way that the normal and transformer environments are kept separate at the top level, a module's normal and transformer environments are also separated. Normal imports and definitions in a module -- both variable and syntax -- contribute to the module's normal environment, only.

For example, the module expression

(module m mzscheme  
  (define (id x) x) 
  (define-syntax (macro stx) 
    (id (syntax (printf "hi~n"))))) 

is ill-formed because id is not bound in the transformer environment for the macro implementation. To make id usable from the transformer, the body of the module m would have to be executed -- which is impossible in general, because a syntax definition such as macro affects the expansion of the rest of the module body.

Consequently, if a procedure such as id is to be used in a transformer, it must either remain local to the transformer expression, or reside in a different module. For example, the above module is trivially repaired as

(module m mzscheme  
  (define-syntax macro 
    (let ([id (lambda (x) x)]) 
      (lambda (stx) 
        (id (syntax (printf "hi~n"))))))) 

The define-syntaxes form (see section 12.4) is useful for defining multiple macros that share helper functions. See also define-syntax-set in Chapter 14 in PLT MzLib: Libraries Manual.

In the mzscheme language, the base environment for a transformer expression includes all of MzScheme. The mzscheme language also provides a require-for-syntax form (in the normal environment) for importing bindings from another module into the importing module's transformer environment:

(require-for-syntax require-spec ...) 

A for-syntax import causes the referenced module to be executed at expansion time, instead of (or possibly in addition to) run time for the module being expanded. The syntax and variable identifiers exported by the for-syntax module are visible within the module's transformer environment, but not its normal environment. Like a normal expression, a transformer expression in a module cannot contain free variables.

Transformer expressions and imports for a module M are executed once each time a module is expanded using M's syntax bindings or using M as a for-syntax import. After the module is expanded, its transformer environment is destroyed, including bindings from modules used at expansion time.

Example:

 (module rt mzscheme 
   (printf "RT here~n") 
   (define mx (lambda () 7)) 
   (provide mx)) 
 
 (module et mzscheme 
   (printf "ET here~n") 
   (define mx (lambda () 700)) 
   (provide mx)) 
 
 (module m mzscheme 
   (require-for-syntax mzscheme) 
   (require rt)               ; rt provides run-time mx 
   (require-for-syntax et)    ; et provides exp-time mx 
 
   ; The mx below is run-time: 
   (printf "~a~n" (mx))       ; prints 7 when run 
 
   ; The mx below is exp-time: 
   (define-syntax onem (lambda (stx) (datum->syntax-object (mx) stx stx))) 
   (printf "~a~n" (onem))    ; prints 700 when run 
 
   ; The mx below is run-time: 
   (define-syntax twom (lambda (stx) (syntax (mx)))) 
   (printf "~a~n" (twom)))    ; prints 7 when run 
 
 ; "ET here" is printed during the expansion of m 
 
 (require m) ; prints "RT here", then 7, then 700, then 7 

This expansion-time execution model explains the need to execute declared modules only when they are invoked. If a declared module is imported into other modules only for syntax, then the module is needed only at expansion time and can be ignored at run time. The separation of declaration and execution also allows a for-syntax module to be executed once for each module that it expands through require-for-syntax.

The hierarchy of run times avoids confusion among expansion and executing layers that can prevent separate compilation. By ensuring that the layers are separate, a compiler or programming environment can expand, partially expand, or re-expand a module without affecting the module's run-time behavior, whether the module is currently executing or not.

Since transformer expressions may themselves use macros defined by modules with for-syntax imports (to implement the macros), expansion of a module creates a hierarchy of run times (or "tower of expanders"). The expansion time of each layer corresponds to the run time of the next deeper layer.

In the absence of let-syntax and letrec-syntax, the hierarchy of run times would be limited to three levels, since the transformer expressions for run-time imports would have been expanded before the importing module must be expanded. The let-syntax and letrec-syntax forms, however, allow syntax visible in a for-syntax import's transformers to appear in the expansion of transformer expressions in the module. Consequently, the hierarchy is bounded in principle only by the number of declared modules. In practice, the hierarchy will rarely exceed a few levels.

12.4  Binding Multiple and Fluid Syntax Identifiers

In addition to define-syntax, let-syntax, and letrec-syntax, MzScheme provides define-syntaxes, let-syntaxes, and letrec-syntaxes. These forms are analogous to define-values, let-values, and letrec-values, allowing multiple syntax bindings at once (see section 2.8).

(define-syntaxes (variable ···) expr) 
 
(let-syntaxes (((variable ···) expr···expr ···1) 
 
(letrec-syntaxes (((variable ···) expr···expr ···1

MzScheme also provides a letrec-syntaxes+values form for binding both values and syntax in a single, mutually recursive scope:

(letrec-syntaxes+values (((variable ···) expr) ···) 
                        (((variable ···) expr) ···expr ···1

The first set of bindings are syntax bindings (as in letrec-syntaxes), and the second set of bindings are normal variable bindings (as in letrec-values).

Examples:

;; Defines let/cc and let-current-continuation as the same macro: 
(define-syntaxes (let/cc let-current-continuation) 
  (let ([macro (syntax-rules () 
                 [(_ id body1 body ...)  
                  (call/cc (lambda (id) body1 body ...))])]) 
    (values macro macro))) 
 
(letrec-syntaxes+values ([(get-id) (syntax-rules () 
                                    [(_) id])]) 
                        ([(id) (lambda (x) x)] 
                         [(x) (get-id)]) 
   x) ; => the id identify procedure 

Finally, MzScheme provides fluid-let-syntax, which is roughly analogous to fluid-let.

(fluid-let-syntax ((variable expr···body-expr ···1

Instead of introducing a new binding, fluid-let-syntax alters the mapping for each variable while expanding the body-exprs. Each variable need not have been mapped to expansion-time values before, and the re-mapping is not restricted to instances of variable in the body-exprs; it applies when resolving any identifier that is bound-identifier=? to variable while the body-exprs are expanded. However, fluid-let-syntax does not mutate any state that is visible to other expansions (that are possibly running in other threads).

12.5  Special Syntax Identifiers

To enable the definition of syntax transformers for application forms and other data (numbers, vectors, etc.), the syntax expander treats #%app, #%top, and #%datum as special identifiers.

Any expandable expression of the form

(datum . datum

where the first datum is not an identifier bound to an expansion-time value, is treated as

(#%app datum . datum

so that the syntax transformer bound to #%app is applied. In addition, () is treated as (#%app). Similarly, an expression

identifier 

where identifier has no binding other than a top-level or local module binding, is treated as

(#%top . identifier

Finally, an expression

datum 

where datum is not an identifier or pair, is treated as

(#%datum . datum

The mzscheme module binds #%app, #%top, and #%datum as regular application, top-level variable reference, and implicit quote, respectively. A module can export different transformers with these names to support languages different from conventional Scheme.

In addition, #%module-begin is used as a transformer for a module body. The mzscheme module binds #%module-begin to a form that inserts a for-syntax import of mzscheme for syntax definitions. It also exports #%plain-module-begin, which can be substituted for #%module-begin to avoid the for-syntax import of mzscheme. Any other transformer used for #%module-begin must expand to mzscheme's #%module-begin or #%plain-module-begin.

When an expression is fully expanded, all applications, top-level variable references, and literal datum expressions will appear as explicit #%app, #%top, and #%datum forms, respectively. Those forms can also be used directly by source code. The #%module-begin form can never usefully appear in an expression, and the body of a fully expanded module declaration is not wrapped with #%module-begin.

The following example shows how the special syntax identifiers can be defined to create a non-Scheme module language:

(module lambda-calculus mzscheme  
   
  ; Restrict lambda to one argument:  
  (define-syntax lc-lambda  
    (syntax-rules ()  
      [(_ (x) E) (lambda (x) E)]))  
   
  ; Restrict application to two expressions: 
  (define-syntax lc-app  
    (syntax-rules ()  
      [(_ E1 E2) (E1 E2)]))  
   
  ; Restrict a lambda calculus module to one body expression:  
  (define-syntax lc-module-begin   
    (syntax-rules ()  
      [(_ E) (#%module-begin E)]))  
   
  ; Disallow numbers, vectors, etc.  
  (define-syntax lc-datum  
    (syntax-rules ()))  
   
  ; Provide (with renaming):  
  (provide #%top ; keep mzscheme's free-variable error  
           (rename lc-lambda lambda)  
           (rename lc-app #%app)  
           (rename lc-module-begin #%module-begin)  
           (rename lc-datum #%datum)))  
   
(module m lambda-calculus  
  ; The only syntax defined by lambda-calculus is  
  ; unary lambda, unary application, and variables.  
  ; Also, the module must contain exactly one expression.  
  ((lambda (y) (y y))  
   (lambda (y) (y y))))  
   
(require m)     ; executes m, loops forever 

12.6  Macro Expansion

A define-syntax, let-syntax, or letrec-syntax form associates an identifier to an expansion-time value. If the expansion-time value is a procedure of one argument, then the procedure is applied by the syntax expander when the identifier is used in the scope of the syntax binding.

The transformer for an identifier is applied whenever the identifier appears in an expression position -- not just when it appears after a parenthesis as (identifier ...). When it does appear as (identifier ...), the entire (identifier ...) expression is provided as the argument to the transformer. Otherwise only identifier is provided to the transformer.

A typical transformer is implemented as

(lambda (stx)  
  (syntax-case stx () 
    [(_ rest-of-pattern) expr])) 

so that identifier by itself does not match the pattern; thus, the exn:syntax exception is raised when identifier does not appear as (identifier ...).

(make-set!-transformer proc) also creates a transformer procedure. The proc argument must be a procedure of one argument; if the result of (make-set!-transformer proc) is bound as syntax to identifier, then proc is applied as a transformer when identifier is used in an expression position, or when it is used as the target of a set! assignment: (set! identifier expr).

Example:

(let ([x 1] 
      [y 2]) 
  (let-syntax ([x (make-set!-transformer 
                    (lambda (stx) 
                     (syntax-case stx (set!; Redirect mutation of x to y 
                       [(set! id v) (syntax (set! y v))])))] 
                       ; Normal use of x really gets x 
                       [id (identifier? (syntax id)) (syntax x)])))]) 
    (begin 
      (set! x 3) 
      (list x y)))) ; => '(1 3) 

(set!-transformer? v) returns #t if v is a value created by make-set!-transformer, #f otherwise.

If a transformer expression produces a non-procedure value, the value is associated with the identifier as a generic expansion-time value. Any use of the identifier in an expression position is rejected as a syntax error, but syntax transformers can access the value. For example, the define-signature form (see Chapter 35 in PLT MzLib: Libraries Manual) associates a component interface description to the defined identifier.

When a syntax transformer is applied, it can query the bindings of identifiers in the lexical environment of the expression being transformed. For example, the unit/sig form can access a named interface description with syntax-local-value:

  • (syntax-local-value id-stx [failure-thunk]) returns the expansion-time value of id-stx in the transformed expression's context. If id-stx is not bound to an expansion-time value (via define-syntax, let-syntax, etc.) in the environment of the expression being transformed, the result is obtained by applying failure-thunk. If failure-thunk is not provided, the exn:application:mismatch exception is raised.

  • (syntax-local-name) returns an inferred name for the expression position being transformed, or #f; see also section 6.2.4.

  • (syntax-local-context) returns either 'expression, 'top-level, 'internal-define, or 'module, indicating whether the expression is being expanded for a (non-definition) expression position, a top-level position, a (potential) internal-definition position, or a module top-level position, respectively.

A transformer can also expand or partially expand subexpressions from its input syntax object:

  • (local-expand stx context-symbol stop-id-stx-list) expands stx in the lexical context of the expression currently being expanded. The context-symbol argument is used as the result of syntax-local-context for immediate expansions; it must be one of the legal return values for syntax-local-context. When an identifier in stop-id-stx-list is encountered by the expander in a subexpression, expansions stops for the subexpression.

    If #%app, #%top, or #%datum (see section 12.5) appears in stop-id-stx-list, then application, top-level variable reference, and literal data expressions without the respective explicit form are not wrapped with the explicit form.

To track the introduction of identifiers by a macro (see section 12.3), the syntax expander adds a special ``mark'' to a syntax object that is provided to a transformer, and also marks the result of the transformer. Double marks cancel, and each transformer application has a distinct mark, so the only parts of the resulting syntax object with marks are the parts that were introduced by the transformer. A transformer can explicitly add a current mark to a syntax object using syntax-local-introduce:

  • (syntax-local-introduce stx) produces a syntax object that is like stx, except that a mark for the current expansion is added (possibly canceling an existing mark in parts of stx).

Explicit marking is useful on syntax objects that flow into or out of a transformer without being the transformer argument or result. For example, DrScheme's Check Syntax tool recognizes a 'bound-in-source property that specifies bound-binding identifier pairs in the source program that do not appear as bound and binding identifiers in the expansion. Example:

(define-syntax (match-list stx) 
  (syntax-case stx () 
    [(_ expr (id ...) result-id) 
     (let ([ids (syntax->list (syntax (id ...)))] 
           [result-id (syntax result-id)]) 
       ;; Make sure the expression is well formed: 
       (for-each (lambda (id) 
                   (unless (identifier? id) 
                     (raise-syntax-error #f "not an identifier" stx id))) 
                 (append ids (list result-id))) 
       ;; Find the matching variable and produce a list-ref expression: 
       (let loop ([ids ids] [pos 0]) 
         (cond 
           [(null? ids) (raise-syntax-error #f "no pattern binding" stx result-id)] 
           [(bound-identifier=? (car ids) result-id;; Found it; produce the list-ref expression, and 
            ;; tell the Check Syntax tool about the pattern-variable binding: 
            (with-syntax ([pos pos]) 
              (syntax-property 
               (syntax (list-ref expr pos)) ; the expansion result 
               'bound-in-source 
               (cons 
                (syntax-local-introduce (car ids)) 
                (syntax-local-introduce result-id))))] 
           [else (loop (cdr ids) (add1 pos))])))])) 
 
;; Test it: 
(match-list '(1 2 3) (a b c) b) ; => 2 

In this example, Check Syntax will draw a binding arrow from the first b to the second b. Without the calls to syntax-local-introduce, the identifiers stored in the property would appear to have originated from the transformer, instead of from the transformer's argument; consequently, Check Syntax would not draw the arrow, because it would not know that the bs exist in the source program.

12.6.1  Expanding Expressions to Primitive Syntax

(expand stx-or-sexpr) expands all non-primitive syntax in stx-or-sexpr, and returns a syntax object for the expanded expression. See below for the grammar of fully expanded expressions. Use syntax-object->datum to convert the returned syntax object into an S-expression.

(expand-once stx-or-sexpr) partially expands syntax in the stx-or-sexpr and returns a syntax object for the partially-expanded expression. Due to limitations in the expansion mechanism, some context information may be lost. In particular, calling expand-once on the result may produce a result that is different from expansion via expand.

(expand-to-top-form stx-or-sexpr) partially expands syntax in stx-or-sexpr to reveal the outermost syntactic form. This partial expansion is mainly useful for detecting top-level uses of begin. Unlike expanding the result of expand-once, expanding the result of expand-to-top-form with expand produces the same result as using expand on the original syntax.

The possible shapes of a fully expanded expression are defined by top-level-expr:

top-level-expr is one of 
  general-top-level-expr 
  (module identifier name (#%plain-module-begin module-level-expr ···)) 
  (begin top-level-expr ···) 
 
module-level-expr is one of 
  general-top-level-expr 
  (provide provide-spec ...) 
  (begin module-level-expr ···) 
 
general-top-level-expr is one of 
  expr 
  (define-values (variable ···) expr) 
  (define-syntaxes (variable ···) expr) 
  (require require-spec ···) 
  (require-for-syntax require-spec ···) 
 
expr is one of 
  variable 
  (lambda formals expr ···1) 
  (case-lambda (formals expr ···1) ···) 
  (if expr expr) 
  (if expr expr expr) 
  (begin expr ···1) 
  (begin0 expr expr ···) 
  (let-values (((variable ···) expr) ···) expr ···1) 
  (letrec-values (((variable ···) expr) ···) expr ···1) 
  (set! variable expr) 
  (quote datum) 
  (quote-syntax datum) 
  (with-continuation-mark expr expr expr) 
  (#%app expr ···1) 
  (#%datum . datum) 
  (#%top . variable

where formals is defined in section 2.9, and require-spec and provide-spec are defined in section 5.2.

When a variable expression appears in a fully-expanded expression, it either refers to a variable bound by lambda, case-lambda, let-values, or letrec-values, or it refers to an imported variable. (In other words, a variable not wrapped by #%top never refers to a top-level variable, and it never refers to a non-imported variable that is defined at the top-level of a module.)

The keywords in the above grammar are placeholders for identifiers that are module-identifier=? (or module-transformer-identifier=? for define-syntax expressions) to the same-named exports of mzscheme. Due to import renamings, the printed identifier names can be different in the expanded expression.

12.6.2  Syntax Object Properties

Every syntax object has an associated property list, which can be queried or extended with syntax-property:

  • (syntax-property stx key-v v) extends stx by associating an arbitrary property value v with the key key-v; the result is a new syntax object with the association (while stx itself is unchanged).

  • (syntax-property stx key-v) returns an arbitrary property value associated to stx with the key key-v, or #f if no value is associated to stx for key-v.

Both the syntax input to a transformer and the syntax result of a transformer may have associated properties. The two sets of properties are merged by the syntax expander: each property in the original and not present in the result is copied to the result, and the values of properties present in both are combined with cons-immutable (result value first, original value second).

Before performing the merge, however, the syntax expander automatically adjusts a property on the original syntax object using the key 'origin. If the source syntax has no 'origin property, it is set to the empty list. Then, still before the merge, the identifier that triggered the macro expansion (as syntax) is cons-immutabled onto the 'origin property so far.

The 'origin property thus records (in reverse order) the sequence of macro expansions that produced an expanded expression. Usually, the 'origin value is an immutable list of identifiers. However, a transformer might return syntax that has already been expanded, in which case an 'origin list can contain other lists after a merge.

For example, the expression

(or x y

expands to

(let ((or-part x)) (if or-part or-part (or y))) 

which, in turn, expands to

(let-values ([(or-part) x]) (if or-part or-part y)) 

The syntax object for the final expression will have an 'origin property whose value is (list-immutable (quote-syntax let) (quote-syntax or)).

When read-syntax generates a syntax object, it attaches a property to the object (using a private key) to mark the object as originating from a read. The syntax-original? predicate looks for the property to recognize such syntax objects.

See section 12.6.4 for information about properties generated by the expansion of a module declaration. See section 3.10.1 and section 6.2.4 for information about properties recognized when compiling a procedure. See section 14.6 for information on properties and byte codes.

12.6.3  Information on Structure Types

The define-struct form (see section 4.1) binds the name of a structure type to an expansion-time value that records the identifiers bound to the structure type, the constructor procedure, the predicate procedure, and the field accessor and mutator procedures. This information can be used during the expansion of other expressions.

For example, the define-struct variant for subtypes (see section 4.2) uses the base type name t to find the variable struct:t containing the base type's descriptor; it also folds the field accessor and mutator information for the base type into the information for the subtype. The match form (see Chapter 19 in PLT MzLib: Libraries Manual) uses a type name to find the predicates and field accessors for the structure type.

Besides using the information, other syntactic forms can even generate information with the same shape. For example, the struct form in an imported signature for unit/sig (see Chapter 35 in PLT MzLib: Libraries Manual) causes the unit/sig transformer to generate information about imported structure types, so that match and subtyping define-struct expressions work within the unit.

The expansion-time information for a structure type is represented as an immutable list of five items:

  • an identifier that is bound to the structure type's descriptor, or #f it none is known;

  • an identifier that is bound to the structure type's constructor, or #f it none is known;

  • an identifier that is bound to the structure type's predicate, or #f it none is known;

  • an immutable list of identifiers bound to the field accessors of the structure type, optionally with #f as the list's last element. A #f as the last element indicates that the structure type may have additional fields, otherwise the list is a reliable indicator of the number of fields in the structure type. Furthermore, the accessors are listed in reverse order for the corresponding constructor arguments. (The reverse order enables sharing in the lists for a subtype and its base type.)

  • an immutable list of identifiers bound to the field mutators of the structure type, optionally with #f as the list's last element. The meaning of #f and the order are the same as for the accessor identifiers.

The implementor of a syntactic form can expect users of the form to know what kind of information is available about a structure type. For example, the match implementation works with structure information containing an incomplete set of accessor bindings, because the user is assumed to know what information is available in the context of the match expression. In particular, the match expression can appear in a unit/sig form with an imported structure type, in which case the user is expected to know the set of fields that are listed in the signature for the structure type.

12.6.4  Information on Expanded and Compiled Modules

MzScheme provides an interface for obtaining information about a expanded or compiled module declaration's imports and exports. This information is intended for use by tools such as a compilation manager. The information usually identifies modules through a module path index, which is a semi-interned31 opaque value that encodes a relative module path (see section 5.4) and another index to which it is relative.

Where an index is expected, a symbol can usually take its place, representing a literal module name. A symbol is used instead of an index when a module is imported using its name directly with require instead of a module path.

An index that returns #f for its path and base index represents ``self'' -- i.e., the module declaration that was the source of the index -- and such an index is always used as the root for a chain of indices. For example, when extracting information about an identifier's binding within a module, if the identifier is bound by a definition within the same module, the identifier's source module will be reported using the ``self'' index. If the identifier is instead defined in a module that is imported via a module path (as opposed to a literal module name), then the identifier's source module will be reported using an index that contains the required module path and the ``self'' index.

  • (module-path-index? v) returns #t if v is a module path index, #f otherwise.

  • (module-path-index-split module-path-index) returns two values: a non-symbol S-expression representing a module path, and a base index (to which the module path is relative), symbol, or #f. A #f second result means ``relative to a top-level environment''. A #f for the first result implies a #f for the second result, and means that module-path-index represents ``self'' (see above).

  • (module-path-index-join module-path module-path-index) combines module-path and module-path-index to create a new module path index. The module-path argument can be anything except a symbol, and the module-path-index argument can be a index, symbol, or #f.

Information for an expanded module declaration is stored in a set of properties attached to the syntax object:

  • 'module-direct-requires -- an immutable list of module path indices (or symbols) representing the modules explicitly imported into the module.

  • 'module-direct-for-syntax-requires -- an immutable list of module path indices (or symbols) representing the modules explicitly for-syntax imported into the module.

  • 'module-variable-provides -- an immutable list of provided items, where each item is one of the following:

    • symbol -- represents a locally defined variable that is provided with its defined name.

    • (cons-immutable provided-symbol defined-symbol) -- represents a locally defined variable that is provided with renaming; the first symbol is the exported name, and the second symbol is the defined name.

    • (list*-immutable module-path-index provided-symbol defined-symbol) -- represents a re-exported and possibly re-named variable from the specified module; module-path-index is either an index or symbol, indicating the source module for the binding. The provided-symbol is the external name for the re-export, and defined-symbol is the originally defined name in the module specified by module-path-index.

  • 'module-syntax-provides -- like 'module-variable-provides, but for syntax exports instead of variable exports.

  • 'module-indirect-provides -- an immutable list of symbols for variables that are defined in the module but not exported; they may be exported indirectly through macro expansions.

  • 'module-kernel-reprovide-hint -- either #f, #t, or a symbol. If it is #t, then the module re-exports all of the functionality from MzScheme's internal kernel module. If it is a symbol, then all kernel exports but the indicated one is re-exported, and some other export is provided with the indicated name. This ad hoc information is used in an optimization by the mzc compiler.

  • 'module-self-path-index -- a module path index whose parts are both #f. This information is used by the mzc compiler to manage syntax objects (which contain module-relative information keyed on the module's own index).

(compiled-module-expression? v) returns #t if v is a compiled expression for a module declaration, #f otherwise. See also section 14.6.

(module-compiled-name compiled-module-code) takes a module declaration in compiled form (see section 14.6) and returns a symbol for the module's declared name.

(module-compiled-imports compiled-module-code) takes a module declaration in compiled form (see section 14.6) and returns two values: an immutable list of module path indices (and symbols) for the module's explicit imports, and an immutable list of module path indices (and symbols) for the module's explicit for-syntax imports.


28 In general, modules and for-syntax imports create a hierarchy of run times and expansion times. See section 12.3.4 for more information.

29 In this particular case, Shriram Krishnamurthi points out changing if-it to use (datum->syntax-object (syntax test) 'it) solves the problem in a sensible way.

30 In contrast, a namespace created by (scheme-report-environment 5) imports only syntax-rules into the transformer environment.

31 Multiple references to the same relative module tend to use the same index value, but not always.