PRECCX USER MANUAL

P.T. Breuer and J.P. Bowen

(10 August 1994)

Welcome to PRECCX v2.42
Copyright notice
INTRODUCTION
INSTALLATION
DESCRIPTION ...
USING PRECCX ...
LIFE CYCLE ...

Welcome to PRECCX v2.42


PRECCX stands for PREttier Compiler Compiler (eXtended). PRECCX converts
context-grammar definition scripts (with a .y extension) into ANSI C code
scripts (with a .c extension) that can in turn be compiled into working
parsers, interpreters or compilers using a standard ANSI C compiler.



Copyright notice


PRECCX v2.42, Copyright © 1989-1994, P.T. Breuer        




INTRODUCTION


PRECCX is a compiler compiler which converts PRECCX context-grammar
definition scripts (with a .y extension) into ANSI C code scripts (with
a .c extension).

The output code compiles under the Free Software Foundation's ANSI C
compiler, gcc, and will compile with other fully compliant
ANSI compilers. PRECCX is itself coded in ANSI C, generated by PRECCX
from its own definition script, and is fully portable.

PRECCX extends the facilities of the Unix yacc utility and its PC
equivalents, including those based on GNU's bison, by providing

*   infinite lookahead in place of yacc 1-token lookahead;

*   incremental compilation and linking of scripts arranged in separate
    modules;

*   full support for extended BNF descriptions, which makes
    for clearer and more efficient definition scripts;

*   parameterized grammar definitions, allowing context-dependent
    grammars;

*   variables which stand for grammars as parameters, allowing `macros'
    to replace repeated grammar constructions;

*   NEW in 2.42 * synthesized attributes which may be seamlessly used as
    parse parameters during the remainder of the parse in which they
    were synthesized. This is a powerful linguistic device.

Lexical analysers such as lex and flex are suitable pre-filters for
PRECCX. They are called in the same way that yacc calls them -- whenever
a new token is required -- and lexers constructed for yacc should be
`plug compatible' with PRECCX. There is also a trivial default lexer
built into the utility which passes characters through to PRECCX as
tokens.

It might be thought that infinite lookahead is the big technical
advance offered by PRECCX, but the parameterized grammars are more
significant. The higher order capability offered by parametrization and
the modular arrangement of scripts make parser specifications much more
maintainable. The infinite lookahead makes the semantics declarative --
a non-terminal can be replaced by its definition without altering the
semantics, a definition which is self-consistent is consistent in the
context of the rest of the script, etc.

* NEW * Now attributes can be synthesized `on the fly' and passed
seamlessly into the parser, extending PRECCX's declarative programming
paradigm from parameterized attribute grammar specifications to fully
cover mixed and/or synthetic attributed grammars too.

* NEW * PRECCX now compiles almost wholly into C at the back end
instead of interpreting for a virtual machine, with a consequent increase
in robustness.

INSTALLATION


INSTALLATION INSTRUCTIONS.

a) Put the following files where your C compiler will find them:

NEEDED BY YOU - HEADERS
cc.h            /* precc header file (no inherited attributes)  */
ccx.h           /* precc header file (for inherited attributes) */

              - LIBRARY
libcc1.a   (UNIX)
libcc2.a   (UNIX)
libcc4.a   (UNIX)
preccx1C.lib (DOS)
preccx2C.lib (DOS)
preccx4C.lib (DOS)
preccx1L.lib (DOS)
preccx2L.lib (DOS)
preccx4L.lib (DOS)

I can't help you with the placing because it will be specific to your
installation and your compiler. But I have the headers in \home\include
for DOS and ~/include for UNIX, and the libraries in \home\c\lib for
DOS and ~/c/lib for UNIX.

b) link or copy one of the libraries to

libcc.a   (UNIX)
preccx.lib (DOS)

You should choose libcc1.a/preccx1C.lib if you are going to use 1-byte
TOKENs (this is normal) and the Compact memory model for C, and another
accordingly if you have a larger size or different model in mind.

c)
              - EXECUTABLES
preccx[.exe]    /* the precc executable                         */

Put this somewhere where your operating system will find it. That has to
be in a directory named in your PATH variable. Or make a .BAT file (DOS)
or .sh file (UNIX) which contains the single line

    preccx %1 %2              (DOS)
    preccx $1 $2             (UNIX)

replacing `preccx' with the exact location of the preccx[.exe]
executable. The .BAT/.sh file should then be placed somewhere in the
PATH instead. I have a .BAT file in \bin\bat.

d) Read the literature:

   preccx.1   (UNIX: use nroff -man preccx.1 | more)
   preccx.man  (DOS: use more <preccx.man)

Also provided is a long technical paper, in an ASCII and a .DVI version.
The latter should be viewable using dviscr, dvivga or other publicly
available DVI previewers.

   preccx.t[xt]
   preccx.dvi

YOU ARE READY TO GO.

DESCRIPTION  ...


VERSION


 PRECCX  - PREttier Compiler Compiler eXtended v2.42

2.42 is in the 2.x line of PRECCX utilities, which extend the 1.x line
with conteXt dependent and higher-order parsing over plain infinite LA
parsing. The 2.2x subfamily essentially marks the change to a more
`standard' LEX interface (see HISTORY section). The 2.3x subfamily
consolidates certain forward and backward compatibility changes. The
2.4x subfamily has gradually introduced full support for synthesized
attributes and bidirectional data-flow between synthesized and
inherited parser attributes. The 2.42 version marks a change in the
back-end to allow almost full compilation into C instead of run-time
interpretation in a virtual machine.


DIFFERENCES FROM YACC ...


INTRODUCTION


PRECCX is intended to extend the Unix yacc utility, so it may be wise to
get to know yacc first. But the technology is entirely
different, which leads to some fundamental differences in the way that
definition scripts have to be written.

One _can_ convert yacc scripts to PRECCX ones quite simply (see below)
in general, with some particular points of difficulty (again, see
below), but why the big differences? Couldn't PRECCX have been written
to use a scripting language exactly like yacc's? Well, no. The
differences are essential because otherwise PRECCX would be restricted
to yacc-style semantics. PRECCX scripts cannot be converted to yacc
scripts because of the extra expressiveness of the semantics involved.
PRECCX does not build finite state machines.

But PRECCX is upwardly compatible with yacc. As far as possible, the
PRECCX scripting language has been designed to look like an extension of
yacc's, with the result that PRECCX can be thought of as yacc with
parameters, arbitrarily complex compound expressions, infinite
lookahead and a neater way of dealing with attributes. But the
fundamental differences mean that subexpressions cannot be translated
across independently of their context (yacc scripts are heavily context
dependent), and some special features of yacc do not translate easily,
such as precedence declarations, because they depend vitally on yacc
semantics.

ALTERNATION ORDERING


Yacc scripts all have the pattern

        a : b c d e ... | f g h i ... | ... ;

where the b,c,d, etc. may be actions, terminals or non-terminals. The
way to write this for PRECCX will be

@   a = ... | f g h i ... | b c d e ...

and the re-ordering is required to make the longest pattern come first
in the definition for PRECCX, whereas it will normally appear last in
the yacc script.

PRECCX is `infinite lookahead', so it will investigate each branch of
the grammar to the maximum depth. But it is often the case that
grammars do not have explicit termination markers (such as an ENDIF or
ENDBLOCK) and then it may be that an initial segment of the token
stream will satisfy the grammar specification as well as the full
stream will. To preclude this possibility, the potentially longest
matches must come first in PRECCX definition scripts, to force the
longest matches to be sought first.

So one must write

@       a = b c
@     | b

where in yacc one would have written

        a : b
          | b c
        ;

(incidentally, @a= b [c] is better style in PRECCX; more concise,
clearer and more efficient). Yacc may well have advised of a
shift/reduce conflict in such scripts (this depends on the context). In
those terms, PRECCX can be understood to always shift (look for more
tokens) instead of reduce (jump to another rule with what it's got),
and will backtrack if it eventually discovers that another
interpretation is required.
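
The effect of the ordering can be imitated in a few lines of
hand-written C. This is only a sketch of ordered choice with
backtracking, not the code that PRECCX generates. The function names
are invented for the illustration, b is taken to be the letter 'b' and
c the letter 'c', and a is assumed to be tried at the top level.

```c
/* Each matcher returns the number of characters consumed, or -1 on failure. */
int match_b(const char *s) { return s[0] == 'b' ? 1 : -1; }
int match_c(const char *s) { return s[0] == 'c' ? 1 : -1; }

/* @ a = b c | b      (longest alternative first) */
int a_long_first(const char *s)
{
    int nb, nc;

    /* first alternative: b c */
    if ((nb = match_b(s)) >= 0 && (nc = match_c(s + nb)) >= 0)
        return nb + nc;
    /* backtrack to the start and try the second alternative: b */
    return match_b(s);
}

/* @ a = b | b c      (the yacc ordering copied across unchanged) */
int a_short_first(const char *s)
{
    int nb = match_b(s);

    if (nb >= 0)
        return nb;      /* succeeds at once, so "b c" is never tried */
    return -1;          /* "b c" cannot match either once b has failed */
}
```

On the input "bc" the first version consumes both characters, while the
second is satisfied with the initial "b" and leaves the "c" unparsed.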

ERROR TRAPS


The yacc built-in `error' construct has to be rendered by hand for
PRECCX since resynchronising the input stream after an error is
problematic. A two token resync would be rendered

@       error = ? ?
@           {: printf("...."); :}

and a skip to the end of the block would be

@       error = >ENDBLOCK<*
@           {: printf("...."); :}

assuming that ENDBLOCK matches a token value returned by the yylex()
tokeniser. The >x<* construct means `not-an-x as often as necessary'.

PRECCX does not jump automatically to the error handler. The handler
has to be installed using the following notation:

@      foo !{error} bar gum

and then the error parser will be invoked on an attempt to backtrack
through the `!' (pronounced `cut') point. The construction is logically
the same as

@      foo ! { bar gum | error }

but has the advantage of being expressed without the trailing part of
the specification having to be known, and it also performs some space
saving technical manoeuvres with the C call stack.

The bare `!' mark induces a call to the default error handler on an
attempt to backtrack through it. This handler is called

     btk_error()

and is a C function (see the file on_error.c) that has direct access to
the token buffer. It can be replaced either by linking in a new
function or by changing the C macro that calls it. This macro is

     BTK_ERROR(x)

(the x always has the value -1). See section ERROR TRAPPING for more
details.

TERMINALS


PRECCX also differs in the way that it declares terminals. The yacc
declaration

%token FOO

is equivalent to a PRECCX definition

@ _FOO_ = <FOO>

if thereafter yacc references to FOO are replaced by PRECCX
references to _FOO_.

PRECEDENCE


There is at present no equivalent for the declaration of yacc precedences
and associativity. Instead, these have to be coded explicitly for PRECCX
using the preferred ordering.


ATTRIBUTES


When the -old switch is used, PRECCX partially supports the yacc method
of accessing attributes in actions using numerical references.  E.g.

@  sum = summand <'+'> summand     {: total = $1+$3; :}

but PRECCX 2.42 no longer fully supports the style.  In particular,
references to $0 or lower are not tolerated (they were valid in earlier
versions). The correct way to handle attributes now is using names, and
the named attributes can then be dereferenced in the action.  E.g.

@  sum = summand\x <'+'> summand\y {: total = $x+$y; :}

and attributes should be used whenever possible instead of
side-effecting actions. The intent in the above is to create a new
attribute whose value is the sum of the summands:

@  sum = summand\x <'+'> summand\y  {@ $x+$y @}

PRECCX 2.42 no longer supports the yacc method of assigning an
attribute using an action. In previous versions of PRECCX the yacc $$
notation was supported:

@  sum = summand <'+'> summand   {: $$ = $1+$3; :} /* WRONG */

but it is no longer. Attach an attribute instead, as above.


EXECUTION TIME


You probably don't want to know this! But there is a difference in
PRECCX between the time at which the parse occurs and the time at which
actions are executed.

The parse occurs first and the actions are `built' during this phase,
and executed either at the end of the parse, or at an explicit `!'
command in the parse definition. This is in contrast to yacc where
parse and the execution of actions are interleaved -- but then PRECCX
has to be able to backtrack across actions, and therefore cannot
execute them immediately they are encountered. I think the complication
of having to remember that the two phases are distinct is more than
compensated for by the infinite lookahead that it allows. The
distinction means that actions cannot be used to alter the parameters
to the parse directly.

Don't worry. PRECCX does pass values between the two phases correctly,
so the definition

@ foo(n) = ... {: a(n+1); :} ...

uses the n in the action that was the parameter to the parse.
But one _CANNOT_ alter the n during the action and have it passed
back to the parse. It is simply too late. All that happens is that
a local copy of n is altered, so

@ foo(n) = ... {: n=n+1; :} ... foo(n) ...

does *not* recurse with n+1 as the parameter to foo.

@ foo(n) = ... foo(n+1) ...

is the correct way to pass altered parameters.
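
The two phases can be pictured with a small queue of pending actions.
What follows is a sketch under my own assumptions and not the PRECCX
runtime: the names record, mark_pending, unwind_to, run_pending and
show are all invented. Each action is stored together with a copy of
the parameter it saw at parse time, which is why changing n inside an
action can never reach the parser.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PENDING 16

typedef void (*action_fn)(long n, char *out);

/* An action plus a private copy of the parse parameter it was built with. */
struct pending { action_fn fn; long n; };

static struct pending queue[MAX_PENDING];
static int npending = 0;

/* Parse time: remember the action and the current n, but do not run it. */
int record(action_fn fn, long n)
{
    if (npending >= MAX_PENDING)
        return 0;
    queue[npending].fn = fn;
    queue[npending].n = n;
    npending++;
    return 1;
}

/* Note the current queue position before trying an alternative. */
int mark_pending(void) { return npending; }

/* Backtracking: forget every action recorded since the mark. */
void unwind_to(int mark) { npending = mark; }

/* End of parse (or a cut): run the surviving actions in order. */
void run_pending(char *out)
{
    int i;

    for (i = 0; i < npending; i++)
        queue[i].fn(queue[i].n, out);
    npending = 0;
}

/* An example action: append the decimal value of n to out. */
void show(long n, char *out) { sprintf(out + strlen(out), "%ld ", n); }
```

An action recorded on a branch that is later abandoned is simply
dropped by unwind_to() and never runs.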

UNION


The equivalent to the yacc

%union

declaration is found in the capacity to set the type of values that
PRECCX manipulates as attached attributes at runtime. Use

# define VALUE whateveryoulike

provided sizeof(whateveryoulike)=sizeof(long). In other words, PRECCX
does not really cater for anything other than (long) integer values as
attributes.  For structures rather than unions, this means passing
addresses rather than values, unless the structures are exceptionally
brief. This fits in with the general philosophy of C, but it is a bit
restrictive. You will have to make your own memory handler if you are
going to build any really substantially structured attributes and
then pass around handles to the structures. PRECCX only guarantees to
"unbuild" the attribute it sees if it backtracks, not whatever it points
to if it is a handle, so you will have to do your own memory
deallocation too.

But you would have had to do that anyway in any substantial C utility.
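
A minimal sketch of the handle approach follows. It assumes VALUE has
been set to a structure pointer; struct node, new_node and
free_all_nodes are hypothetical names, and the fixed-size table is just
the simplest bookkeeping I could think of. Every allocation is recorded
so that it can be released later, whether or not the parser kept the
handle.

```c
#include <stdlib.h>

struct node { int op; long left, right; };

typedef struct node *VALUE;   /* stand-in; a script would #define VALUE */

#define MAX_NODES 256

static struct node *all_nodes[MAX_NODES];
static int n_nodes = 0;

/* Allocate a node and remember it. Returns NULL when out of room. */
VALUE new_node(int op, long left, long right)
{
    struct node *p;

    if (n_nodes >= MAX_NODES)
        return NULL;
    p = (struct node *)malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->op = op;
    p->left = left;
    p->right = right;
    all_nodes[n_nodes++] = p;
    return p;
}

/* Release every node allocated so far; returns how many were freed. */
int free_all_nodes(void)
{
    int freed = n_nodes;

    while (n_nodes > 0)
        free(all_nodes[--n_nodes]);
    return freed;
}

/* Nonzero if a handle has the size of a long, as required above. */
int value_fits_long(void) { return sizeof(VALUE) == sizeof(long); }
```

Calling free_all_nodes() once the parse is over sidesteps the question
of which handles were abandoned during backtracking.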

FEATURES ...


CONTEXTS


Each grammar definition may be parameterized with contexts.
For example, `n' is the context in the following definition:

@ decl(n) = space(n) expression <'\n'> decl(n+1)*

This definition defines a grammar term decl(n) which expresses the
idea that it starts n spaces in from the left hand margin. It is
followed by terms decl(n+1) which start a little further in.
Some languages determine whether a declaration is local (and to what)
or global in scope by relative indentation, and this is how to
express this kind of constraint.
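
To see what the context parameter is doing, here is the same idea
written out by hand in C. It is a sketch only and makes assumptions the
definition above does not state: space(n) is taken to mean exactly n
blanks, and expression is taken to be a non-empty run of characters
that does not start with a blank and stops at the newline.

```c
#include <stddef.h>

/* decl(n): returns a pointer to the unparsed rest of s, or NULL on failure. */
const char *decl(const char *s, int n)
{
    const char *next;
    int i;

    /* space(n): exactly n leading blanks */
    for (i = 0; i < n; i++)
        if (s[i] != ' ')
            return NULL;
    s += n;

    /* expression: must start here, so deeper indentation is rejected */
    if (*s == ' ' || *s == '\n' || *s == '\0')
        return NULL;
    while (*s != '\n' && *s != '\0')
        s++;

    /* the newline that ends the line */
    if (*s != '\n')
        return NULL;
    s++;

    /* decl(n+1)*: zero or more declarations one step further in */
    while ((next = decl(s, n + 1)) != NULL)
        s = next;
    return s;
}
```

The parameter travels down the recursion just as n does in the script:
each nested declaration is tried with one more blank than its parent.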

Note that it will be necessary to cast parameters to the type PARAM
(long) if they are not of the same size (as long) under your model of C.
E.g.

@ decl1 = decl((PARAM)1)

This is rarely necessary.

SYNTHESIZED ATTRIBUTES


PRECCX 2.42 can synthesize attributes on the fly. An attribute is built
by following the clause for which it is the attribute by an `@',
followed by the expression for the attribute, followed by a final `@'.
The expression _MUST_ _NOT_ be side-effecting because PRECCX may
execute the expression more than once if it backtracks. E.g.

@ foo = bar gum {@ 1 @}
@     | nay     {@ 2 @}

attaches the attribute 1 to the first clause and 2 to the second.

Attributes already attached to the terms of the clause may be referenced
and then dereferenced as follows:

@ arfarf = arf\x arf\y {@ $x+$y @}

The dereferencing $ in front of the x in $x is only necessary to ensure
proper casting of types (from PARAM to VALUE) in all circumstances. It
will usually not be required, but it is safer to use it. The x can and
should always be used as a parameter without the $. E.g.

@ bowwow = bow\x wow(x)

This is where the real power of synthesized attributes comes in.  An
attribute synthesized during the parse can be used as a parameter in the
remainder of the parse. This makes it possible, for example, to
identify a single token:

@ foo = ?\x what(x)

whereas otherwise a construction like

@ foo = <'a'> what('a')
@     | <'b'> what('b')
@     | ...

would have been necessary.
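
As an illustration of a value picked up during the parse steering the
rest of it, consider a quoted string whose delimiter is whatever token
comes first. In script form this might be written as ?\q followed by a
parser taking q as its parameter, say >q<* <q>; that fragment is my own
example and not taken from a tested script. The hand-written C
equivalent is sketched below, with quoted an invented name.

```c
/* Returns the number of characters consumed, or -1 on failure. */
int quoted(const char *s)
{
    char q;
    int i;

    if (s[0] == '\0')
        return -1;          /* ? does not match the 0 token */
    q = s[0];               /* ?\q : remember the opening token */

    i = 1;
    while (s[i] != '\0' && s[i] != q)
        i++;                /* >q<* : anything that is not a q */

    return s[i] == q ? i + 1 : -1;   /* <q> : the matching close */
}
```

The same function accepts /abc/ and 'abc' alike, because the closing
delimiter is not fixed in the code but taken from the input.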

Note that the attributes can be passed into actions too:

@ foo = ?\x {: printf("%c",(int)$x); :}

but remember that actions are not executed until later. In particular,
it is no use expecting an action to alter an attribute value.


INFINITE LOOKAHEAD


PRECCX has infinite lookahead and backtracking in place of the yacc
1-token lookahead. This means that PRECCX parsers distinguish correctly
between sentences of the form `foo bah gum' and `foo bah NAY' on a
single pass.  If you cannot imagine why one should want to decide
between the two, think about `if ... then' and `if ... then ... else'.
One can write the grammar definition down straight away in PRECCX as

@ statement1 =   <'i'> <'f'>             boolexpr
@                <'t'> <'h'> <'e'> <'n'> statement
@              [ <'e'> <'l'> <'s'> <'e'> statement ]

but this is much harder to do for yacc-style parsers.

COMPLEX EXPRESSIONS


Complex compound expressions like

        explain {{this | that} {several | no} times}+

are legal almost anywhere within PRECCX definition scripts. The
definition can be substituted for the definee anywhere in a script
except in the parameter list of a higher-order parser application.
Grouping parentheses may be required.


MODULAR SCRIPTS, COMPILATION, LINKING


Parts of a script can be PRECCX'ed separately, compiled separately, and
then linked together later, which makes maintenance and version control
easy. Suppose that you have written a monolithic script mono.y of some
500 definitions, commencing with some

    # define  ...
        ...
    # include ...

definitions for the C precompiler, and terminating with a

    MAIN(foobie)

declaration. You can cut this script into four:

        mono.h          --- the heading declarations
        1sthalf.y       --- first 250 definitions
        2ndhalf.y       --- second 250 definitions
        monomain.y      --- the MAIN declaration.

and place the instruction

    # include "mono.h"

at the head of each of the .y files. Then run PRECCX over
each of these:

       preccx 1sthalf.y 1sthalf.c
       preccx 2ndhalf.y 2ndhalf.c
       preccx monomain.y monomain.c

compile each:

       gcc -ansi -Wall -c 1sthalf.c 2ndhalf.c monomain.c

then link

       gcc -o mono  1sthalf.o 2ndhalf.o monomain.o

The ordering and placement of the definitions in the files is
not important.

SPEED


You may notice it yourself, but I'll say it now. PRECCX is fast,
typically taking two to five seconds to compile scripts of several
hundred lines. And it builds fast parsers too.

MACROS


PRECCX `Macros' may be defined in a script, simply by defining one
parser as a context for another.  For example,

@ optional(parser) = parser | {}

may be defined (this particular example is an equivalent for the
built-in [parser] construct). And then the construct

@ ice_cream(flavour) = tub(flavour) optional(sauce)

may be used.

The `macros' are really ordinary grammar definitions which just happen
to take other grammars as parameters. You may find that you have to
cast these parameters to be the same length as all the others, if your
model of C uses different sized pointers for function addresses than
long. The cast is only required when you introduce a grammar name as a
constant:

@ ice_cream(flavour) = tub(flavour) optional((PARAM)sauce)

and you may also find that you have to declare

extern PARSER sauce;

somewhere above the line, just to let C know what is going on.
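
In C terms a `macro' of this kind is a function that takes another
parser as an argument. The sketch below shows the shape of the idea
with plain function pointers. It is not what PRECCX emits; parser_fn
and the toy parsers tub and sauce (here without the flavour parameter)
are invented for the purpose.

```c
#include <stddef.h>

/* A parser returns the unparsed rest of its input, or NULL on failure. */
typedef const char *(*parser_fn)(const char *s);

/* @ optional(parser) = parser | {} */
const char *optional(parser_fn parser, const char *s)
{
    const char *rest = parser(s);

    return rest != NULL ? rest : s;   /* on failure, match nothing */
}

/* Two toy parsers: a tub is the letter 't', a sauce the letter 's'. */
const char *tub(const char *s)   { return s[0] == 't' ? s + 1 : NULL; }
const char *sauce(const char *s) { return s[0] == 's' ? s + 1 : NULL; }

/* @ ice_cream = tub optional(sauce) */
const char *ice_cream(const char *s)
{
    const char *rest = tub(s);

    return rest != NULL ? optional(sauce, rest) : NULL;
}
```

Here sauce is passed by address, which is the situation in which the
cast to PARAM described above may be needed in a real script.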

SEE ALSO


Look at the UNIX man pages for yacc(1), lex(1), and gcc(1L).


USING PRECCX ...


RUNNING


SYNOPSIS

      preccx [options] [ file.y [ file.c ] ]

PRECCX can also be used as a stdin to stdout filter:

 preccx [options] < file.y > file.c

It is better to use command line file names, however; there is then no
possibility of the console or keyboard being misidentified for error
messages and interrupts.

If file.c is omitted, stdout is used.

If file.y is omitted too, then stdin and stdout are used.

It is sometimes useful to run PRECCX in stdin to stdout mode
(interactive mode) in order to debug a complicated definition.

The command line options alter the sizes of internal PRECCX buffers and
tables. You may have to increase these if PRECCX runs out of space when
compiling a script.

  -rNN   Read token buffer size in Kb       (10)
  -pNN   internal Program size in Kb        (16)
  -vNN   Valued attribute buffer size in Kb (8)
  -fNN   context Frame buffer size in Kb    (16)

See the section on LIMITS, STACKS AND BUFFERS.

In addition, the

  -old

switch in 2.42 supports the use of the old style yacc-a-like
numerical references to attributes within actions. See the section
on ACTIONS.

COMPOSING ...


LITERATE SCRIPTS


A specification in a grammar definition script may look like:

@ expr = var {<'+'>|<'-'>} expr
@      | <'('> expr <')'>

The `@' is an `attention mark'. Every line which does not begin with
an `@' is passed through to the output unchanged, so arbitrary C code
can be embedded in a PRECCX script. This makes PRECCX scripts literate
in the sense made popular by Donald Knuth. Comments must therefore be
delimited by C comment marks, `/*' and `*/'.

A sequence of lines each of which begins with `@' is read as continuous
input by PRECCX. There must be one blank line of surround either side of
each group of lines beginning with `@'.


SYNTAX  ...



`*'


`*' is a postfix operator which means `zero or more times'. It can be
attached to any atomic PRECCX expression. That is, anything which looks
like a single thing to PRECCX: a literal, a definition name, a group in
braces, a ..., but not an unbracketed non-trivial sequence or alternation.

Example (1):

@ boring = <'z'>*

valid inputs:  zzzzzzzz
               zzzzzz
                          (nothing)

Example (2):

@ identifier1 = alpha {alpha | numeric}*

@ alpha       = (isalpha)

@ numeric     = (isdigit)

valid inputs:  identifier1
               isalpha
               go123on

The `*' may be followed by an integer C expression in order to define
a specific number of repetitions.

Example (3):

@ spaces(n) = space*n

@ space     = (isspace)

`+'


`+' is a postfix operator which means `one or more times'. It is
equivalent to `*' in the sense that one may always substitute a+ by

        a a*

but it is sometimes more concise or revealing to use this form.

Example (1):

@ boring = <'z'>+
@        | :/*empty*/:

valid inputs:  zzzzzzzzzz
               zzzz
                        (nothing)
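
The three repetition forms met so far (`*', `*n' and `+') differ only
in how many matches they insist upon. A sketch in ordinary C, applied
to single-character predicates such as those in <ctype.h>, might look
like this; run_length, star, star_n and plus are hypothetical helper
names and not part of the PRECCX library.

```c
#include <ctype.h>

/* Count leading characters of s that satisfy pred, stopping after
   limit of them (a negative limit means no limit). */
static int run_length(int (*pred)(int), const char *s, int limit)
{
    int i = 0;

    while ((limit < 0 || i < limit) && s[i] != '\0'
           && pred((unsigned char)s[i]))
        i++;
    return i;
}

/* x*  : zero or more; always succeeds */
int star(int (*pred)(int), const char *s)
{
    return run_length(pred, s, -1);
}

/* x*n : exactly n; returns -1 if fewer are available */
int star_n(int (*pred)(int), const char *s, int n)
{
    return run_length(pred, s, n) == n ? n : -1;
}

/* x+  : one or more; returns -1 if there are none */
int plus(int (*pred)(int), const char *s)
{
    int k = run_length(pred, s, -1);

    return k > 0 ? k : -1;
}

/* @ spaces(n) = space*n   with   @ space = (isspace) */
int spaces(const char *s, int n) { return star_n(isspace, s, n); }
```

Each function returns the number of characters matched, so a result of
0 from star() is a success that consumed nothing, while plus() reports
the same situation as a failure.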

`[ ... ]'


`[ ... ]' is an outfix operator which means `optionally'. Syntactically,
it acts as a bracket. Actions can be captured within too.

Example (1):

@ integer = [ <'+'>|<'-'> ]
@           unsigned_int
@           [ {<'E'>|<'e'>} [<'+'>] unsigned_int ]

@ unsigned_int = (isdigit)+

valid inputs:  -100e2
               1234567890
               +456
               321E+6

The [ foo ] construct is equivalent to { foo | }, and the foo* construct
is equivalent to [ foo+ ].

`{ ... }'


`{ ... }' are the grouping brackets for PRECCX expressions.

Example (1):

@ identifier2 = {(isalpha) | <'_'>} {(isalpha) | (isdigit) | <'_'>}*

valid inputs:  a123_
               _______
               g_l_e_e_0


`] ... ['


`] ... [' (anti-brackets) hide an expression, causing it to be required
but ignored. This has different effects in the middle of an expression
and at the end of a PRECCX expression.

The principal intended use is to require trailing context which is not
parsed.

Example (1):

@ word = {alphanum | <'\''> | <'-'>}*
@        ] separator | punctuation | stop | EOF [

@ sentence = word
@            {[punctuation] separator word}*
@            stop

@ stop = <'.'>

@ punctuation = <','> | <';'> | <':'> | <' '> <'-'>

@ separator   = (isspace)

But when used early in a sequence, the effect is of a demand for
`parallel' parsing. The hidden context must be reparsed by later
sequents and hence must satisfy two parse requirements simultaneously:

Example (2):

@ loweralpha = ](islower)[ (isalpha)

valid inputs:  a
               b
               c
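
Both uses of the anti-brackets amount to testing the input without
consuming it. The C below is a hand-written sketch of the two cases,
not generated code. The word parser is simplified from Example (1): it
wants one or more letters, and takes the trailing context to be white
space or the end of the string.

```c
#include <ctype.h>

/* @ loweralpha = ](islower)[ (isalpha)
   Returns the number of characters consumed, or -1 on failure. */
int loweralpha(const char *s)
{
    unsigned char c = (unsigned char)s[0];

    if (!islower(c))
        return -1;      /* hidden part: required, consumes nothing */
    if (!isalpha(c))
        return -1;      /* the same token is parsed again, and consumed */
    return 1;
}

/* word = letters followed by hidden trailing context */
int word(const char *s)
{
    int i = 0;

    while (isalpha((unsigned char)s[i]))
        i++;
    if (i == 0)
        return -1;
    if (s[i] != '\0' && !isspace((unsigned char)s[i]))
        return -1;      /* the trailing context is required ... */
    return i;           /* ... but is not counted as consumed */
}
```

For the input "hello world" word() reports five characters consumed,
leaving the blank in place for whatever parser comes next.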

Note that the syntax is sometimes ambiguous, and care must be taken to
split up definitions involving anti-brackets in order to disambiguate
it for PRECCX. The definition

@ foobie = ] a [ b ] c [

is either

@ foobie = ]{ a [ b ] c}[

or

@ foobie = {] a [} b {] c [}

and no particular interpretation is guaranteed (by me). Use grouping
brackets if in doubt. (OK. I'll come clean. The first interpretation
ought to be the one PRECCX uses always, since it involves the longest
match down the left hand side of the parse tree.)


`?'


`?' stands for any token (except the special 0 token which the yylex()
lexical analyzer should use to signal a break).

Example (1):

@ error = ?* {: printf("something happened on line %d\n",yylineno); :}

`^'


`^' means `beginning of line'.

Example (1):

@ foo = {^ | separator} <'F'> <'O'> <'O'> ]separator|EOF[

`$'


`$' means `(a) match the special 0 TOKEN'. This token is usually
returned by yylex() to denote end of line or EOF, or some other break
or termination condition. So `$' is the place to perform special
actions.

`$' means also `(b) prepare to append more input'. This is `append' in
distinction to `overwrite'. After a `$' match, tokens are appended to the
input buffer as though no interruption had occurred in normal
processing, except that the 0 token is written to match the position of
the `$'. This action should be compared with that of the `$!'
construct.

Example (1):

@ EOL = $ :V(1)=' ';:

`!'


`!' means cut, or `execute all pending actions now'. The input buffer is
reset so that the current TOKEN becomes first. Backtracking across the !
position is disabled, and an error is generated if it is attempted.

Example (1):

@ EOL = $ ! {: printf("no hope of backtracking\n"); :}

`$!'


`$!'  is short for `$ !'.

`( ... )'


`(foo)', where foo is a BOOLEAN valued predicate on tokens, means
`match a token satisfying foo'. Foo may be defined elsewhere in the
script as a C function returning the int value 1 or 0.

Example (1):

@ name = (myisalpha)+

BOOLEAN myisalpha(c)
TOKEN c;
{
    return(isalpha(c)||(c=='_'));
}

`) ... ('


`)...(' placed round a C expression of BOOLEAN type indicates a logical
test condition.

Example (1):

@ linefrom(n) = )n==0(  {?\x {: printf("%c",$x); :} }*
@             | )n<80( ? linefrom(n-1)

`< ... >'


`<...>' may be placed around literals for a match. Variables may occur in the
literal, which may be any C expression.

Example (1):

# define COLON ':'

@ twocolons = <COLON> <':'>

Example (2):

@ encrypted(x) = <rot13(x)>

`> ... <'


`>...<' may be placed around C expressions of TOKEN type to mean
`not a (particular literal)'.

Example (1):

@ string = <'"'> strchar* <'"'>

@ strchar= <'\\'> ?
@        | >'"'<

`|'


`|' means `or', and is placed between alternate phrases of the grammar.

Example (1):

@ a_or_b = a
@        | b

conjunction


Simple conjunction indicates sequence.

Example (1):

@ abc =  a b c

is the term denoting an expression consisting of an `a expression'
followed by a `b expression' followed by a `c expression'.

An example of a full PRECCX script will be found in the section USAGE.

TOKENISING INPUT


A default do-nothing tokeniser is provided in the PRECCX library and
will be automatically linked in unless you specify a different yylex()
routine to the C compiler.

There is nothing to worry about here. If you do nothing yourself, you
will get a working parser out of a PRECCX script immediately, but if
you particularly want to put your own tokeniser on the input, then you
do that by

1. naming it `yylex()' and
2. making it return a TOKEN when called.
3. Place its object module or source code file ahead of the
   `-lcc' argument when you use the C compiler. For example:
          gcc -ansi -o foo foo.c mylex.c -L $PRECCDIR -lcc
   and it will be linked in instead of the default.

Exact details of what yylex() should do:

A) (Important) yylex() should signal EOF by setting yytchar to EOF and
returning with the value 0, which yylex() routines generated by lex(1)
do not seem to get right. Under normal conditions, it should
   1) return a nonzero TOKEN and set yytchar to something other than EOF.
   2) set yylval to the attribute VALUE of the token, e.g. the value
      of the integer for an INT token, the character itself for a CHAR
      token, and so on.

The 0 return code is a special TOKEN only matched by the PRECCX
constructs `!' and `$' and `$!' (and `$$', for EOF).

B) yylex() should set yylen to the length of the string corresponding to
the returned TOKEN (this is not currently required by PRECCX).

C) yylex() should set yylloc to point to the string (this is not
currently required by PRECCX).

D) yylex() should increment yylineno when a new line is deemed to have
been input. PRECCX uses this information in the default error messages.
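
Points A and D can be put together in a small character-at-a-time
tokeniser. This is a sketch and not the default lexer from the library.
The typedefs and variable declarations are stand-ins so that the
fragment compiles on its own (in a real build they would come from the
PRECCX header), VALUE is assumed here to be long, and lex_from is an
invented helper for choosing the text to read. The yylen and yylloc
variables of points B and C are left alone, since they are not
currently required.

```c
#include <stdio.h>              /* for EOF */

typedef char TOKEN;             /* stand-in: 1-byte tokens */
typedef long VALUE;             /* stand-in: long-sized attribute values */

int   yytchar;                  /* set to EOF at end of input */
VALUE yylval;                   /* attribute of the token just returned */
int   yylineno = 1;             /* current input line */

static const char *input = "";  /* text still to be tokenised */

/* Choose the string to be tokenised and restart the line count. */
void lex_from(const char *text)
{
    input = text;
    yylineno = 1;
}

TOKEN yylex(void)
{
    unsigned char c = (unsigned char)*input;

    if (c == '\0') {            /* A: signal EOF and return 0 */
        yytchar = EOF;
        return 0;
    }
    input++;
    if (c == '\n')
        yylineno++;             /* D: a new line has been input */
    yytchar = c;                /* A1: something other than EOF */
    yylval = (VALUE)c;          /* A2: the character is its own attribute */
    return (TOKEN)c;            /* A1: a nonzero TOKEN */
}
```

Each call hands back one character as a token; the only zero return is
the one that accompanies yytchar == EOF.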

COMPILING


The way to compile a C source code file `foo.c' generated by PRECCX into
an executable `foo' is to use an incantation like:

        gcc -Wall -ansi -o foo foo.c -L <PRECCX dir> -lcc

(under UNIX). The command line will vary with particular installations
and configurations.

In DOS (under Turbo-C), I find that it is important to select the
`assume code segment not the same as data segment' switch for the
compiler and linker. This is especially important if several different
modules are compiled and linked together.

Note that the default call stack size in DOS is only 4K, and this is
altered to 32K+ by PRECCX executables during the main() routine they
set up. The size can be varied by setting C_STACKSIZE (default
0x7FFFF). See section LIMITS, STACKS AND BUFFERS.

TOKEN TYPE


The following macro may be set in the grammar definition script, above
the # include lines for cc.h or ccx.h:

 # define TOKEN tokentype

(default char)

This defines the space reserved for each incoming token in the parser
which PRECCX builds. You must choose a preccx link library that matches
the size of TOKEN. Use libccn.a (or [preccxnM].lib, where M is the
memory model) for n-byte TOKENs.

You can change the TOKEN type seen by PRECCX by #defining it
differently:

# define TOKEN short

(you may want a wider range of TOKENs than the 256 possibilities
afforded by an 8-bit char, and `#define TOKEN short' is sometimes
useful). Any integer type is valid.

VALUE


 # define VALUE valuetype

(default char*)

This defines the space reserved for each value on the runtime stack
manipulated by the runtime program which PRECCX attaches to the parser.
There is no good reason for changing this to a type which is shorter
than long int (or char far *), because the actual space used will be a
union type which is at least as long as this.

Nor is it possible to change the size beyond that of long (it is not the
largest type passed as a value by C, but PRECCX cannot handle any larger
one).

So you can only use VALUE to switch to using structure or union
pointers, or to change the name of the value type without changing its
size.

ERROR TRAPPING


The PRECCX 2.4x series provides for explicit error trapping using
labeled cut marks in scripts. For example:

@ top  = !{skip} foo
@ skip = ?* $ top
MAIN(top)

defines a top parser with a default fallthrough to a parser that
silently skips a line and then retries top.

Actually, in this case, the call of top in skip is unnecessary because
PRECCX always retries its top level parser continuously until input is
exhausted, but this style of specification does keep the parse inside a
single invocation of the top level parser, which is ideologically
cleaner!

Failing explicit error directions, PRECCX will use its defaults.

The default error action may be intercepted at low level by #defining an
ON_ERROR(x) macro in C. There are currently three error values reported
by PRECCX:

 x=0  means a partial (but successful) parse;
 x=1  means an unsuccessful parse;
 x=-1 means an attempt to backtrack across a `cut' (`!').

By default, PRECCX calls the ON_ERROR(x) macro when these errors arise.
Then it attempts to reparse the remaining input. You might redefine
what the macro does. For example:

 #define ON_ERROR(x) switch(x){\
                     case 0: printf("ow!\n");   break;\
                     case 1: printf("ouch!\n"); break;\
                     case -1:printf("zowie!\n");break;\
                     }


The x=0 value might arise from a specification like <'a'>* and an input
like "abb". The remaining input will be "bb".

The x=1 value might arise from a specification like <'a'> <'a'> and
an input "abb". The remaining input will be "abb".

The x=-1 value might arise from a specification like <'a'> !
<'a'> and an input "abb". The remaining input will be "bb".

The default error macro supplied with PRECCX simply prints an error
message and the portion of the string beyond the (TOKEN *)maxp pointer,
which is generally accurate for error location.  It points to the
deepest successful penetration into the incoming string.
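
If you would rather count errors than print them, ON_ERROR(x) can be
given a body that tallies each of the three values. The sketch below is
standalone C: the counters and the report_all driver are invented for
the example, and in a real parser it is the PRECCX runtime that invokes
the macro.

```c
int partial, failed, backtracked;   /* hypothetical tallies */

/* Same shape as a user-defined ON_ERROR, but counting, not printing. */
#define ON_ERROR(x) switch (x) {                   \
                    case 0:  partial++;     break; \
                    case 1:  failed++;      break; \
                    case -1: backtracked++; break; \
                    }

/* Stand-in for the runtime reporting a series of error values. */
void report_all(const int *codes, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        ON_ERROR(codes[i])
    }
}
```

Feeding report_all the values 0, 1, -1, 1 leaves partial at 1, failed
at 2 and backtracked at 1.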

For your information, the pointer (TOKEN *)pstr always gives the
unparsed TOKEN string, of which (TOKEN *)maxp will be an end-segment.
These will not necessarily make good reading, since the pointer is into
PRECCX's buffer and that is only guaranteed to be populated between pstr
and maxp.

The pointer *yylval may be set by the lexer to show more detail, but
support is limited. You can determine what tokens are in the buffer by
looking at the *yybuffer pointer (or the PRECCX buffer[]) and then
attempt to reconstruct where you are from that snapshot.

The error routine macro is initially set to

 ON_ERROR(x) = switch(x) {
               case 0 : ZER_ERROR(0); break;
               case 1 : BAD_ERROR(1); break;
               case -1: BTK_ERROR(-1);break;
              }

If you want to try and resync the parse at an error, a sensible thing to
do would be to (rewrite ZER_ERROR or BAD_ERROR or BTK_ERROR to) skip a
token at maxp, and rerun the parse from there. You would have to read
the code of the run() function defined in cc.c to make sense of this,
but you might try:

get1token();                /* skip a TOKEN */
tok=the_top_level_parser(); /* run the parse again */
if(GOODSTATUS(tok)){        /* the parse succeeded, so ..  */
  pc=0;pc=p_evaluate();     /* run the pending actions */
} else
  printf("At least I tried!\n");

Using a counter to set a maximal number of resync attempts in a single
line would also be sensible!
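
The counter idea might look something like the sketch below. It does
not use the PRECCX internals: parse_ok is a hypothetical stand-in for
the top level parser (here it accepts any input starting with 'a'),
advancing the string pointer plays the role of get1token(), and
MAX_RESYNCS is an invented limit.

```c
#define MAX_RESYNCS 3   /* hypothetical limit on retries per line */

/* Hypothetical stand-in for the top level parser: succeeds only when
   the remaining input starts with 'a'. */
int parse_ok(const char *input)
{
    return input[0] == 'a';
}

/* Skip one token and retry, at most MAX_RESYNCS times.
   Returns the number of tokens skipped, or -1 if no resync was found. */
int resync(const char *line)
{
    int skipped = 0;
    while (!parse_ok(line)) {
        if (*line == '\0' || skipped == MAX_RESYNCS)
            return -1;
        line++;          /* plays the role of get1token() */
        skipped++;
    }
    return skipped;
}
```

With these definitions resync("xxabc") returns 2, while
resync("xxxxabc") gives up and returns -1 after three skips.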

You can avoid all BAD_ERROR(1) calls by making sure that the top-level
parser has a failsafe fallthrough to a ?* parser, with some kind of
error action attached.

The version 2.x series PRECCX introduced the error, BTK_ERROR(-1),
which traps an attempt to backtrack across a cut. The ZER_ERROR(0),
BAD_ERROR(1) and BTK_ERROR(-1) defaults are what you get with
ON_ERROR(0), ON_ERROR(1) and ON_ERROR(-1) respectively. The default
values of these C macros are respectively zer_error(), bad_error() and
btk_error(), the three functions defined in the on_error.c module.


BEGIN AND END


You may include the lines

#define BEGIN mybegincode
#define END   myendcode

for C code to be executed at either end of a top level parse attempt.
This means that BEGIN will be re-executed if the top level parser
resyncs after an error, and your code should take account of that (most
likely by installing and using a counter).
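
A minimal sketch of the counter approach follows. The way the runtime
expands BEGIN and END is not shown in this manual, so run_attempts is
only an assumed stand-in for it, and both counters are invented names.

```c
int begin_calls;    /* how many times BEGIN has run (hypothetical) */
int setups;         /* how many times the one-off setup has run */

/* Do the one-off setup on the first call only. */
#define BEGIN { if (begin_calls++ == 0) setups++; }
#define END   { /* nothing to do in this sketch */ }

/* Stand-in for the runtime: one BEGIN ... END pair per top level
   parse attempt. */
void run_attempts(int n)
{
    int i;
    for (i = 0; i < n; i++) {
        BEGIN
        /* the top level parser would run here */
        END
    }
}
```

After run_attempts(3), begin_calls is 3 but setups is still 1, so code
guarded by the counter runs once however often the parser resyncs.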



ACTIONS


The parser generated from a PRECCX script will ordinarily signal valid
input by absorbing it silently, and signal invalid input by rejecting it
and spouting an error message. This is a standard style for
compiler-compilers. To get the parser to do anything else, you must
decorate the definition script with ACTIONs.

So now for the horrors of synthetic attributes. To make a parser do
anything significant, you need either to get it to synthesize a data
structure, or get it to generate outputs. Whichever, you need to scatter
actions through the grammar definition script.

Actions are pieces of C code (terminated by a semi-colon) placed
between a pair of braced colons (`{: ... :}') in the grammar definition
script. For example:

@ addexpr = expr\x <'+'> expr\y {: printf("%d",$x+$y); :}

is not unreasonable.  `Values' attached to each term of a PRECCX
expression are an easy way to think of what is going on.

Note that literals (like <'+'>) have their attached value generated by
the yylex() token analyzer which feeds PRECCX. The VALUE yylval should
be set by yylex() when it returns a TOKEN, and this will be used as the
attached value.
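
A lexer of the kind described might be sketched as follows. This is a
standalone illustration: the exact prototypes expected by ccx.h may
differ, lex_input is an invented input source, and returning 0 at end
of input is an assumption borrowed from yacc practice.

```c
#define TOKEN char
#define VALUE int

VALUE yylval;                 /* set by the lexer for each TOKEN */
const char *lex_input = "";   /* hypothetical input source */

/* Return the next TOKEN, or 0 at end of input (an assumption).
   For a digit the attached value is its numeric value; for anything
   else it is the character code itself. */
TOKEN yylex(void)
{
    TOKEN t = *lex_input;
    if (t == '\0')
        return 0;
    lex_input++;
    yylval = (t >= '0' && t <= '9') ? t - '0' : t;
    return t;
}
```

With lex_input pointing at "7+", the first call returns '7' with yylval
set to 7, and the second returns '+' with yylval set to the code for
'+'.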

Side-effecting actions need a little explanation. Because PRECCX is an
infinite look-ahead parser, it cannot execute actions at the same time
as it reads input. It might have to later backtrack across its parse,
and, whilst it might deconstruct data structures built up in the parse,
it is certainly impossible to undo writes to stdout which might have
occurred.

So PRECCX builds a program as it parses. When the parse finishes
correctly, the program is executed by an internal engine, but if the
parse is unsuccessful or has to be backtracked, the program is
`unbuilt' and the actions it contained are never executed. This program
is a linear sequence of C code actions which have been specified in the
PRECCX definition file. Thus the specification:

@abc=a b c {: printf("D"); :}

@a=<'a'> {: printf("A"); :}

@b=<'b'> {: printf("B"); :}

@c=<'c'> {: printf("C"); :}

will, upon receiving input "abc", generate the program

 printf("A");printf("B");printf("C");printf("D");

to be executed later.  Thus actions attached to a sequence expression
may be thought of as occurring immediately after the actions attached
to sub-expressions, and so on down. That explanation should enable you
to generate side-effects in the correct sequence.
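
The build-then-execute idea can be imitated in a few lines of plain C.
The sketch below is not the PRECCX engine: it queues one letter per
action in an array, resets the queue when a parse fails, and "runs" the
queue only after success. All names and the PROGRAM_MAX capacity are
invented for the example.

```c
#include <string.h>

#define PROGRAM_MAX 16             /* hypothetical capacity */

char program[PROGRAM_MAX];         /* one letter per queued action */
int  pc;                           /* number of queued actions */
char output[PROGRAM_MAX + 1];      /* what the actions "print" */

void emit(char action)
{
    if (pc < PROGRAM_MAX) program[pc++] = action;
}

/* Match one character and queue its action. */
int lit(const char **s, char c, char action)
{
    if (**s != c) return 0;
    (*s)++;
    emit(action);
    return 1;
}

/* abc = a b c, with action D queued after those of a, b and c. */
int abc(const char **s)
{
    const char *start = *s;
    int mark = pc;
    if (lit(s, 'a', 'A') && lit(s, 'b', 'B') && lit(s, 'c', 'C')) {
        emit('D');
        return 1;
    }
    *s = start;                    /* backtrack the input ... */
    pc = mark;                     /* ... and unbuild the queued actions */
    return 0;
}

/* Parse first; run the queued actions only if the parse succeeded. */
const char *run(const char *input)
{
    pc = 0;
    output[0] = '\0';
    if (abc(&input)) {
        memcpy(output, program, (size_t)pc);
        output[pc] = '\0';
    }
    return output;
}
```

Here run("abc") yields "ABCD", in the order described above, while
run("abd") yields the empty string because the A and B actions are
discarded when the parse fails at 'd'.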

MODES


In earlier versions of PRECCX, the call_mode (default 0/AUTO) parameter
determined the way PRECCX handled its internal stack of attached
values, using either automatic control or allowing client control.  In
2.42 and above, the call_mode parameter is obsolete. The stack has been
superseded by compiled code.

PRECCX now operates in only one mode under normal circumstances.
However, the -old switch on the command line will enable at least
partial support for yacc-style numerical references $1, $2, $3, etc., to
attached attributes.

LAYOUT


PRECCX grammar description files conventionally have the .y suffix, and
should be laid out as follows:

# define TOKEN ... (default = char)

# define VALUE ... (default = char*)

# define BEGIN ... (default nothing)

# define END   ... (default nothing)

# define ON_ERROR(x) ... (defaults to standard)

# include "ccx.h"  (or wherever the ccx.h file has gone)

@ first definition {: attached action; :}

@ ...

@ ...

MAIN(name of entry clause)

The cc.h header file may be used instead of ccx.h in scripts which
consist only of unparametrized definitions and terms.

LIMITS, STACKS AND BUFFERS


The standard sizes for the token buffer and interpreted program
stack inside PRECCX are respectively

  * READBUFFERSIZE and

  * PROGRAMSIZE,

defined in the header files cc.h and ccx.h. These can be changed in the
module which contains the MAIN(...) parser declaration. The
READBUFFERSIZE limits the number of TOKENs PRECCX can accept between
cut (`!') operations, and the PROGRAMSIZE limits the total number of
TOKENs and ACTIONs. This number is about twice the number of TOKENs seen
(assuming one action and one passed value per TOKEN), and thus limits the
`line length' too.

Note that the cost in space of a TOKEN itself may be small, but the
token stack requires a parallel stack of token VALUEs, supplied by the
yylex() lexer through the yylval variable.

PRECCX no longer uses a runtime VALUE stack for the attributes attached
to grammar components, but the parameter governing the size is retained
for compatibility. It is

   * STACKSIZE.

and the default value is 0.

The only other limit imposed on PRECCX is the size of the C runtime
stack. This can also be set in the MAIN(...) module, by defining

   * C_STACKSIZE

You should avoid recursive calls in favour of the * and + constructions
whenever space is tight. Each library call costs about 20 bytes of C
stack space.
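
The two rules of thumb in this section (about two program slots per
TOKEN, and about 20 bytes of C stack per library call) can be turned
into rough capacity estimates. The helpers below are illustrative
arithmetic only and are not part of PRECCX.

```c
/* Rough capacity estimates from the figures quoted in the text.
   Both helpers are invented for this sketch. */

/* About two program slots per TOKEN (one action plus one passed value). */
long max_tokens_per_parse(long programsize)
{
    return programsize / 2;
}

/* About 20 bytes of C stack per library call. */
long max_call_depth(long c_stacksize)
{
    return c_stacksize / 20;
}
```

For instance, a PROGRAMSIZE of 1000 would allow roughly 500 TOKENs, and
a C_STACKSIZE of 4000 bytes roughly 200 nested library calls. These
figures are made up for the arithmetic and are not PRECCX defaults.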

EXAMPLES ...


CONTEXT INDEPENDENT


The following script defines a simple +/- calculator:

# define TOKEN char

# define VALUE int

# include "cc.h"

# include <ctype.h>

int acc = 0;

@ digit = (isdigit)\x    {: acc=10*acc+$x-'0'; :}

@ posint= digit posint
@       | digit    !     {: acc = 0; :}
@                        {@ acc @}

@ anyint= <'-'> posint\x {@ -$x @}
@       | posint

@ atom  = <'('> expr\x <')'>
@                        {@ $x @}
@       | anyint

@ expr  = atom\x sign_sum\y
@                        {@ $x+$y @}
@       | atom

@ sign_sum= <'-'> atom\x sign_sum\y
@                        {@ -$x+$y @}
@       |   <'-'> atom\x {@ -$x @}
@       |   <'+'> atom\x sign_sum\y
@                        {@ $x+$y @}
@       |   <'+'> atom

MAIN(expr)

This script must be passed through PRECCX:

        preccx  calculator.y  calculator.c

and then compiled, using the PRECCX kernel library in libcc.a:

        gcc -Wall -ansi -o calculator calculator.c -L ... -lcc

The three dots stand for the directory in which the PRECCX library file
libcc.a is placed.

Note that by default the attached attribute is that of the last term in
a clause, so no {@ ... @} is needed in some places where it might
have been thought to be required.

Also note that it would have been more efficient to use an optional
following action and write

@ expr = atom\x  { sign_sum\y {@ ... @} | {@ $x @} }

instead of

@ expr = atom\x sign_sum\y {@ ... @} | atom

because the latter expression will build a parser which needlessly
checks twice for atom when no sign_sum follows.
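
For comparison, here is a hand-written C analogue of the factored rule.
It is not what PRECCX emits: it is an ordinary recursive-descent
sketch in which expr parses atom exactly once and then loops over any
signed atoms that follow, in place of the recursion in sign_sum. The
cursor p and the calculate wrapper are invented names.

```c
#include <ctype.h>

const char *p;   /* hypothetical cursor into the input string */

int expr(int *out);

/* posint: one or more digits */
int posint(int *out)
{
    int acc = 0;
    if (!isdigit((unsigned char)*p)) return 0;
    while (isdigit((unsigned char)*p)) acc = 10 * acc + (*p++ - '0');
    *out = acc;
    return 1;
}

/* atom = '(' expr ')' | '-' posint | posint */
int atom(int *out)
{
    const char *save = p;
    if (*p == '(') {
        p++;
        if (expr(out) && *p == ')') { p++; return 1; }
    } else if (*p == '-') {
        p++;
        if (posint(out)) { *out = -*out; return 1; }
    } else if (posint(out)) {
        return 1;
    }
    p = save;                       /* backtrack on failure */
    return 0;
}

/* expr: atom is parsed once, then signed atoms are added for as long
   as a '+' or '-' followed by an atom can be read. */
int expr(int *out)
{
    int x, y;
    if (!atom(&x)) return 0;
    while (*p == '+' || *p == '-') {
        char sign = *p;
        const char *save = p;
        p++;
        if (!atom(&y)) { p = save; break; }
        x += (sign == '-') ? -y : y;
    }
    *out = x;
    return 1;
}

/* Evaluate a whole string; returns 1 and stores the result on success. */
int calculate(const char *s, int *out)
{
    p = s;
    return expr(out) && *p == '\0';
}
```

With these definitions calculate("1-2-3", &v) succeeds with v set to
-4, and calculate("1+", &v) fails because the trailing '+' is left
unparsed.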

CONTEXT DEPENDENT


For an example of a parser which makes essential use of parameters, the
following definition of a parser which accepts only the Fibonacci
sequence as input may be instructive:

# define TOKEN char
# define VALUE char*
# include "ccx.h"
# include <math.h>

# define INT(x)   (int)(x)
# define DIV(m,n) INT(INT(m)/INT(n))
# define MOD(m,n) INT(INT(m)%INT(n))
# define LOG10(n) INT(log10(DBLE(n)))
# define DBLE(n)  (double)(n)
# define TEN      DBLE(10)

# define FIRSTDIGIT(n) \
    (0!=n)?DIV((n),pow(TEN,DBLE(LOG10(n)))):0
# define LASTDIGITS(n) \
    (0!=n)?MOD((n),pow(TEN,DBLE(LOG10(n)))):0

MAIN(fibs)

@fibs     = fib((PARAM)1,(PARAM)1)\k
@           {: printf("%d terms OK\n",(int)$k); :}

@fib(a,b) = number(a) <','> fib(b,a+b)\k {@ $k+1 @}
@         | <'.'> <'.'>
@           {: printf("Next terms are %d,%d,..\n",(int)a,(int)b); :}
@                           {@ 0 @}

@number(n)= digit(n)
@         | digit(FIRSTDIGIT(n)) number(LASTDIGITS(n))

@digit(n) = <n+'0'>  /* rep. of 1 digit n */

The following are some example inputs and responses:

1,1,2,3,5,..
5 terms OK
Next terms are 8,13,..

1,1,2,3,5,8,13,21,34,51,85,..
error: failed parse: probable error at <>1,85,..
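
The way fib(a,b) hands its parameters on to fib(b,a+b) can be mimicked
in ordinary C. The sketch below is an analogue, not PRECCX output, and
it swaps in a different matching technique: instead of peeling off
digits with FIRSTDIGIT and LASTDIGITS, it prints the expected term with
sprintf and compares text. The function name check_fibs is invented.

```c
#include <stdio.h>
#include <string.h>

/* Check that s is "t1,t2,...,tn,.." with the terms following the
   Fibonacci recurrence from 1,1. Returns the number of terms, or -1 on
   a mismatch. On success *next_a and *next_b receive the two terms
   that would come next. */
int check_fibs(const char *s, long *next_a, long *next_b)
{
    long a = 1, b = 1, t;
    int terms = 0;
    char want[32];
    size_t len;

    while (strncmp(s, "..", 2) != 0) {
        sprintf(want, "%ld,", a);          /* number(a) followed by ',' */
        len = strlen(want);
        if (strncmp(s, want, len) != 0)
            return -1;
        s += len;
        t = a + b; a = b; b = t;           /* fib(a,b) -> fib(b,a+b) */
        terms++;
    }
    *next_a = a;
    *next_b = b;
    return terms;
}
```

On the first example input above this returns 5 with the next terms 8
and 13; on the second it returns -1 when it meets 51 where 55 is
expected.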


HIGHER ORDER


PRECCX macros (see Section MACROS) are higher-order parsers.
In principle one may define a separated-list macro:

@ sep_list(p,q) = p {q p}*

and use it as follows:

@ mysep  = <','>

@ mylist(p) = sep_list(p,mysep)

@ items = mylist(item)

but you may find that you have to

1) declare a parser before you use it as a parameter:

extern PARSER mysep, item;

2) cast the parser to the PARAM type (used by PRECCX for all parameters)
as you use it for the first time:

# define CONST(x) (PARAM)(x)

@ mylist(p) = sep_list(p,CONST(mysep))

@ items     = mylist(CONST(item))

This is only necessary if sizeof(short(*)()) is not the same as
sizeof(long) in your C model.
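
The same higher-order idea can be written in plain C with function
pointers. This is only an analogue of the macro above: parser_fn is a
stand-in type for the sketch (it is not the PRECCX PARSER type), and
item here simply accepts one lower-case letter.

```c
#include <ctype.h>

/* A parser advances *s and returns 1 on success, or leaves *s alone
   and returns 0. Stand-in type, not the PRECCX PARSER type. */
typedef int (*parser_fn)(const char **s);

int item(const char **s)      /* a single lower-case letter */
{
    if (!islower((unsigned char)**s)) return 0;
    (*s)++;
    return 1;
}

int mysep(const char **s)     /* a comma */
{
    if (**s != ',') return 0;
    (*s)++;
    return 1;
}

/* sep_list(p,q) = p {q p}* ; returns the number of p's matched,
   or 0 if the first p fails. */
int sep_list(parser_fn p, parser_fn q, const char **s)
{
    int count;
    const char *save;
    if (!p(s)) return 0;
    count = 1;
    for (;;) {
        save = *s;
        if (!(q(s) && p(s))) { *s = save; break; }
        count++;
    }
    return count;
}
```

Applied to "a,b,c," the call sep_list(item, mysep, &s) matches three
items and leaves s pointing at the final comma, which belongs to no
complete `q p' pair.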

EXAMPLE DIRECTORY


PRECCX is intended to be both easy and convenient to use, but a
compiler compiler cannot be understood in one minute. There is a
directory of example files included with the PRECCX distribution.
Have a look at the *.y files there to get more of the feel.

(Sorry, but I can't put everything you need to know in here).

LIFE CYCLE   ...


FILES


The following files may be found in the PRECCX distribution -- which
is not to say that all of them are in it, or that this list is complete.
I just want to let you know what some of the files you can see are:

preccx          PRECCX executable
preccx.y        Main PRECCX definition in its own language
preccx.c        PRECCX C source code (generated by PRECCX from preccx.y).
preccx.h        PRECCX header file, needed only to construct PRECCX.
lex.y           Lexer definition.
lex.c           Lexer C source code (generated by PRECCX from lex.y).
c.y             C parser definition.
c.c             C parser source code (generated by PRECCX from c.y).
ccdata.c        The global data used by PRECCX.
ccx.c           The source code of the PRECCX 2.x kernel operations, needed to
                make ccx.o, included in libcc.a.
cc.c            The source code of the unparametrized PRECCX 1.x kernel
                operations, needed to make cc.o, included in libcc.a.
ccx.h           The header file of the PRECCX 2.x kernel operations, needed
                by every code constructed by PRECCX.
cc.h            The header file of the unparametrized PRECCX 1.x kernel
                operations, an alternative to ccx.h if you do not use
                parameterized definitions.
common.c        The source code of the kernel common to both 1.x and 2.x.
engine.c        The PRECCX runtime engine.
yystuff.c       Default lexer which allows you to escape newlines.
on_error.c      Default error routines.
atexit.c        Termination routines.
libcc.a         The UNIX library containing cc.o, ccx.o and yystuff.o, etc.
                needed to compile an executable from code built by PRECCX.
                Actually a link to one of:
libcc1.a        1-byte TOKEN library
libcc2.a        2-byte TOKEN library
libcc4.a        4-byte TOKEN library
preccx.lib      The DOS library containing cc.o, ccx.o and yystuff.o, etc.
                needed to compile an executable from code built by PRECCX.
                Actually a copy of one of:
preccx1c.lib    1-byte TOKEN library, compact memory model.
preccx2c.lib    2-byte TOKEN library, compact memory model.
preccx4c.lib    4-byte TOKEN library, compact memory model.
preccx1l.lib    1-byte TOKEN library, large memory model.
preccx2l.lib    2-byte TOKEN library, large memory model.
preccx4l.lib    4-byte TOKEN library, large memory model.
Makefile        The makefile for PRECCX, which calls:
Makefile.dos    The makefile for PRECCX under DOS.
Makefile.hpu    The makefile for PRECCX under HP-UNIX.
Makefile.syv    The makefile for PRECCX under system V UNIX.
test.y          Simple test script for PRECCX.
test.c          C output from the test.y script.
test            The test parser built by `gcc -ansi -o test test.c -L ... -lcc'.

COPYRIGHT


 PRECCX COMPILER COMPILER  Copyright Peter T. Breuer, 1989, 1992, 1994.
         3, Arthur St. Cambridge CB4 3BX
<ptb@{comlab.ox.ac.uk,eng.cam.ac.uk,dit.upm.es,cs.fit.edu}>

All rights reserved.  In particular, you may not distribute for profit
or cost the source code of the kernel libraries, or the description of
PRECCX in its own scripting language, or its source code, without my
permission. You may not make copies of the PRECCX executables and
libraries, except for your own use and for the purpose of making
backups, and you definitely may not sell any copies. See the licensing
agreement which accompanies the package for full details.

AUTHOR


Peter Breuer

Programming Research Group, Oxford University Computing Laboratory,
Wolfson Building, Parks Road, Oxford OX1 3QD, UK.

DIT, Escuela Superior de Ingenieros de Telecomunicacion, Universidad
Politecnica de Madrid, Ciudad Universitaria, Madrid E-28040, SPAIN.

<ptb@{comlab.ox.ac.uk,eng.cam.ac.uk,dit.upm.es,cs.fit.edu}>

Original man page also hacked by Jonathan Bowen <bowen@comlab.ox.ac.uk>.

ACKNOWLEDGEMENTS


This executable readme created by P.T. Breuer using Dave Harris'
freeware DRC utility, David's Readme Compiler, available from SIMTEL
and mirrors at numerous archive sites.

PRECCX has been developed under Turbo-C 3.0 and Borland C 2.0 for 386
PCs under MS DOS 5.0 and PC DOS 6.3, and under GNU's gcc C compiler for
UNIX on numerous platforms and operating systems. Blame them.

The PRECCX.EXE executable for DOS has been compressed using Fabrice
Bellard's freeware LZEXE utility, available from SIMTEL and mirrors as
LZEXE91.ZIP .

BUGS


1. On Sun3's, the gcc compiler still complains that printf is being
redefined.  I don't know why. If anyone finds the right compiler
switch to magic this away, please tell me! For the hp300 series, the
switch is -D__hp9000s300, if that's any clue? Mind you, on the hp's I
get complaints about __fls being assumed to be int (it is).  I presume
all these reflect mess-ups in the gcc configuration. On my Sun SPARC I
also get complaints about strlen being redefined, but then that gcc
configuration is _definitely_ crocked.

2. (Cured Mar 10 1992 in v1.1)

3. (Not cured but irrelevant, as multiple libraries are now used, April
1992) There is no way to change the TOKEN and VALUE types compiled into
the libraries, other than trivially. The size must remain the same.

4. (Cured Mar 17 1992 in v1.2).

5. (Cured July 10 1992 in v2.21, by using dynamic allocation in main)
This is a perennial one. PRECCX allocates all stacks and buffers
statically, and there are one or two I don't watch for overflow on
at every possible overflow point. One day I will switch to dynamic
allocation -- or at least command line parameters which determine the
sizes, like yacc. See NOTES and LIMITS. (NB - should have been finally
banished in v2.42 since all stacks are guarded there).

6. Cured in April 1993 in v2.40, failure to set a mark at some cut
points resulted in fewer backtrack errors than there ought to have
been.

7. Cured in August 1994 in v2.41; raw literal strings and chars were not
always recognized as valid parser arguments. Easy one.

8. Cured finally in August 1994 in v2.42, I hope: several reports over
the years were all eventually traced to a failure to realign the read
buffer when it ought to have been, resulting in overflow about 2K
tokens later.
Symptoms are an overwriting of the program area with an "illegal
instruction" report issued from the engine module. Let me know if the
impossible happens again.

Please report problems to <Peter.Breuer@comlab.ox.ac.uk>.


HISTORY


v2_01 : March 1992

     : Original issuable release of precc*x*. With parameterized grammars
     : now, but still with line-at-a-time lexer requirements. Preccx
     : asks the lexer for a string of tokens terminated by a zero token,
     : and offers the yybuffer location for them to be written to.

v2_02 : March 1992

     : Minor corrections (but necessary ones) to `!' and `!$' functionality.

v2_10 : Wed Jun 10 08:53:56 1992

     : Major revision of the library routines. They now use argument counts
     : instead of the P_STOP terminator to demarcate the list of parameters
     : in multi argument calls. This means that the P_STOP value is now valid
     : as a parameter (hooray, I always use MAXINT-7 in my programs).
     :
     : Bug fixes: more corrections to `!' and `!$' functionality.
     :
     : Improvements: * rewrite of bracket counting algorithm in preamble.c
     : to return more info (number of args) in support of the new style
     : libraries, and do it better.
     :               * added yylen to lexer code.

v2_20 : Thu Jun 11 21:07:09 1992

     : Major revision of the lexer interface. Switched over to
     : token-at-a-time calls to yylex(), a la yacc, for compatibility. This
     : is extremely inefficient, technically, of course! I'll release the
     : default lexer code so that it can be figured out.
     :
     : Lexers now return the TOKEN value when called, and put the VALUE in
     : yylval. They should do their own buffering if they want to be
     : efficient. They still have to shift yylineno for themselves, and set
     : yylen, and yylloc (the string location for error calls), though I
     : can't make head or tail of the documentation on the latter in
     : bison/yacc, so I don't really know what it's for.
     :
     : Experiment: * began to allow the p[q,r](x,y) syntax to distinguish
     : meta-parameters from ordinary ones. I suppose it makes sense as a
     : convention, even though I don't do anything with it at present.

v2_21 : Sun Jun 21 17:44:52 1992

     : Bug fixes to new-style default lexer (of course).
     :
     : Improvements: * internal stacks now created dynamically.
     :
     :               * new syntax, p*n for exactly-n repeats of p. This
     : required support in the kernel. The n can be a C expression and
     : catches local parameters thrown from the parser definitions.
     :
     : The documentation has been brought up to date and improved.


ver 2_22 : Sun Jul 12 17:11:03 1992

     : Making stack allocation user definable. Use STACKSIZE, C_STACKSIZE,
     : READBUFFERSIZE, FRAMEBUFFERSIZE in the main module (see cc.h).

     : Final lexer fixes.

ver 2_23 : Sat Jul 25 21:39:54 1992

     : Going back to ANSI code for MSDOS from Turbo-C.

     : The optimization default has been turned to off (because the expected
     : deficiency finally surfaced, and this will be fixed soon - I
     : have to include the instruction cache in the saved frame).

     : Shortened cc.h and ccx.h for users so that internals not exported.

ver 2_30 : Mon Aug 24 14:31:32 1992

     : Changes in source code for Unix and other compilers, to help
     : with portability and compatibility concerns. One bugfix in libraries.
     : Added a trivial atexit.c module to satisfy systems without atexit().

     : Changed SUCCESS to 1 and FAILURE to 0, for forward compatibility
     : with the monad model of parsing. This is the reason for the new
     : release number.

     : Now supporting the monadic `a\x b(x)' syntax, but not all the
     : functionality (wait for 2.31). The `a[b]' syntax is withdrawn, as
     : it's purely decorative, and slows down the parse noticeably.

     : Corrected a bug in some0n() and another in the push macro
     : which meant that changes in MAXPROGRAMSIZE weren't seen by ccx.h.

     : I introduced globals to contain all the user-definable numbers, just
     : in case they need to change dynamically in future. The struct precc_data
     : contains them all.

ver 2_31 : Mon Mar 19 1993

     : Various minor internal changes for compatibility and better
     : program comprehensibility.

ver 2_32 : August 1994

     : ditto.

ver 2_40 : Apr 25 1993

     : Implemented the `a\x b' syntax correctly at last. Various cleanups
     : of the precc.y script to support this, particularly in the management
     : of local environments (which really should be handled as inherited
     : parameters, but aren't, for bootstrap reasons).

     : Corrected a bug in the implementation of `!' which prevented
     : recognition of many backtrack errors through it.

     : Split the precc.y script into three: precc.y, lex.y, c.y .

     : Introduced the `!{foo}' construct, which causes reentry at parser
     : foo in case of a backtrack error through that point.

     : The buffer-sizes in the precc utility itself can now be set on
     : the command line, as well as by C macros in clients.

     : patch: August 1994. Altered sources to reduce compiler warnings.
     : Mended a bug in findbrkt which meant that strings and quotes were
     : not recognized in parser args. Made #line N "source" directives
     : appear in emitted code.

ver 2_41 beta: August 1994.
     : Moved synthetic attribute construction into the compile stage.
     : The C stack is now being used for attribute passing, and C is looking
     : after the frame shifts, not precc. Synthesized attributes can now
     : be passed as inherited parameters at parse time. The old attribute
     : stack has been discarded (STACKSIZE=0 by default) and call_mode is
     : obsolete.  Optimization also obsolete because shifting handled by C.

     : Synthetic attributes should be constructed within @...@ .
     : E.g. foo = bar gum @hum@ . I added encryption to get around the problem
     : that naked zeros couldn't be constructed before. Now there is no
     : restriction and I think the encryption is invisible.

     : Named synthetic attributes should be dereferenced using the $foo
     : syntax:    bar = gum\foo {: print($foo); :} . This does a cast.

     : The old $1 $2 syntax is supported, but only if you use the -old
     : switch to precc, and the generated code is horrible. It should still
     : be more robust than before, however. The $0 reference is DISALLOWED
     : now. $$ is now meaningless and should be replaced with $1, if at all.
     : Actions cannot make changes to these variables any longer.

     : Further cleanup in the bit of preccx.y I couldn't understand before.

     : Fixed: a huge bug that has been there forever. The read buffer is
     : now always flushed on _successfully_ passing a cut mark
     : (!). It wasn't, with resulting overflows, before. The buffer is now
     : also being watched for overflow. Cleaned up pstr/maxp/buffer code.

     : Removed: #line N directives from emitted code. Too confusing!

ver 2.42 September 1994.
     : Minor code changes to pass ANSI lint and manual
     : text cleanups. Code is now clean if -w-pro flag is set to avoid
     : warnings about "function used with no prototype". This only happens
     : because I use foo(); instead of foo(void); style declarations.

     : Decided on {: :} and {@ @} syntax for actions and attributes
     : respectively, but it is not strictly enforced yet and a little
     : inefficient unless C compiler optimization is used.

This HTML document was generated automatically by Jonathan Bowen on Tue Oct 18 12:07:08 BST 1994 with minor manual corrections.