% file: .../doc/design.tex

% $Header: /cvs/Darwin/src/live/Security/SecuritySNACCRuntime/doc/design.tex,v 1.1.1.1 2001/05/18 23:14:10 mb Exp $
% $Log: design.tex,v $
% Revision 1.1.1.1  2001/05/18 23:14:10  mb
% Move from private repository to open source repository
%
% Revision 1.1.1.1  1999/03/16 18:05:52  aram
% Originals from SMIME Free Library.
%
% Revision 1.1  1997/01/01 22:47:31  rj
% first check-in
%

\chapter{\label{comp-des-chapter}Compiler Design}

\section{\label{comp-overview-section}Overview}
The Snacc compiler is implemented with {\ufn yacc}, {\ufn lex}
(actually GNU's equivalents, {\ufn bison} and {\ufn flex}) and
\verb$C$.  Despite the shortcomings of {\ufn lex} and {\ufn yacc},
they provide reasonable performance without too much programming
effort.  Since {\ufn yacc} parsers are extremely difficult to modify
during runtime, any macro that you want the compiler to handle must
be hand coded into the ASN.1 {\ufn yacc} grammar
({\ufn \dots/compiler/core/parse-asn1.y}) followed by recompilation of snacc.
Macro definitions do not need special consideration since they are
skipped by the compiler.  Macro definitions and complex value notation
are kept as text in the data structure resulting from a parse if you
want to try to parse and process them.

To handle the anti-compiler nature of ASN.1's syntax, snacc makes
several passes on parse tree data structure when compiling.  None of
these passes creates temporary files; this allows snacc to process
large ASN.1 specifications quite quickly.  Each compiler pass is
explained in the next sections.  The main passes of the compiler are
executed in the following order:

\begin{enumerate}
\item parse useful types ASN.1 module
\item parse all user specified ASN.1 modules
\item link local and imported type references in all modules
\item parse values in all modules
\item link local and imported value references in all modules
\item process any macro types
\item normalize types
\item mark recursive types and signal any recursion related errors
\item check for semantic errors in all modules
\item generate C/C++ type information for each ASN.1 type
\item Sort the types from least dependent to most dependent
\item generate the C, C++, IDL or type table
\end{enumerate}

The source code for the compiler resides in {\ufn \dots/compiler/} and the
back ends are in {\ufn \dots/compiler/back-ends/c-gen/}, {\ufn \dots/compiler/back-ends/c++-gen/} and {\ufn \dots/compiler/back-ends/idl-gen/}.

\section{\label{comp-pass1-section}Pass 1: Parsing the Useful Types Module}
The ASN.1 useful types are not hardwired into snacc.  Instead they
have been placed in a separate ASN.1 module.  This allows the user to
define his own useful types or re-define the existing ones without
modifying snacc.  This also has the benefit that names of useful types
are not keywords in the lexical analyzer.  This step is not really a
compiler pass on the module data, however it is described as one for
simplicity.

The useful types module should be passed to snacc with the {\ufn -u}
flag in front of it.  The {\ufn -u} flag tells snacc to treat the
module in a special way.  Instead of parsing the module and generating
code for it, snacc parses the module and makes the types in it
accessible to all of the other modules being parsed.  Note that the
other modules do not need to explicitly import from the useful types
module.  See Section~\ref{comp-pass3-section} for more information on how
useful types affect linking.

The encode, decode, and other routines for the useful types are in the
runtime library.  Currently, the useful types library routines are the
same as the ones the compiler would normally generate given the useful
types module.  However, since they are in the library, you can modify
them to check character sets (string types), or convert local time
formats into their BER equivalent (UTCTime, GeneralizedTime).

The following types are in the useful types module:
\begin{small}
\begin{verbatim}
ASN-USEFUL DEFINITIONS ::=
BEGIN
ObjectDescriptor ::= [UNIVERSAL 7]  IMPLICIT OCTET STRING
NumericString    ::= [UNIVERSAL 18] IMPLICIT OCTET STRING
PrintableString  ::= [UNIVERSAL 19] IMPLICIT OCTET STRING
TeletexString    ::= [UNIVERSAL 20] IMPLICIT OCTET STRING
T61String        ::= [UNIVERSAL 20] IMPLICIT OCTET STRING
VideotexString   ::= [UNIVERSAL 21] IMPLICIT OCTET STRING
IA5String        ::= [UNIVERSAL 22] IMPLICIT OCTET STRING
GraphicString    ::= [UNIVERSAL 25] IMPLICIT OCTET STRING
VisibleString    ::= [UNIVERSAL 26] IMPLICIT OCTET STRING
ISO646String     ::= [UNIVERSAL 26] IMPLICIT OCTET STRING
GeneralString    ::= [UNIVERSAL 27] IMPLICIT OCTET STRING
UTCTime          ::= [UNIVERSAL 23] IMPLICIT OCTET STRING
GeneralizedTime  ::= [UNIVERSAL 24] IMPLICIT OCTET STRING

EXTERNAL         ::= [UNIVERSAL 8] IMPLICIT SEQUENCE
{
  direct-reference      OBJECT IDENTIFIER OPTIONAL,
  indirect-reference    INTEGER OPTIONAL,
  data-value-descriptor ObjectDescriptor OPTIONAL,
  encoding CHOICE
  {
    single-ASN1-type [0] OCTET STRING,  -- should be ANY
    octet-aligned    [1] IMPLICIT OCTET STRING,
    arbitrary        [2] IMPLICIT BIT STRING
  }
}
END
\end{verbatim}
\end{small}

If you use the EXTERNAL type, you must provide the mechanism to encode
and decode the value in the embedded CHOICE, \verb$encoding$.  The
type and transfer syntax of the value in an EXTERNAL type is not known
when the ASN.1 code is compiled by snacc.  Snacc cannot generate
encoders and decoders without complete type information and only
supports a single set of encoding rules, BER\@.

\section{\label{comp-pass2-section}Pass 2: Parsing ASN.1 Modules}
During this pass, all of the specified modules are parsed into the {\em
Module} data structure.  The ASN.1 source files are not consulted
again, after they are parsed.  {\ufn Yacc} and {\ufn lex} are doing the work in
this step. (see files {\ufn snacc.c}, {\ufn lex-asn1.l}, {\ufn parse-asn1.y}
and {\ufn asn1module.h}).

A lexical tie-in is where the yacc parser puts the lexical analyzer
into a different mode (and is usually considered a hack).  The
different modes tokenize symbols differently, which is useful for
skipping well delimited sections that cannot be parsed easily by a
{\ufn yacc} parser on the first pass.  Lexical tie-ins are used in two
places to simplify the ASN.1 grammar sufficiently for {\ufn yacc} and
{\ufn lex}.  There are two special modes in the lexical analyzer, one
for ASN.1 macro definitions and the other for ASN.1 values enclosed in
\{\}'s.

The lexical tie-in for eating macro definition bodies works with macro
definitions of the following form:

\begin{verbatim}
<upper case identifier> MACRO ::= BEGIN ... END
\end{verbatim}

Everything between the {\ASN BEGIN} and {\ASN END} is stuffed into a
string by {\ufn lex} and passed back as single token to the
{\ufn yacc} parser.

Values within \{\}'s are grabbed in a similar way.  Value parsing
cannot really be done at this stage since complete type information is
needed and the types are not fully parsed or linked yet.

Most syntax errors are reported during this pass.  If syntax errors
are encountered, snacc will report as many as it can from the
offending module before the parser is hopelessly lost and then exit.
If the types and values are separated with semi-colons, the parser can
recover after a syntax error and attempt to find more errors in that
module before exiting.


\section{\label{comp-pass3-section}Pass 3: Linking Types}
The third pass links all type references.  Snacc attempts to resolve
any currently visible (i.\ e.\  not in macro definitions or constructed
values) type reference.  This includes type references in simple value
definitions and subtyping information.  The useful types module (if
given) is linked first.

Snacc will exit after this pass if any type references could not be
resolved.  Error messages with file and line number information will
be printed to {\C stderr}.

This pass also counts and stores the number of times a type definition is
referenced locally and from other modules.  This information is used
during the type sorting pass.

First, each module identifier is checked for conflicts with the
others.  If the module identifier includes an OBJECT IDENTIFIER, snacc
only checks for conflicts with the other module identifier OBJECT
IDENTIFIERs.  When only a module name is provided, snacc checks for
conflicts with the the other module names, even if the other module
identifiers include OBJECT IDENTIFIERs.  If the OBJECT IDENTIFIER of
a module identifier contains any value references, it will be ignored
for module look-up purposes.  Note that value references within the
module identifier OBJECT IDENTIFIERs are not allowed in the 1992
version of ASN.1 due to the difficulty in module name resolution they
present.

Two modules with the same name but different OBJECT IDENTIFIERs are
not considered an error within ASN.1.  However, because the generated
files use the module name as part of their name, the code generation
pass will gripe about and fail for modules with the same name.

Next, each module's import {\em lists} are resolved by finding the
named module and then verifying that the named module contains all of
the imported types.

Then for each module, each type reference (except those of the form
{\em modulename.typename}) is assumed to be a local type reference and
the linker attempts to find a local type definition of the same name
to resolve it with.  If a matching local definition is found, the type
reference is resolved and the linker continues with the next type
reference.

For each type reference of the form {\em modulename.typename}, the
linker looks in the module with name {\em modulename} for the type
{\em typename}.  If the type is found the reference is resolved,
otherwise a linking error is reported.  Note that this form of type
reference provides a special scope that does not conflict with other
local or imported types in that module.

For type references that failed to resolve locally and are not of the
form {\em modulename.typename}, the linker looks in the import lists
of the current type reference's module for a type to resolve with.  If
the type is found in the import lists, the reference is resolved.

For the remaining unresolved type references (failed local and legal
import resolution and are not of the form {\em modulename.typename}),
the linker looks in the useful types module, if one was specified with
the {\ufn -u} option.  If the type is found in the useful types module
then the reference is resolved, otherwise a linking error is reported.

Note that when a useful types module is specified, it is globally
available to all modules, but it has the lowest linking priority.
That is, if a type reference can be resolved legally without the
useful types module, it will be.

Some type checking must be done in this pass to link certain types
properly.  These include:
\begin{itemize}
\item {a SELECTION type must reference a field of a CHOICE type.}
\item {a COMPONENTS OF type in a SET must reference a SET.}
\item {a COMPONENTS OF type in a SEQUENCE must reference a SEQUENCE.}
\end{itemize}


\section{\label{comp-pass4-section}Pass 4: Parsing Values}
The fourth pass attempts to parse any value that is enclosed in \{\}'s in
the given modules.  INTEGERS, REALs and BOOLEANS that are not enclosed in
braces are parsed in the first pass.

The value parser is implemented without {\ufn yacc} and {\ufn lex} and
uses each value's type information to help parse the value.  Values
within \{\}'s hidden within types such as default values and parts of
subtypes are not parsed.  Since subtypes and default values do not
affect the generated code, upgrading the value parser in this respect
is not very useful.

The only type of value in \{\}'s that is parsed is the OBJECT
IDENTIFIER\@.  All of the OBJECT IDENTIFIER value forms are supported
but snacc loosens the restrictions on using arc names defined in the
OBJECT IDENTIFIER tree.

ASN.1 allows OBJECT IDENTIFIER values to reference special built-in
arc names from the OBJECT IDENTIFIER tree defined in Annexes B, C and
D of X.208.  For example the first arc in an OBJECT IDENTIFIER value
can be either {\ASN ccitt} {\ASN iso} or {\ASN joint-iso-ccitt}.  The
acceptable arc names are context dependent; for example the second arc
can be one of {\ASN standard}, {\ASN registration-authority},
{\ASN member-body} or {\ASN identified-organization} only if the first
arc was {\ASN iso} or 1.

Snacc uses a simplified algorithm to handle references to the arc
names defined in the OBJECT IDENTIFIER tree.  Any arc value that is
represented by a single identifier is checked to see if it is one of
the arc names defined in the OBJECT IDENTIFIER tree; context is
ignored.  If the identifier matches one of the arc names then its
value is set accordingly.  The lack of context sensitivity in snacc's
algorithm may cause the arc name to link with an arc name from the
OBJECT IDENTIFIER tree when a local or imported INTEGER was desired.
The following is the list special arc names that snacc understands and
their values (see {\ufn \dots/compiler/core/oid.c}):

\begin{itemize}
\setlength{\itemsep}{0pt}
\setlength{\parsep}{0pt}
\nspace{0}
\item {ccitt = 0}
\item {iso = 1}
\item {joint-iso-ccitt = 2}
\item {standard = 0}
\item {registration-authority = 1}
\item {member-body = 2}
\item {identified-organization = 3}
\item {recommendation = 0}
\item {question = 1}
\item {administration = 2}
\item {network-operator = 3}
\end{itemize}

\section{\label{comp-pass5-section}Pass 5: Linking Values}
The fifth pass links value references. The value linker looks for
value references to resolve in value definitions and type definitions,
including default values and subtyping information.  The value linking
algorithm is virtually identical to the type linking pass (see Section
\ref{comp-pass3-section}).

Currently the value parsing is limited to OBJECT IDENTIFIER values.
Simple values that are not between \{\}'s are parsed in the first
pass.  Here is an example that illustrates the OBJECT IDENTIFIER
parsing and linking.  The following values:

\begin{small}
\begin{verbatim}
foo OBJECT IDENTIFIER ::= { joint-iso-ccitt 2 88 28 }
bar OBJECT IDENTIFIER ::= { foo 1 }
bell INTEGER ::= 2
gumby OBJECT IDENTIFIER ::= { foo bell }
pokie OBJECT IDENTIFIER ::= { foo stimpy(3) }
\end{verbatim}
\end{small}

\noindent
are equivalent to this:

\begin{small}
\begin{verbatim}
foo OBJECT IDENTIFIER ::= { 2 2 88 28 }
bar OBJECT IDENTIFIER ::= { 2 2 88 28  1 }
bell INTEGER ::= 2
gumby OBJECT IDENTIFIER ::= { 2 2 88 28 2 }
pokie OBJECT IDENTIFIER ::= { 2 2 88 28 3 }
\end{verbatim}
\end{small}

Note that in version 1.0, named arcs (e.g. {\ASN stimpy(3)}) were
promoted to full integer values.  This was wrong---many standards
re-used them (e.g. X.500 and {\ASN ds(5)}) leading to multiply defined
integer values.  If you want to improve the value parsing, look in
{\ufn \dots/compiler/core/val-parser.c}

\section{\label{comp-pass6-section}Pass 6: Processing Macros}

The fifth pass processes macros.  For all macros currently handled,
snacc converts type definitions inside the macro to type references
and puts the type definition in the normal scope.  This way, the code
generator does not have to know about macros to generate code for the
types defined within them.

The only macro that receives any special processing is the SNMP
OBJECT-TYPE macro.  This macro's information defines an OBJECT
IDENTIFIER or INTEGER to type mapping for use with any ANY DEFINED BY
type.  Note that the OBJECT-TYPE macro has been extended beyond its
SNMP definition to allow integer values for INTEGER to type mappings.

ASN.1 allows you to define new macros within an ASN.1 module;  this
can change the grammar of the ASN.1 language.  Since snacc is
implemented with {\ufn yacc} and yacc grammars cannot be modified
easily during runtime, snacc cannot change its parser in response to
macro definitions it parses.

Any macro that snacc can parse has been explicitly added to the yacc
grammar before compiling snacc.  When a macro that snacc can parse is
parsed, a data structure that holds the relevant information from the
macro is added to the parse tree.  The type and value linking passes
as well as the macro processing and possibly the normalization pass
need to be modified to handle any new macros that you add.

The following macros are parsed:

\begin{itemize}
%\begin{linespacing}{0.5}
\setlength{\itemsep}{0pt}
\setlength{\parsep}{0pt}
\nspace{0}
\item{ OPERATION (ROS) }
\item{ ERROR (ROS) }
\item{ BIND (ROS) }
\item{ UNBIND (ROS) }
\item{ APPLICATION-SERVICE-ELEMENT (ROS) }
\item{ APPLICATION-CONTEXT }
\item{ EXTENSION (MTSAS)}
\item{ EXTENSIONS (MTSAS) }
\item{ EXTENSION-ATTRIBUTE (MTSAS) }
\item{ TOKEN (MTSAS) }
\item{ TOKEN-DATA (MTSAS)}
\item{ SECURITY-CATEGORY (MTSAS) }
\item{ OBJECT (X.407) }
\item{ PORT   (X.407) }
\item{ REFINE (X.407)}
\item{ ABSTRACT-BIND (X.407) }
\item{ ABSTRACT-UNBIND (X.407) }
\item{ ABSTRACT-OPERATION (X.407) }
\item{ ABSTRACT-ERROR (X.407) }
\item{ ALGORITHM (X.509)}
\item{ ENCRYPTED (X.509)}
\item{ PROTECTED (X.509)}
\item{ SIGNATURE (X.509)}
\item{ SIGNED    (X.509)}
\item{ OBJECT-TYPE (SNMP) }
%\end{linespacing}
\end{itemize}

However, no code is generated for these macros.  As stated above, only
the OBJECT-TYPE macro affects the encoders and decoders.

\section{\label{comp-pass7-section}Pass 7: Normalizing Types}
The sixth pass normalizes the types to make code generation simpler.
The following is done during normalization:
\begin{itemize}

\item[1.] { COMPONENTS OF types are replaced with the contents of the SET
or SEQUENCE components that they reference.}

\item[2.] { SELECTION types are replaced with the type they reference.}

\item[3.] { SEQUENCE, SET, CHOICE, SET OF and SEQUENCE OF {\em definitions}
embedded in other types are made into separate type definitions. }

\item[4.] { For modules in which ``IMPLICIT TAGS'' is specified, tagged
type references such as {\ASN [APPLICATION 2] Foo} are marked IMPLICIT
if the referenced type ({\ASN FOO} in this case) is not an untagged
CHOICE or untagged ANY type.}

\item[5.] { INTEGERs with named numbers, BIT STRINGs with named bits and
ENUMERATED types embedded in other types are made into separate type
definitions.}
\end{itemize}

The COMPONENTS OF and SELECTION type simplifications are obvious but
the motivation for the others may not be so obvious.  The third type of
simplification makes type definitions only one level deep.  This
simplifies the decoding routines since snacc uses local variables for
expected lengths, running length totals and tags instead of stacks.

The implicit references caused by ``IMPLICIT TAGS'' are marked
directly on type references that need it.  This saves the code
generators from worrying about whether implicit tagging is in effect
and which types can be referenced implicitly.

The types with named numbers or bits are made into a separate type to
allow the C++ back end to simply make a class that inherits from the
INTEGER or BIT STRING class and defines the named numbers or bits
inside an enum in the new class.  This is described further in the C++
code generation chapter.

\section{\label{comp-pass8-section}Pass 8: Marking Recursive Types}


This pass marks recursive types and checks for recursion related
errors.  To determine whether a type definition is recursive, each
type definition is traced to its leaves, checking for references to
itself.  Both local and imported type references within a type are
followed to reach the leaves of the type.  A leaf type is a simple
(non-aggregate) built-in type such as an INTEGER or BOOLEAN\@. At the
moment, recursion information is only used during the type dependency
sorting pass.

{\em Snacc} attempts to detect two types of recursion related errors.  The
first type of error results from a recursive type that is composed
solely of type references.  Types of this form contain no real type
information and would result in zero-sized values.  For example the
following recursive types will generate this type of warning:
\begin{small}
\begin{verbatim}
A ::= B
B ::= C
C ::= A
\end{verbatim}
\end{small}

The other recursion related error results from a type whose value will
always be infinite in size.  This is caused by recursion with no
optional component that can terminate the recursion.  If the recursion
includes an OPTIONAL member of a SET or SEQUENCE, a CHOICE member, or
a SET OF or SEQUENCE OF, the recursion can terminate.

Both of the recursion errors generate warnings from snacc but will
not stop code generation.


\section{\label{comp-pass9-section}Pass 9: Semantic Error Checking}
The ninth pass checks for semantic errors in the ASN.1 specification
that have not been checked already. Both the type linking pass and the
recursive type marking pass do some error checking as well. Snacc attempts
to detect the following errors in this pass:

\begin{itemize}
\item { elements of CHOICE and SET types must have distinct tags.}

\item { CHOICE, ANY, and ANY DEFINED BY types cannot be implicitly tagged. }

\item { type and value names within the same scope must be unique. }

\item { field names in a SET, SEQUENCE or CHOICE must be distinct.  If
a CHOICE is a member of a SET, SEQUENCE or CHOICE and has no field name,
then the embedded CHOICE's field names must be distinct from its
parents to avoid ambiguity in value notation.}

\item { an APPLICATION tag code can only be used once per module. }

\item { each value in a named bit list (BIT STRINGs) or named number
list (INTEGERs and ENUMERATED) must be unique within its list.}

\item { each identifier in a named bit list or named number list must
be unique within its list.}

\item { the tags on a series of one or more consecutive OPTIONAL or DEFAULT
SEQUENCE elements and the following element must be distinct. }

\item { gives a warning if an ANY DEFINED BY type appears in a
SEQUENCE before its identifier or in a SET\@.  These would allow encodings
where the ANY DEFINED BY value was prior to its identifier in the
encoded value; ANY DEFINED BY values are difficult to decode without
knowing their identifier.}

\end{itemize}

Snacc does not attempt to detect the following errors due the
limitations of the value parser.
\begin{itemize}
\item { SET and SEQUENCE values can be empty (\{\}) only if the SET or
SEQUENCE type was defined as empty or all of its elements are marked
as OPTIONAL or DEFAULT.}

\item { each identifier in a BIT STRING value must from that BIT
STRING's named bit list (this could be done in an improved value
linker instead of this pass).}
\end{itemize}


\section{\label{comp-pass10-section}Pass 10: Generating C/C++ Type Information}

This pass fills in the target language type information.  The process
is different for the C and C++ back ends since the C++ ASN.1 model is
different and it was developed later (more design flaws had been
corrected for the C++ backend).

For C and C++ there is an array that contains the type {\em definition}
information for each built-in type.  For each built-in ASN.1 type, the
C array holds:

\begin{description}
\item[typename] {the C {\C typedef} name for this type definition.}

\item[isPdu]    {TRUE if this type definition is a PDU\@. This is set
for types used in ANY and ANY DEFINED BY types and those indicated by
the user via compiler directives.  Additional interfaces to the encode
and decode routines are generated for PDU types.  The SNMP OBJECT-TYPE
macro is the current means of indicating whether a type is used within
an ANY or ANY DEFINED BY type.}

\item[isPtrForTypeDef] { TRUE if other types defined solely by this type
definition are defined as a pointer to this type.}

\item[isPtrForTypeRef] { TRUE if type references to this type
definition from a SET or SEQUENCE are by pointer.}

\item[isPtrForOpt] { TRUE if OPTIONAL type references to this type
definition from a SET or SEQUENCE are by pointer.}

\item[isPtrInChoice] { TRUE if type references to this type
definition from a CHOICE are by pointer.}

\item[optTestRoutineName] { name of the routine to test whether an
OPTIONAL element of this type in a SET or SEQUENCE is present.
Usually just the name of a C macro that tests for NULL.}

\item[printRoutineName] { name of this type definition's printing routine.}
\item[encodeRoutineName]{ name of this type definition's encoding routine.}
\item[decodeRoutineName]{ name of this type definition's decoding routine.}
\item[freeRoutineName]  { name of this type definition's freeing routine.}
\end{description}

The C++ type definition array is similar to C's. It contains:

\begin{description}
\item[classname] { holds the C++ {\C class} name for this type definition.}
\item[isPdu] { same as C isPdu except that is does not affect the code
generation since the C++ back end includes the extra PDU encode and
decode routines by default.}
\item[isPtrForTypeDef] { same as C isPtrForTypeDef. }
\item[isPtrForOpt] { same as C isPtrForOpt.}
\item[isPtrInChoice] { same as C isPtrInChoice}
\item[isPtrInSetAndSeq] { whether type references to this class
from a SET or SEQUENCE are by pointer.}
\item[isPtrInList] {whether type references to this class
from a SET OF or SEQUENCE OF are by pointer.}
\item[optTestRoutineName] { name of the routine to test whether an
OPTIONAL element of this type in a SET or SEQUENCE is present.
Usually is just name of a C macro that tests for NULL.}
\end{description}

The first step of this pass uses the type arrays to fill in the C or
C++ type {\em definition} information for each module's ASN.1 type
definitions.  This is done for the useful types module as well.

The next step goes through each constructed type and fills in the type
{\em reference} information for each reference to a built-in, user defined
or useful type.  Much of the type reference information is taken from
the referenced type's definition information.  The type reference
information contains the following (for both C and C++):

\begin{description}
\item[fieldName] { field name for this type if it is referenced from
a CHOICE, SET or SEQUENCE.}
\item[typeName] { type name of the referenced type.}
\item[isPtr] { whether this reference is by pointer.}
\item[namedElmts] { named elements for INTEGER, ENUMERATED or BIT
STRING types with their C names and values.}
\item[choiceIdValue] { if this type reference is in a CHOICE, this
holds the value of the CHOICE's choiceId that indicates the presence
of this field.}
\item[choiceIdSymbol] { if this type reference is in a CHOICE, this
holds the C enum value symbol that has the choiceIdValue value.}
\item[optTestRoutineName] { name of the routine or macro to test for
the presence of this element if it is an OPTIONAL element of a SET or SEQUENCE.}
\end{description}

\section{\label{comp-pass11-section}Pass 11: Sorting Types}

This pass sorts the type definitions within each module in order of
dependence.  ASN.1 does not require the types to be defined before
they are referenced but both C and C++ do.  Without this pass, the
generated types/classes would probably not compile due to type
dependency problems.  There is no attempt to order the modules;
command line order is used for the module dependence.  If you have
problems with mutually dependent modules, the simplest approach is to
combine the dependent modules into a single ASN.1 module.

Some compilers such as CASN1 \cite{CASN1} require the user to order
the types within the ASN.1 modules.  This can be tedious and since
snacc may generate new type definitions from nested aggregate type
definitions in the normalization pass, the user does not have complete
control over the order of every type definition.  (The user could use
the {\ufn -P} option to get the normalized ASN.1 and then order it but
that is painful as well.)

Snacc attempts to sort the types from least dependent to most
dependent using the following convoluted algorithm:

First, separate the type definitions within a module into the groups:
\begin{itemize}
\item[1.] { type definitions that are defined directly from simple built-in
types such as INTEGER.}

\item[2.] { types such as SET, SEQUENCE, SET OF, SEQUENCE OF and CHOICE
that contain no references to types defined in this module.  That, is
they are defined from only simple built-in types, imported types or
useful types.}

\item[3.] { type definitions that reference locally defined types.}

\item[4.] { type definitions that are not referenced by any local types.}
\end{itemize}

Only the 3rd group of type definitions needs more sorting.  After it
has been sorted, the groups are merged in the order 1, 2, 3, 4 to
yield a sorted type definition list.

Now we describe how the 3rd group of type definitions is sorted.
\begin{itemize}

\item[1.] {for each type definition in the third group, a list of its local type
references is built and attached to it.  This type reference list only
goes one level deep; it does not follow type references to find more
type references.}

\item[2.] { all of the linearly-dependent types are removed and sorted.
This is done by repeatedly removing type definitions that do not
directly depend on any other type definitions that remain in the 3rd
group.  The process of removing the type definitions sorts them.}

\item[3.] { the type definitions that were not removed in step 2 are
divided into two groups: recursive and non-recursive.  The
non-recursive types depend on the recursive ones since they are still
in the list after step 2.}

\item[4.] { the non-recursive types from step 3 are sorted as in step
2.  All of them should sort linearly since none are recursive. }

\item[5.] { if the target language is C, any SET OF or SEQUENCE OF
types are separated from the recursive type definitions built in step 3.
This is done because the C representation of a list type is generic
(uses a {\C void~*} to reference the list element) and therefore does
not really depend on the list's element type.}

\item[6] { the list of local type references for the recursive types
from step 3 is re-generated as in step 1 using a relaxation:  types
referenced as pointers are not added to a type's reference list.}

\item[7] { the recursive types from step two are re-sorted as in step
2 using their new local type reference lists. Two lists are formed,
those that sorted linearly and those that did not. Hopefully the
latter list will be empty.}
\end{itemize}

To form a sorted third group, the lists are merged in the following order:
\begin{itemize}
\item {linearly sorted types from step 2}
\item {separated list types (C only) from step 5}
\item {sorted recursive types from step 7}
\item {unsorted recursive types from step 7 (hopefully empty)}
\item {sorted non-recursive types from step 4}
\end{itemize}


In C, the code generator defines both {\C typedef} names and
{\C struct} tags (names).  For example,
\begin{verbatim}
Foo ::= SET { a INTEGER, b BOOLEAN }

Bar ::= SEQUENCE { a OBJECT IDENTIFIER, b Foo }
\end{verbatim}
translates to the following C data types:
\begin{verbatim}
typedef struct Foo /* SET */
{
    AsnInt a; /* INTEGER */
    AsnBool b; /* BOOLEAN */
} Foo;

typedef struct Bar /* SEQUENCE */
{
    AsnOid a; /* OBJECT IDENTIFIER */
    struct Foo *b; /* Foo */
} Bar;
\end{verbatim}

Note that both the {\C struct} and the {\C typedef} have the name
{\C Foo}.  Also note that the Bar type references the {\C Foo} via
{\C struct Foo~*}.

For types such as {\C Bar} that contain the {\C Foo} type,
{\C Foo} is referenced as {\C struct Foo~*} instead of just
{\C Foo~*} because C allows you to use the type {\C struct Foo~*}
(incomplete type) in defining types even prior to the actual
declaration of the the {\C struct Foo}. The {\C Foo~*} type can
{\em only} be used after the {\C Foo typedef} declaration.  The use
of incomplete types can often overcome recursion related type ordering
problems (not relevant in this example since they are not recursive).

\section{\label{comp-pass12-section}Pass 12: Generating Code}

This pass creates and fills the source files with C or C++ code or
produces a type table containing the type descriptions from all of the
parsed modules, including the useful types module (if given).  The
purpose of the normalization, sorting and error detection passes is to
simplify this pass.

The normalization pass simplified the ASN.1 types in various ways to
make C/C++ type and code generation simpler.

The type sorting pass hopefully eliminates type dependency problems in the
generated code.  The C/C++ type generator simply proceeds through the
ordered type list writing the C/C++ type definitions to a header file.

The error detection and linking passes will make snacc exit if errors
are found, so the code generation pass can assume the ASN.1 types are
virtually error free.  This usually allows snacc to exit gracefully
instead of crashing due to an undetected error.

The type table data structure is similar to snacc's parse tree for the
ASN.1 modules but it is much simpler.  This is because all of the type
linking and error checking has been done.  The type definitions in the
type tables are in defined by the type sorting pass (dependency).

The next chapters describe the code that is generated by snacc and the
libraries the generated code uses.