1   The source code
    ---------------

1.1 Inventory
    ---------

Inform is written in portable ANSI C, and the source code is divided into 21 files of code, called "sections", plus 1 #include file of linkage, constant and type definitions. These files are:

    arrays.c    asm.c       bpatch.c    chars.c     directs.c
    errors.c    expressc.c  expressp.c  files.c     inform.c
    lexer.c     linker.c    memory.c    objects.c   states.c
    symbols.c   syntax.c    tables.c    text.c      veneer.c
    verbs.c
                                                    header.h

Note that all of these names fit the 8.3 filename convention.

The subdivision into 21 sections is intended to ensure that each .c file can be compiled into one linkable object file: under some C compilers, object code files cannot exceed 64K in length. On my machine, expressc.o (the object code derived from expressc.c) is the largest, at 40K (about a third of which is the static table of operator data).

A concise tree giving the structure of the source code follows. A section name is given in brackets when it has already been given on a previous line: so "(inform.c)" is intended to be read as "now back to inform.c again". A name written as a function, like "compile()", is indeed a function, whose arguments are omitted here. Text in square brackets indicates the presence of interesting tables of static data. Finally, note that this structure is not absolutely rigorous: the error-reporting routines in "errors.c", for example, are called from all over Inform, not just from the lexical analyser.

    inform.c      ICL parser: switches, filename translation, path variables
    memory.c      ICL memory command parser: sets memory settings
    (inform.c)    compile(): polling all sections to manage variables,
                  allocate and free arrays
    syntax.c      syntax analyser, top level
    lexer.c       lexical analyser: converts source text to a stream of
                  tokens
                  [Tables of all Inform keywords]
    chars.c       performs character set translations
    files.c       reads source code files into buffers for the lexer;
                  performs miscellaneous file I/O
    symbols.c     keeps the table of symbol names found in the source;
                  looks up names in, and adds them to, this table
    errors.c      issues error, fatal error and warning messages
    (syntax.c)    parse_program(): top level routine in syntax analyser
                  parse_directive()
    directs.c     parse_given_directive(): parses and obeys the easier
                  directives; manages conditional compilation; delegates
                  directive parsing down to other sections for harder
                  cases:
    text.c        make_abbreviation(): Abbreviate
    arrays.c      make_global(): Array, Global
    objects.c     make_attribute(): Attribute
                  make_property(): Property
                  make_class(): Class
                  make_object(): Object
    verbs.c       make_fake_action(): Fake_Action
                  make_verb(): Verb
                  extend_verb(): Extend
    linker.c      link_module(): Link
    (symbols.c)   assign_symbol(): giving value and type to symbols
    (syntax.c)    parse_routine()
                  parse_code_block()
    states.c      parse_statement(): assigns ".Label" labels, parses and
                  generates code for statements
                  parse_action(): handles action statements (<...> and
                  <<...>>)
                  parse_print(): handles print and print_ret
    asm.c         parse_assembly(): handles assembly language source code
                  (preceded by "@")
    expressp.c    parse_expression(): parses all expressions (including
                  constants), conditions and assignments
    expressc.c    [Table of operators]
                  code_generate(): generates code from parse trees
                  previously found by parse_expression()
    (asm.c)       assemble_2_to() (and many other similarly named
                  routines)
                  [Database of all Z-machine opcodes]
                  assemble_instruction(): assembles a single Z-code
                  instruction
                  assemble_label_no(): puts label N here
                  assemble_routine_end(): finishes routine, backpatches
                  and optimises branches
    (text.c)      compile_string(): translates ASCII text to Z-encoded
                  text
                  the dictionary manager
                  the abbreviations optimiser
    (chars.c)     performs character set translations
    veneer.c      compile_veneer(): compiles in any whole routines needed
                  by the rest of the compiled code
                  [Table of Inform source code for veneer routines]
    tables.c      construct_storyfile(): glues together all the code,
                  dictionary, object tree, etc. into a story file
    bpatch.c      backpatches the code in the light of recent knowledge

1.2 Map
    ---

Here follows a map of the Inform archipelago, marking the inhabited islands, their shipping lanes and chief imports and exports:

    [Map of the Inform archipelago, stage by stage:
     FRONT END          command line and/or ICL files in; ICL commands go
                        to inform.c and memory.c, which pass filenames on
     LEXICAL ANALYSER   files.c reads source code in and passes chars to
                        lexer.c and symbols.c, which pass tokens down and
                        receive symbol values back
     SYNTAX ANALYSER    syntax.c, with STATEMENTS (states.c), @ ASSEMBLY
                        (asm.c), EXPRESSIONS (expressp.c; see map 6.1),
                        TEXT and DIRECTIVES (directs.c, arrays.c,
                        objects.c, verbs.c); assembly language goes to
                        asm.c; parse trees go to expressc.c and then asm.c;
                        strings go to text.c and chars.c, giving Z-text,
                        the dictionary and alphabets; arrays.c, objects.c
                        and verbs.c produce the array area, objects and
                        grammar; asm.c produces raw Z-code
     OUTPUT             tables.c and bpatch.c put together the Z-machine
                        code area and the rest of the story file, which
                        files.c writes out: story file out]

(For clarity, the linker and a few smaller tables are left out; and the "service" sections of Inform, such as "errors.c" and the allocation code in "memory.c", are left out since they are, so to speak, pipes and sewers which lie beneath the surface of the ocean.)

1.3 Naming conventions
    ------------------

The file "header.h" defines over 700 constants using #define. These are mainly in capital letters and are followed by _ and then some short code indicating what kind of constant is being defined: for instance, NUMBER_TT is the token type for a number. We write *_TT for the set of constants ending in _TT. Similarly, though to a lesser extent, groups of related variables and routines have grouped names.

-------------------------------------------------------------------------
Set of constants  Used for
-------------------------------------------------------------------------
*_Extension       File extensions, such as ".z5", used if the host OS
                  supports them
*_Directory       Initial values for the ICL path variables (e.g., the
                  default pathname where story files are written to)
*_TT              Token types
*_CODE            Token values for statement and directive names
*_COND            Token values for textual condition names
*_SEGMENT         Token values for object definition segment names
*_MK              Token values for misc statement keywords
*_TK              Token values for "trace" directive keywords
*_DK              Token values for misc directive keywords
*_SC              Token values for system constant names (* is written in
                  lower case)
*_SYSF            Token values for system function names
                  [In all of the above eight cases, * is the name of the
                  statement, keyword, etc.
                  referred to, written in upper case except as specified
                  above]
*_SEP             Token values for separators (the name sometimes reflects
                  the text, e.g., DARROW_SEP for the double-length arrow
                  "-->"; sometimes its use, e.g., NOTEQUAL_SEP for "~=")
*_OP              Token values for operators
*_A               Associativity values for operators
*_U               "Usage" (infix, prefix or postfix) values for operators
*_T               Symbol types
*_SFLAG           Symbol flags (bitmasks containing one bit set, so that
                  (sflags[i] & *_SFLAG) is true if the flag is set)
*_CONTEXT         Contexts in which the expression evaluator can run
                  (e.g., "void", "condition")
*_zc              Internal numbers referring to Z-machine opcodes (* is
                  the Standard 0.2 name for the opcode, written in lower
                  case)
*_OT              Assembly operand types
*_STYLE           Two constants used to set whether the Z-machine has
                  "time" or "score/moves" on its status line
*_ZA              Z-machine areas: e.g., PROP_ZA refers to the property
                  values table area
*_MV              Marker values
*_RTE             Run-time error numbers
*_VR              Veneer routines (* is the name of the routine, usually
                  in mixed case)
*_DBR             Record types in debugging information files
-------------------------------------------------------------------------

-------------------------------------------------------------------------
Set of variables  Used for
-------------------------------------------------------------------------
*_switch          Flag indicating whether a command-line switch such as -s
                  is on or off
*_setting         Numerical value set by a command-line switch such as -t3
MAX_*             A limit on something: note that a few of these are
                  #define'd but most are memory setting variables
no_*              Number of things of this type made so far
max_*             Maximum number of things of this type made
token_*           Three variables used to hold the value, type and lexeme
                  text for the last token read
*_trace_level     0 if tracing information is not being printed out about
                  *; otherwise, the larger this is, the more output is
                  produced
*_offset          Byte offset in the Z-machine, either from the start of
                  this Z-machine area or from the start of Z-machine memory
*_top             Pointer marking the current end in some uchar
                  array (usually holding a Z-machine area being put
                  together)
-------------------------------------------------------------------------

-------------------------------------------------------------------------
Set of routines       Used for
-------------------------------------------------------------------------
init_*_vars()         Routine in which section * of Inform initialises its
                      variables to begin compilation
*_begin_pass()        ...and in which it initialises its variables at the
                      start of the source code pass
*_allocate_arrays()   ...and in which it allocates any memory or arrays it
                      needs to begin compilation
*_free_arrays()       ...and in which it deallocates any memory or arrays
                      it has allocated, after compilation
parse_*()             Routine in the syntax analyser to parse the source
                      code construct *
assemble_*()          Instructing the assembler to generate an instruction:
  assemble_#()        (where # is a number from 0 to 4) an instruction with
                      # operands
  assemble_#_to()     an instruction with # operands which stores a result
  assemble_#_branch() an instruction with # operands which makes a
                      conditional branch
*_linenum()           Keeping and writing line references to the debugging
                      information file
-------------------------------------------------------------------------

1.4 Typedef-named types
    -------------------

-------------------------------------------------------------------------
Typedef name          Where defined  Used for
-------------------------------------------------------------------------
int32                 H              signed 32-bit integer
uint32                H              unsigned 32-bit integer
uchar                 H              unsigned char
assembly_operand      H              holding Z-machine numbers in the form
                                     used in Z-code, together with linkage
                                     information about how they were
                                     calculated
assembly_instruction  H              a convenient representation of an
                                     instruction of Z-code to assemble
opcode                asm.c          everything about a Z-machine opcode:
                                     how many operands it has, whether it
                                     branches or stores, etc.
verbl                 H              grammar line of 8 token values
verbt                 H              grammar table of grammar lines
prop                  H              list of values for a property
propt                 H              property values table
fpropt                H              the same, but with attributes too
objectt               H              object tree-position and attributes
dict_word             H              Z-text of a dictionary word
dbgl                  H              source code reference used for the
                                     debugging file (to file, line, char)
keyword_group         H              the plain text of a group of keywords
                                     (such as: all the statement names)
token_data            H              lexical tokens
expression_tree_node  H              node in a parse tree produced by the
                                     section "expressp.c"
operator              H              everything about an operator: its
                                     name, how to recognise it, its usage
                                     and associativity, etc.
memory_block          H              an extensible area of memory
                                     (allocated in 8K chunks as required)
tlb_s                 text.c         used in the abbreviations optimiser
optab_s               text.c         used in the abbreviations optimiser
FileId                H              filename and handle for a source file
ErrorPosition         H              filename and line reference for error
                                     message printing purposes
LexicalBlock          lexer.c        name, line count, etc. within a block
                                     of text being lexed
Sourcefile            lexer.c        buffer, pipeline and lexical block for
                                     a source code file being lexed
ImportExport          linker.c       holds import/export records
Marker                linker.c       holds marker records
VeneerRoutine         veneer.c       holds low-level Inform source code for
                                     a veneer routine
-------------------------------------------------------------------------
"H" is an abbreviation here for "header.h".