12 Low-level language features
---------------------------

12.1 Using the "Trace" directive
---------------------------

The trace directive is primarily intended for compiler maintenance purposes.
It can cause Inform to print tracing information on nine different aspects of
what it does, in most cases at several levels of detail.

-------------------------------------------------------------------------
Tracing option      Level  Output
-------------------------------------------------------------------------
"assembly"            1    all assembly language produced, in an imitation
                           of "@..." syntax
                      2    and also the actual bytes produced, in hexadecimal

                           (Note that because trace printing occurs before
                           label optimisation, addresses of instructions
                           cannot be relied on: nor can the values of
                           operands which are marked for later backpatching.)

"tokens"              1    lexemes of all tokens output from the lexer to the
                           syntax analyser, and indication when a token is
                           put back
                      2    full token descriptions
                      3    and also the lexical context in which they were
                           identified

"expressions"         1    annotated parse trees for all expressions being
                           code-generated
                      2    and the behaviour of the shift-reduce parser when
                           parsing it, and the parse tree received by the
                           code generator (not yet annotated)
                      3    and the token-to-etoken translations made, and the
                           parse tree produced by the emitter before lvalue
                           checking

"linker" used in      1    list all import/export information sent
compiling module      2    and all marker information sent

"linker" used in      1    list all modules linked in
compiling story file  2    and all import/export information received
                      3    and all marker information received
                      4    and how each marker was dealt with

"lines"              ---   (currently inoperable)
-------------------------------------------------------------------------
"dictionary"          1    print current dictionary contents (including
                           verb/preposition nos.)
                           in alphabetical order
"objects"             1    print current object tree
"verbs"               1    print current grammar
"symbols"             1    print out those entries in the symbols table
                           which are not: unknown, generated by the veneer
                           or in a system file
                      2    print entire symbols table
-------------------------------------------------------------------------


12.2 System constants and other secret syntax
----------------------------------------

The addresses of many important tables in the Z-machine are not recorded in
the header, or anywhere else: but they are known to the compiler, and needed
by the run-time code. The system constants are provided mainly as a way of
passing this information into run-time code, usually code within the veneer.

Constant                  Evaluates to
--------------------------------------------------------------------------
#version_number           Version number of Z-machine format being compiled to

#dict_par1                Byte offset in a dictionary table entry of the
                          first ("flags") parameter byte
#dict_par2                And the second ("verb") byte
#dict_par3                And the third ("adjective") byte
                          (note that these three depend only on the version
                          number)

#largest_object           The largest object number constructed, plus 256
                          (the "+256" is due to a quirk in the implementation
                          used by Inform 3 or so; since this constant is not
                          a very public feature, I don't mind leaving it in)
#actual_largest_object    Ditto, but without the "+256"

#adjectives_table         Byte addresses of adjectives table,
#actions_table              actions table,
#preactions_table           "preactions table",
#classes_table              class number to object number table,
#identifiers_table          property ID names table
#array_names_offset         array names table

#readable_memory_offset   The byte address of the first byte which isn't
                          accessible using readb and readw: i.e., the first
                          byte of the first Z-code routine
#code_offset              Packed address of Z-code area
#strings_offset           Packed address of static strings area

#array__start             Start of array space (byte address)
#array__end               End of array space + 1
#cpv__start               Start of
                          common property values space (byte address)
#cpv__end                 End + 1
#ipv__start               Start of individual property values space
                          (byte addr.)
#ipv__end                 End + 1
--------------------------------------------------------------------------

Two more secret syntaxes were introduced in Inform 6.10. The first changes
the behaviour of the "Include" directive:

    Include "Language__";

(and no other string) includes the current language definition file, whose
name is an ICL variable. The second controls which grammar table format is
generated: normally GV1, but this can be set to GV2 by

    Constant Grammar__Version = 2;

The "Grammar__Version" symbol is redefinable; if no such Constant directive
is made, then it will have the value 1. It needs to be changed to its final
value before the first Verb, Extend or Fake_Action directive is reached.


12.3 The "Zcharacter" directive
--------------------------

Finally, the "Zcharacter" directive is provided mostly for the benefit of
language definition files, for configuring Inform games to use a non-English
alphabet or character set. (See the Inform Translator's Manual.) Different
forms of "Zcharacter" allow both the Z-machine alphabet table and the Unicode
translation table to be specified.

(i) The Z-machine alphabet

The Z-machine's text encryption system is optimised to make it especially
cheap on memory to use letters in alphabet 1, then cheapish to use letters in
alphabets 2 and 3 but rather expensive to use letters which aren't in any of
the three. We aren't much concerned about lack of memory in the game as a
whole, but the economy is very useful in dictionary words, because dictionary
words are only stored to a "resolution" of nine Z-characters. Thus, in a
dictionary word:

    something from alphabet 1         costs 1 Z-character
                            2 or 3          2 Z-characters
    outside the alphabets             costs 4 Z-characters

The standard arrangement of these alphabets (A1 lower case a to z, A2 upper
case A to Z, A3 numerals and punctuation marks) includes no accented
characters.
In a language with frequent accented or non-English letters, such as Finnish
or French, this means that 4 of the 9 Z-characters in a dictionary word may
be wasted on just one letter. For instance,

    't@'el@'ecarte'   is stored as   't@'el'
    't@'el@'ephone'   is stored as   't@'el'

(there are not even enough of the 9 Z-characters left to encode the second
e-acute, let alone the "c" or the "p" which would distinguish the two words).
On the other hand if e-acute could be moved into Alphabet 3, say in place of
one of the punctuation marks which is never needed in dictionary words, the
two e-acutes would take just 2 Z-characters each and then

    't@'el@'ecarte'   would be stored as   't@'el@'ecar'
    't@'el@'ephone'   would be stored as   't@'el@'epho'

which is far more acceptable. The Z-machine has a mechanism (at least in
Version 5 or better) for changing the standard alphabet tables, invented to
make the German translation of Infocom's "Zork I" work.

The "Trace dictionary" will print the current contents of the alphabet table
(as well as the dictionary contents).

(i).1 Moving a single character in

There are two ways to change the standard English alphabet table. One way,
which is probably good enough for a language definition file for a mostly
Latin language (where only up to around 10 accented or non-English letters
are commonly needed) is to move characters into the least-used positions in
Alphabet 3 (the row of numerals and punctuation marks). For this, use the
directive:

    Zcharacter ;

It will only be possible if there's an entry in A3 which hasn't yet been used
(otherwise, changing that entry in A3 will make some of the text already
compiled wrong). The directive is thus only practicable early in compilation,
such as at the start of the library definition file. For instance the code

    Trace dictionary;
    Zcharacter '@'e';
    Zcharacter '@`a';
    Zcharacter '@^a';
    Trace dictionary;

might produce output including...

    Z-machine alphabet entries:
    a b c (d) e (f) g (h) i j k l m n o (p)(q) r s t u v (w)(x)(y)(z)
    A (B) C (D) E (F)(G) H I (J)(K) L (M)(N) O (P)(Q) R S (T)(U)(V)(W)(X)(Y)(Z)
    ( ) ^ 0 1 (2) 3 (4)(5) 6 (7)(8) 9 (.) , (!)(?)(_)(#)(')(~) / (\) - (:)(()())

    Z-machine alphabet entries:
    a b c (d) e (f) g (h) i j k l m n o (p)(q) r s t u v (w)(x)(y)(z)
    A (B) C (D) E (F)(G) H I (J)(K) L (M)(N) O (P)(Q) R S (T)(U)(V)(W)(X)(Y)(Z)
    ( ) ^ 0 1 @'e 3 @`a@^a 6 (7)(8) 9 (.) , (!)(?)(_)(#)(')(~) / (\) - (:)(()())

...in which note that bracketed letters are ones which have not been encoded
yet. The three Zcharacter directives have inserted e-acute, a-grave and
a-circumflex into the positions previously occupied by the numerals 2, 4
and 5. It is reasonable to make up to about 10 such insertions, after which
any further attempts will only be successful if the game being compiled
doesn't (let us say) have a title like "123456789: An Interactive Lesson In
Counting", which would have used 9 of the numerals and forced them to stay in
the final alphabet table.

(i).2 Changing the entire alphabet

This has to be done very early in compilation, before any strings are
translated, so that it can't be done by a language definition file. One might
put such directives into a file called "Alphabet.inf" and then begin the main
game with

    Include "Alphabet";

to achieve this. The form required is to give three strings after
"Zcharacter", containing 26, 26 and 23 characters respectively. For instance:

    Zcharacter "abcdefghijklmnopqrstuvwxyz"
               "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
               "0123456789!$&*():;.,<>@{386}";

(Note that "@{386}" specifies only one character: Unicode $0386.) Space,
new-line and quotation marks " are automatically included, while ~, @ and ^
have special meanings in Inform and should not be used. Otherwise, any
arrangement of characters is fine, except that every character used has to be
either a normal ASCII character or part of the "extra characters" (already
declared) in the ZSCII set.

(ii) Defining the "extra characters"

Inform normally makes up a block of "extra characters" based on the source
code it reads: if it reads plain ASCII or ISO Latin1 (-C0 or -C1) then the
block contains the usual European accents, such as e-acute or i-circumflex,
as defined in Z-machine Standard 0.2. (And if this table is never changed,
Inform doesn't then compile the table at all, as this is the default
Z-machine arrangement.) More generally if Inform reads ISO 8859-n (-Cn) then
the block is set up to contain all the non-ASCII letter characters in
ISO 8859-n.

There's room to spare for others to be added, and

    Zcharacter table + '@{386}';

would add Unicode character $0386 (Greek capital Alpha with tonos accent, as
it happens) to the current stock of "extra characters". Alternatively, you
can simply give a fresh stock altogether:

    Zcharacter table '@{9a}' '@{386}' '@^a';

would specify a stock of just three, for instance. These directives must be
made before the characters in question are first used in game text.

(iii) Defining terminating characters

It's also possible to specify which ZSCII character codes are "terminating
characters", meaning that they terminate a line of input. Normally, the
return key is the only terminating character, but this can be added to. For
instance, the following directive makes ZSCII 132 and 136 terminating:

    Zcharacter terminating 132 136;

The legal values to include are those for the cursor, function and keypad
keys, plus mouse and menu clicks. The special value 255 makes all of these
characters terminating.


12.4 Sequence points
---------------

Inform marks certain positions in the code it compiles as being "sequence
points". The idea is that the code can be regarded as a sequence of chunks,
and the sequence points mark where these chunks begin. Roughly speaking, each
different statement in Inform source code compiles to a different chunk, so
that statements correspond closely to sequence points.

Sequence points are marked in assembly trace output using the notation
"<*>". For instance, the source code

    [ WorkOutSquares counter;
      counter = 0;
      while (counter < 100)
      {   squares-->counter = counter*counter;
          counter = counter + 1;
      }
    ];

produces the traced output:

     6  +00008  [ WorkOutSquares counter
     7  +00009  <*> store        counter short_0
     8  +0000c     .L0
     8  +0000c  <*> jl           counter short_100 to L1 if FALSE
     9  +00011  <*> mul          counter counter -> sp
     9  +00015      storew       long_480 counter sp
    10  +0001b  <*> add          counter short_1 -> counter
    11  +0001f      jump         L0
    11  +00022     .L1
    12  +00022  <*> rtrue

The "<*>" in front of an instruction means "the position where this
instruction begins is a sequence point". We could mark the five positions in
the original source code as:

    [ WorkOutSquares counter;
      <*> counter = 0;
      <*> while (counter < 100)
      {   <*> squares-->counter = counter*counter;
          <*> counter = counter + 1;
      }
    <*> ];

Note that the open and close braces and square brackets don't normally cause
sequence points.

The exact rule is that every statement, action < > command, assignment or
expression in void context is at a sequence point, except as shown in the
examples below:

    for (<*> i=0: <*> i<45: <*> i++) ...

"for" loops contain 0 to 3 sequence points, depending on whether there's any
code compiled in the three parts of the specification. For instance

    for (::) <*> print "Madness!";

contains no sequence point corresponding to the "for" specification.

    <*> objectloop (<*> O ofclass Coin) <*> print (name) O;

"objectloop" normally generates two sequence points: at the start, where the
variable is initialised, and then where it's tested. However, loops over the
contents of particular objects work differently:

    <*> objectloop (O in Mailbox) <*> print (name) O;

(Because the test "O in Mailbox" is not actually being performed at run-time:
instead, O is looping through the tree.)

    do
      <*> print counter++, " ";
    <*> until (counter < 17);

Here the sequence point generated by the loop itself is attached to the
"until" clause, not the "do" clause, because that's where the test is
performed.

"switch", "while" and "if" statements are not exceptions to the usual rule
(1 statement = 1 sequence point at the beginning), but it might be useful to
give some examples anyway:

    <*> switch(counter)
    {   1: <*> print "One^";
        2, 3: <*> print "Two or three^";
        default: <*> print "Neither^";
    }

    <*> if (i == 17) <*> print "i is 17"; else <*> print "i isn't 17";

    <*> while (i<100) <*> print i++;

The following is true:

    Except possibly in code explicitly assembled using the "@" notation, at
    each sequence point the Z-machine stack is empty and no important
    information is held in the global variables reserved by Inform as
    "registers": thus, it's safe for a debugger to switch execution from any
    sequence point in a routine to any other.

    No two sequence points can be at the same position in either the source
    code or the compiled code.

    Every sequence point corresponds to a definite position in the source
    code (because the veneer, i.e. the code compiled from within Inform
    itself, contains no sequence points).

But the following is _not_ true:

    Sequence points occur in the same order in the source code as they do in
    compiled code

    Every routine contains at least one sequence point (a very few "stub"
    routines are excluded)

Inform uses sequence points only to generate debugging information files, and
to annotate assembly tracing output. They do not affect the code compiled.


12.5 Format of debugging information files
-------------------------------------

This is a provisional specification of a format which will probably change
slightly in future releases. Support for the old -k option has been
re-introduced in Inform 6.12 to assist development of Infix, the projected
source-level debugger for Inform.
(See the minor utility program "infact", updated to 6.12 format, which prints
out the contents of a debugging information file in legible form.)

A debugging information file begins with a six-byte header:

    0,1   the bytes $DE and then $BF (DEBF = "Debugging File")
    2,3   a word giving the version number of the format used (currently 0)
    4,5   a word giving the current Inform version number, in its
          traditional decimal form: e.g. 1612 means "6.12"

The remainder of the file consists of a sequence of records, terminated by an
end-of-file record. These records may be in _any_ order unless otherwise
noted. Each record begins with an identifying byte, for which constants
looking like *_DBR are defined in Inform's source code.

A "string" is a null-terminated string of ASCII chars.

A "word" is a 16-bit unsigned number, high byte first.

A "line" is a sequence of four bytes: the first is the file number, the next
two are a line number (a word), and the last is a character number within
that line. In all three cases -- file numbers, line numbers, character
numbers -- counting begins at 1. The line reference 0:0:0 is however used to
mean "no such line": for instance, the metaclass "Routine" is defined at line
0:0:0, because it's defined by the compiler, not in any source code.
Character positions greater than 255 in any line are recorded simply as 255.

An "address" is a 24-bit unsigned number, a sequence of three bytes (high
byte, middle byte, low byte). All addresses are counted in bytes (rather than
being Z-machine packed addresses).

    EOF_DBR (byte: 0)

        End of the debugging file.

    FILE_DBR (byte: 1)
        1 byte, counting from 1
        string
        string

        One of these records always appears before any reference to the
        source code file in question.
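
The six-byte header and the "string", "word", "line" and "address" encodings
defined above are enough to start reading a debugging information file. The
Python sketch below is an illustration under stated assumptions rather than
code taken from Inform or "infact": every function name in it is
hypothetical, and it decodes only the header and the four basic field types,
not any individual record.

```python
# Illustrative sketch only: decoding the header and basic field types of a
# debugging information file. All function names here are hypothetical.
import struct

MAGIC = b"\xDE\xBF"


def read_word(data, pos):
    """Read a 16-bit unsigned number, high byte first."""
    (value,) = struct.unpack_from(">H", data, pos)
    return value, pos + 2


def read_address(data, pos):
    """Read a 24-bit unsigned byte address (high, middle, low byte)."""
    return int.from_bytes(data[pos:pos + 3], "big"), pos + 3


def read_string(data, pos):
    """Read a null-terminated ASCII string."""
    end = data.index(0, pos)
    return data[pos:end].decode("ascii"), end + 1


def read_line(data, pos):
    """Read a four-byte line reference: (file, line, character)."""
    file_no = data[pos]
    line_no, _ = read_word(data, pos + 1)
    char_no = data[pos + 3]
    return (file_no, line_no, char_no), pos + 4


def parse_header(data):
    """Return (format version, Inform version, offset of the first record)."""
    if data[:2] != MAGIC:
        raise ValueError("not a debugging information file")
    format_version, pos = read_word(data, 2)
    inform_version, pos = read_word(data, pos)
    return format_version, inform_version, pos


if __name__ == "__main__":
    # A made-up header: format version 0, Inform version 1612 ("6.12").
    header = bytes([0xDE, 0xBF, 0x00, 0x00, 0x06, 0x4C])
    print(parse_header(header))
```

Each reader returns the decoded value together with the position of the next
unread byte, so that calls can be chained while walking through a record.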

    CLASS_DBR (byte: 2)
        string
        line
        line

    OBJECT_DBR (byte: 3)
        word
        string
        line
        line

    GLOBAL_DBR (byte: 4)
        byte
        string

    ARRAY_DBR (byte: 12)
        word
        string

        The byte address is an offset within the "array space" area, which
        always begins with the 480 bytes storing the values of the global
        variables.

    ATTR_DBR (byte: 5)
        word
        string

    PROP_DBR (byte: 6)
        word
        string

    FAKE_ACTION_DBR (byte: 7)
        word
        string

        Note that the numbering of fake actions differs in Grammar Versions
        1 and 2.

    ACTION_DBR (byte: 8)
        word
        string

    HEADER_DBR (byte: 9)
        64 bytes

        This is provided in order to check that a debugging information file
        (probably) does match a given story file.

    ROUTINE_DBR (byte: 11)
        word
        line
        address
        string
        then for each local variable:
            string
        terminated by a zero byte.

        Note that the PC start address is in bytes, relative to the start of
        the story file's code area. Routines are numbered upward from 0, and
        in each case the ROUTINE_DBR, LINEREF_DBR and ROUTINE_END_DBR records
        occur in order.

    LINEREF_DBR (byte: 10)
        word
        word
        and then, for each sequence point:
            line
            word

        The PC offset for each sequence point is in bytes, from the start of
        the routine. (Note that the initial byte of the routine, giving the
        number of local variables for that routine, is at PC offset 0: thus
        the actual code begins at PC offset 1.) It is possible for a routine
        to have no sequence points (as in the veneer, or in the case of code
        reading simply "[; ];").

    ROUTINE_END_DBR (byte: 14)
        word
        line
        address

    MAP_DBR (byte: 13)
        A sequence of records consisting of:
            string
            address
        terminated by a zero byte.

        The current names of structures consist of:

            "abbreviations table"
            "header extension"
            "alphabets table"
            "Unicode table"
            "property defaults"
            "object tree"
            "common properties"
            "class numbers"
            "individual properties"
            "global variables"
            "array space"
            "grammar table"
            "actions table"
            "parsing routines"
            "adjectives table"
            "dictionary"
            "code area"
            "strings area"

        Other names may be added later, and some of the above won't be
        present in all files ("Unicode table", for instance). Locations are
        byte addresses inside the story file.

LINEREF_DBR records will probably be compressed in future releases.


12.6 Notes on how to syntax-colour Inform source code
------------------------------------------------

"Syntax colouring" is an automatic process which some text editors apply to
the text being edited: the characters are displayed just as they are, but
with artificial colours added according to what the text editor thinks they
mean. The editor is in the position of someone going through a book colouring
all the verbs in red and all the nouns in green: it can only do so if it
understands how to tell a verb or a noun from other words. Many good text
editors have been programmed to syntax colour for languages such as C, and a
few will allow users to reprogram them to other languages.

One such is the popular Acorn RISC OS text editor "Zap", for which the author
has written an extension mode called "ZapInform". ZapInform contributes
colouring rules for the Inform language and this section documents its
algorithm, which has since been successfully adapted by Paul Gilbert's "PIDE"
environment and John Wood's C++ code for Inform syntax styling, both running
under Windows 95/NT. (My thanks to John for making two corrections to the
previously-published algorithm.)

(a) State values

ZapInform associates a 32-bit number called the "state" with every character
position. The "state" is as follows.
11 of the upper 16 bits hold flags, the rest being unused:

    32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17

    comment
    single-quoted text
    double-quoted text
    statement
    after marker
    highlight flag
    highlight all flag
    colour backtrack
    after-restart-flag
    wait-direct (waiting for a directive)
    dont-know-flag

These flags make up the "outer state" while the lower 16 bits hold a number
pompously called the "inner state":

    0               after WS (WS = white space or start of line or comma)
    1               after WS then "-"
    2               after WS then "-" and ">"   [terminal]
    3               after WS then "*"           [terminal]
    0xFF            after junk
    0x100*N + S     after WS then an Inform identifier N+1 characters long
                    itself in state S:
                        101 w    202 wi    303 wit    404 with
                        111 h    212 ha    313 has
                        121 c    222 cl    323 cla    424 clas    525 class
    same + 0x8000   when complete               [terminal]

In practice it would be madness to try to actually store the state of every
character position in memory (it would occupy four times as much space as the
file itself). Instead, ZapInform caches just one state value, the one most
recently calculated, and uses a process called "scanning" to determine new
states. That is, given that we know the state at character X and want to know
the state at character Y, we can find out by scanning each character between
X and Y, altering the state according to each one.

It might possibly save some time to cache more state values than this (say,
the state values at the start of every screen-visible line of text, or some
such) but the complexity of doing this doesn't seem worthwhile on my
implementation.

Scanning is a quick process because the Zap text editor stores the entire
file in almost contiguous memory, easy to run through, and the state value
can be kept in a single CPU register while this is done.

(b) Scanning text

Let us number the characters in a file 1, 2, 3, ... The state before
character 1 is always 0x02000000: that is, inner state zero and outer state
with only the waiting-for-directive flag set.
(One can think of this as the state of an imaginary "character 0".) The state
at character N+1 is then a function of the state at character N and what
character is actually there. Thus,

    State(0) = 0x02000000

and for all N >= 0,

    State(N+1) = Scanning_function(State(N), Character(N+1))

And here is what the scanning function does:

    1.  Is the comment bit set?
           Is the character a new-line?
              If so, clear the comment bit.
           Stop.

    2.  Is the double-quote bit set?
           Is the character a double-quote?
              If so, clear the double-quote bit.
           Stop.

    3.  Is the single-quote bit set?
           Is the character a single-quote?
              If so, clear the single-quote bit.
           Stop.

    4.  Is the character a single quote?
           If so, set the single-quote bit and stop.

    5.  Is the character a double quote?
           If so, set the double-quote bit and stop.

    6.  Is the character an exclamation mark?
           If so, set the comment bit and stop.

    7.  Is the statement bit set?
           If so:
              Is the character "]"?
                 If so:
                    Clear the statement bit.
                    Stop.
              If the after-restart bit is clear, stop.
              Run the inner finite state machine.
              If it results in a keyword terminal (that is, a terminal
              which has inner state 0x100 or above):
                 Set colour-backtrack (and record the backtrack colour
                 as "function" colour).
                 Clear after-restart.
              Stop.
           If not:
              Is the character "["?
                 If so:
                    Set the statement bit.
                    If the after-marker bit is clear, set after-restart.
                    Stop.
              Run the inner finite state machine.
              If it results in a terminal:
                 Is the inner state 2 [after "->"] or 3 [after "*"]?
                    If so:
                       Set after-marker.
                       Set colour-backtrack (and record the backtrack
                       colour as "directive" colour).
                       Zero the inner state.
                 [If not, the terminal must be from a keyword.]
                 Is the inner state 0x404 [after "with"]?
                    If so:
                       Set colour-backtrack (and record the backtrack
                       colour as "directive" colour).
                       Set after-marker.
                       Set highlight.
                       Clear highlight-all.
                 Is the inner state 0x313 ["has"] or 0x525 ["class"]?
                    If so:
                       Set colour-backtrack (and record the backtrack
                       colour as "directive" colour).
                       Set after-marker.
                       Clear highlight.
                       Set highlight-all.
                 If the inner state isn't one of these: [so that recent
                 text has formed some alphanumeric token which might or
                 might not be a reserved word of some kind]
                    If waiting-for-directive is set:
                       Set colour-backtrack (and record the backtrack
                       colour as "directive" colour)
                       Clear waiting-for-directive.
                    If not, but highlight-all is set:
                       Set colour-backtrack (and record the backtrack
                       colour as "property" colour)
                    If not, but highlight is set:
                       Clear highlight.
                       Set colour-backtrack (and record the backtrack
                       colour as "property" colour).
              Is the character ";"?
                 If so:
                    Set wait-direct.
                    Clear after-marker.
                    Clear after-restart.
                    Clear highlight.
                    Clear highlight-all.
              Is the character ","?
                 If so:
                    Set after-marker.
                    Set highlight.
              Stop.

The "inner finite state machine" adjusts only the inner state, and always
preserves the outer state. It not only changes an old inner state to a new
inner state, but sometimes returns a "terminal" flag to signal that something
interesting has been found.

    State     Condition                  Go to state   Return terminal-flag?

    0         if "-"                     1
              if "*"                     3             yes
              if space, "#", newline     0
              if "_"                     0x100
              if "w"                     0x101
              if "h"                     0x111
              if "c"                     0x121
              other letters              0x100
              otherwise                  0xFF
    1         if ">"                     2             yes
              otherwise                  0xFF
    2         always                     0
    3         always                     0
    0xFF      if space, newline          0
              otherwise                  0xFF

    all 0x100+ states:
              if not alphanumeric, add 0x8000 to the state   yes

    then for the following states:
    0x101     if "i"                     0x202
              otherwise                  0x200
    0x202     if "t"                     0x303
              otherwise                  0x300
    0x303     if "h"                     0x404
              otherwise                  0x400
    0x111     if "a"                     0x212
              otherwise                  0x200
    0x212     if "s"                     0x313
              otherwise                  0x300
    0x121     if "l"                     0x222
              otherwise                  0x200
    0x222     if "a"                     0x323
              otherwise                  0x300
    0x323     if "s"                     0x424
              otherwise                  0x400
    0x424     if "s"                     0x525
              otherwise                  0x500

    but for all other 0x100+ states:
              if alphanumeric, add 0x100 to the state

    0x8000+   always                     0

(Note that if your text editor stores tabs as characters in their own right
(usually 0x09) rather than rows of spaces, tab should be included with space
and newline in the above.)
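
The transition table above can be turned into code almost line for line. The
Python sketch below is one possible reading of it, not ZapInform's own
implementation: the names step, scan and identifier_length are hypothetical,
tab is counted as white space as the note above suggests, and "_" is treated
as an alphanumeric character inside an identifier, which is an assumption the
table does not state. Only the inner state is modelled; the outer-state flags
are left out.

```python
# Illustrative sketch only: the "inner finite state machine" as a function.
TERMINAL_BIT = 0x8000
JUNK = 0xFF

# Transitions that follow the keywords "with", "has" and "class".
KEYWORD_STEPS = {
    (0x101, "i"): 0x202, (0x202, "t"): 0x303, (0x303, "h"): 0x404,
    (0x111, "a"): 0x212, (0x212, "s"): 0x313,
    (0x121, "l"): 0x222, (0x222, "a"): 0x323,
    (0x323, "s"): 0x424, (0x424, "s"): 0x525,
}
PREFIX_STATES = {state for state, _ in KEYWORD_STEPS}
FIRST_LETTER = {"_": 0x100, "w": 0x101, "h": 0x111, "c": 0x121}


def is_ident_char(ch):
    # Assumption: "_" counts as alphanumeric once inside an identifier.
    return ch == "_" or (ch.isascii() and ch.isalnum())


def step(state, ch):
    """Return (new inner state, terminal flag) for one character."""
    if state >= TERMINAL_BIT or state in (2, 3):
        return 0, False
    if state == 0:
        if ch == "-":
            return 1, False
        if ch == "*":
            return 3, True
        if ch in " #\n\t":
            return 0, False
        if ch in FIRST_LETTER:
            return FIRST_LETTER[ch], False
        if ch.isascii() and ch.isalpha():
            return 0x100, False
        return JUNK, False
    if state == 1:
        return (2, True) if ch == ">" else (JUNK, False)
    if state == JUNK:
        return (0 if ch in " \n\t" else JUNK), False
    # Otherwise we are inside an identifier (0x100 <= state < 0x8000).
    if not is_ident_char(ch):
        return state + TERMINAL_BIT, True
    if state in PREFIX_STATES:
        # Either continue the keyword or fall back to a plain identifier.
        fallback = (state & 0xFF00) + 0x100
        return KEYWORD_STEPS.get((state, ch), fallback), False
    return state + 0x100, False


def scan(text, state=0):
    """Run step() over text; return ([(index, state) per terminal], state)."""
    terminals = []
    for index, ch in enumerate(text):
        state, terminal = step(state, ch)
        if terminal:
            terminals.append((index, state))
    return terminals, state


def identifier_length(state):
    """Number of identifier characters counted in a 0x100+ inner state."""
    return (state >> 8) & 0x7F


if __name__ == "__main__":
    for sample in ("with ", "has\n", "class;", "bottle ", "->", "*"):
        terminals, _ = scan(sample)
        print(repr(sample), [(i, hex(s)) for i, s in terminals])
```

In this reading a keyword is recognised by the low byte of the terminal
state (0x8404 for "with", 0x8313 for "has", 0x8525 for "class"), while any
other identifier ends with a low byte of zero.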

Briefly, the finite state machine can be left running until it returns a
terminal, which means it has found "->", "*" or a completed Inform
identifier: and it detects "with", "has" and "class" as special keywords
amongst these identifiers.

(c) Initial colouring

ZapInform colours one line of visible text at a time. For instance, it might
be faced with this:

    Object -> bottle "~Heinz~ bottle"

And it outputs an array of colours for each character position in the line,
which the text editor can then use in actually displaying the text. It works
out the state before the first character of the line (the "O"), then scans
through the line. For each character, it determines the initial colour as a
function of the state at that character:

    If single-quote or double-quote is set, then quoted text colour.
    If comment is set, then comment colour.
    If statement is set:
        Use code colour
        unless the character is "[" or "]", in which case use function
            colour,
        or is a single or double quote, in which case use quoted text
            colour.
    If not:
        Use foreground colour
        unless the character is "," or ";" or "*" or ">", in which case use
            directive colour,
        or the character is "[" or "]", in which case use function colour,
        or is a single or double quote, in which case use quoted text
            colour.

However, the scanning algorithm sometimes signals that a block of text must
be "backtracked" through and recoloured. For instance, this happens if the
white space after the sequence "c", "l", "a", "s" and "s" is detected when in
a context where the keyword "class" is legal. The scanning algorithm does
this by setting the "colour backtrack" bit in the outer state. Note that the
number of characters we need to recolour backwards from the current position
has been recorded in bits 9 to 16 of the inner state (which has been counting
up lengths of identifiers), while the scanning algorithm has also recorded
the colour to be used.
For instance, in

    Object -> bottle "~Heinz~ bottle"
          ^  ^      ^

backtracks of size 6, 2 and 6 are called for at the three marked spaces. Note
that a backtrack never crosses a new-line.

ZapInform uses the following chart of colours:

    name               default actual colour

    foreground         navy blue
    quoted text        grey
    comment            light green
    directive          black
    property           red
    function           red
    code               navy blue
    codealpha          dark green
    assembly           gold
    escape character   red

but note that at this stage, we've only used the following:

    function colour      [ and ] as function brackets, plus function names
    comment colour       comments
    directive colour     initial directive keywords, plus "*", "->", "with",
                         "has" and "class" when used in a directive context
    quoted text colour   singly- or doubly-quoted text
    foreground colour    code in directives
    code colour          code in statements
    property colour      property, attribute and class names when used
                         within "with", "has" and "class"

For instance,

    Object -> bottle "~Heinz~ bottle"

would give us the array

    DDDDDDDDDDFFFFFFFQQQQQQQQQQQQQQQQ

(F being foreground colour; it doesn't really matter what colour values the
spaces have).

(d) Colour refinement

The next operation is "colour refinement", which includes a number of things.

Firstly, any characters with colour Q (quoted-text) which have special
meanings are given "escape-character colour" instead. This applies to "~",
"^", "\" and "@" followed by (possibly) another "@" and a number of digits.

Next we look for identifiers. An identifier for these purposes includes a
number, for it is just a sequence of:

    "_" or "$" or "#" or "0" to "9" or "a" to "z" or "A" to "Z".

The initial colouring of an identifier tells us its context. We're only
interested in those in foreground colour (these must be used in the body of a
directive) or code colour (used in statements).

If an identifier is in code colour, then:

    If it follows an "@", recolour the "@" and the identifier in
    assembly-language colour.

    Otherwise, unless it is one of the following:

        "box" "break" "child" "children" "continue" "default" "do" "elder"
        "eldest" "else" "false" "font" "for" "give" "has" "hasnt" "if" "in"
        "indirect" "inversion" "jump" "metaclass" "move" "new_line"
        "nothing" "notin" "objectloop" "ofclass" "or" "parent" "print"
        "print_ret" "provides" "quit" "random" "read" "remove" "restore"
        "return" "rfalse" "rtrue" "save" "sibling" "spaces" "string" "style"
        "switch" "to" "true" "until" "while" "younger" "youngest"

    we recolour the identifier to "codealpha colour".

On the other hand, if an identifier is in foreground colour, then we check it
to see if it's one of the following interesting keywords:

    "first" "last" "meta" "only" "private" "replace" "reverse" "string"
    "table"

If it is, we recolour it in directive colour.

Thus, after colour refinement we arrive at the final colour scheme:

    function colour      [ and ] as function brackets, plus function names
    comment colour       comments
    quoted text colour   singly- or doubly-quoted text
    directive colour     initial directive keywords, plus "*", "->", "with",
                         "has" and "class" when used in a directive context,
                         plus any of the reserved directive keywords listed
                         above
    property colour      property, attribute and class names when used
                         within "with", "has" and "class"
    foreground colour    everything else in directives
    code colour          operators, numerals, brackets and statement
                         keywords such as "if" or "else" occurring inside
                         routines
    codealpha colour     variable and constant names occurring inside
                         routines
    assembly colour      @ plus assembly language opcodes
    escape char colour   special or escape characters in quoted text

(e) An example

Consider the following example stretch of code (which is not meant to be
functional or interesting, just colourful):

    ! Here's the bottle:

    Object -> bottle "bottle marked ~DRINK ME~"
      with name "bottle" "jar" "flask",
           initial "There is an empty bottle here.",
           before
           [; LetGo:                      ! For dealing with water
                 if (noun in bottle)
                     "You're holding that already (in the bottle).";
           ],
      has  container;

    [ ReadableSpell i j k;
      if (scope_stage==1)
      {   if (action_to_be==##Examine) rfalse;
          rtrue;
      }
      @set_cursor 1 1;
    ];

    Extend "examine" first
                    * scope=ReadableSpell            -> Examine;

Here are the initial colourings:

    ! Here's the bottle:
    CCCCCCCCCCCCCCCCCCCC

    Object -> bottle "bottle marked ~DRINK ME~"
    DDDDDDDDDDFFFFFFFQQQQQQQQQQQQQQQQQQQQQQQQQQ

      with name "bottle" "jar" "flask",
    FFDDDDDPPPPPQQQQQQQQFQQQQQFQQQQQQQD

           initial "There is an empty bottle here.",
    FFFFFFFPPPPPPPPQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQD

           before
    FFFFFFFPPPPPP

           [; LetGo:                      ! For dealing with water
    FFFFFFFfSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSCCCCCCCCCCCCCCCCCCCCCCCC

                 if (noun in bottle)
    SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS

                     "You're holding that already (in the bottle).";
    SSSSSSSSSSSSSSSSSQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQS

           ],
    SSSSSSSfD

      has  container;
    FFDDDDDPPPPPPPPPD

    [ ReadableSpell i j k;
    fffffffffffffffSSSSSSS

      if (scope_stage==1)
    SSSSSSSSSSSSSSSSSSSSS

      {   if (action_to_be==##Examine) rfalse;
    SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS

          rtrue;
    SSSSSSSSSSSS

      }
    SSS

      @set_cursor 1 1;
    SSSSSSSSSSSSSSSSSS

    ];
    fD

    Extend "examine" first
    DDDDDDDQQQQQQQQQFFFFFF

                    * scope=ReadableSpell            -> Examine;
    FFFFFFFFFFFFFFFFDDFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFDDDFFFFFFFD

(Here F=foreground, D=directive, f=function, S=code (S for "statement"),
C=comment, P=property, Q=quoted text.) And here is the refinement:

    ! Here's the bottle:
    CCCCCCCCCCCCCCCCCCCC

    Object -> bottle "bottle marked ~DRINK ME~"
    DDDDDDDDDDFFFFFFFQQQQQQQQQQQQQQQEQQQQQQQQEQ

      with name "bottle" "jar" "flask",
    FFDDDDDPPPPPQQQQQQQQFQQQQQFQQQQQQQD

           initial "There is an empty bottle here.",
    FFFFFFFPPPPPPPPQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQD

           before
    FFFFFFFPPPPPP

           [; LetGo:                      ! For dealing with water
    FFFFFFFfSSIIIIISSSSSSSSSSSSSSSSSSSSSSSCCCCCCCCCCCCCCCCCCCCCCCC

                 if (noun in bottle)
    SSSSSSSSSSSSSSSSSIIIISSSSIIIIIIS

                     "You're holding that already (in the bottle).";
    SSSSSSSSSSSSSSSSSQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQS

           ],
    SSSSSSSfD

      has  container;
    FFDDDDDPPPPPPPPPD

    [ ReadableSpell i j k;
    fffffffffffffffSSSSSSS

      if (scope_stage==1)
    SSSSSSIIIIIIIIIIISSIS

      {   if (action_to_be==##Examine) rfalse;
    SSSSSSSSSSIIIIIIIIIIIISSIIIIIIIIISSSSSSSSS

          rtrue;
    SSSSSSSSSSSS

      }
    SSS

      @set_cursor 1 1;
    SSAAAAAAAAAAASISIS

    ];
    fD

    Extend "examine" first
    DDDDDDDQQQQQQQQQFDDDDD

                    * scope=ReadableSpell            -> Examine;
    FFFFFFFFFFFFFFFFDDFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFDDDFFFFFFFD

(where E = escape characters, A = assembly and I = "codealpha", that is,
identifiers cited in statement code).
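
The refinement pass of part (d) can be checked against the example lines
above with a few lines of code. The Python sketch below is an illustration
under stated assumptions, not ZapInform's own code: the function name refine
is hypothetical, colours are represented by the one-letter codes used in this
example, an identifier is classified by the colour of its first character,
and keyword matching is exact and case-sensitive.

```python
# Illustrative sketch only: colour refinement of one line, per part (d).
import re

STATEMENT_KEYWORDS = frozenset("""
box break child children continue default do elder eldest else false font
for give has hasnt if in indirect inversion jump metaclass move new_line
nothing notin objectloop ofclass or parent print print_ret provides quit
random read remove restore return rfalse rtrue save sibling spaces string
style switch to true until while younger youngest
""".split())

DIRECTIVE_KEYWORDS = frozenset(
    "first last meta only private replace reverse string table".split())

IDENTIFIER = re.compile(r"[_$#0-9A-Za-z]+")
DIGITS = "0123456789"


def refine(text, colours):
    """Return the refined colour string for one line of source text.

    `colours` holds the initial colouring, one letter per character.
    """
    if len(text) != len(colours):
        raise ValueError("text and colours must have the same length")
    out = list(colours)
    n = len(text)

    # Escape characters inside quoted text become colour E.
    i = 0
    while i < n:
        if out[i] == "Q" and text[i] in "~^\\":
            out[i] = "E"
        elif out[i] == "Q" and text[i] == "@":
            j = i + 1
            if j < n and text[j] == "@":
                j += 1
            while j < n and text[j] in DIGITS:
                j += 1
            for k in range(i, j):
                out[k] = "E"
            i = j - 1
        i += 1

    # Identifiers in code colour (S) or foreground colour (F).
    for match in IDENTIFIER.finditer(text):
        start, end = match.span()
        word = match.group()
        if colours[start] == "S":
            if start > 0 and text[start - 1] == "@":
                out[start - 1:end] = "A" * (end - start + 1)
            elif word not in STATEMENT_KEYWORDS:
                out[start:end] = "I" * (end - start)
        elif colours[start] == "F" and word in DIRECTIVE_KEYWORDS:
            out[start:end] = "D" * (end - start)
    return "".join(out)


if __name__ == "__main__":
    line = "  @set_cursor 1 1;"
    print(line)
    print(refine(line, "S" * len(line)))
```

Feeding the sketch a source line and its initial colouring should give the
corresponding refined line: the "@set_cursor" line, for instance, has its
opcode recoloured as assembly and its two numerals as codealpha.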