2 Porting Inform to a new environment
  -----------------------------------

2.1 Dependence on the OS
    --------------------

Strenuous efforts have been made over the last three years to make the
source as operating-system independent as possible. As a general
principle, mostly adhered to, all operating-system differences should be
in the "header.h" file, and not in the 20 sections.

As a general rule, for each target OS which Inform is being ported to, a
new #define name is invented. For example, the name LINUX is used for the
Linux port. When Inform is being compiled, only that one symbol will be
defined, and none of the others: thus

    #ifdef LINUX
    ...code...
    #endif

compiles the given code only for the Linux port.

There are some very poor "ANSI C" compilers out there, and many more
mediocre ones (which almost obey the standard, but don't quite): in any
case the ANSI standard is very broadly defined. For example, the code

    int x;
    x = 45*1007;
    printf("%d\n", x);

is entirely ANSI compliant, but results in different numbers being printed
on different machines, due to the fact that ANSI does not specify the
range of numbers which a variable of type int can safely hold.

Since C is so highly unportable a language, and since some of the
compilers used to produce Inform are poor, the whole Inform code has to be
written with the worst possible compiler in mind. An illustration of this
is that all preprocessor commands, such as #define, must begin on column 1
of the source code: even when they occur in code which is #ifdef'd out.
VAX C (a particularly bad compiler) will reject

    #ifndef VAX
        #define FROG 2
    #endif

for example, even when VAX is defined. This makes the declarations in
"header.h" annoyingly illegible for everybody.
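Returning to the int example in this section: one way to stay within what
ANSI guarantees is to test, before multiplying, whether the product will
fit in an int. The following is only a sketch. The function name
mul_fits_in_int is invented for illustration and does not occur in the
Inform source.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper (not part of Inform): returns 1 if a*b can be
   computed in type int without overflow, and 0 otherwise.
   Both arguments must be positive. */
static int mul_fits_in_int(int a, int b)
{
    assert(a > 0 && b > 0);
    /* Dividing first means a product that is too large is never formed. */
    return a <= INT_MAX / b;
}
```

On a compiler whose int is 16 bit, mul_fits_in_int(45, 1007) is 0, because
45*1007 = 45315 exceeds 32767; where int is 32 bit it is 1.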

2.2 Portability issues: the types int32 and uchar
    ---------------------------------------------

The main issues when porting Inform have been found to be:

    (a) the size of "int",
    (b) whether "char" is unsigned or signed by default,
    (c) what conventions apply to filenames,
    (d) assumptions about sizeof() when casting pointers,
    (e) how parameters (switches, filenames, etc.) are passed to Inform.

(a) ANSI requires that "int" be at least 16 bit, though advances in CPU
technology mean that under most of today's environments it will in fact be
32 bit. ANSI further requires that "long int" be at least 32 bit. Inform
needs at least one integer type to be able to hold 32 bit signed values,
and "header.h" contains code which attempts to typedef the name "int32" to
such a type. This should happen automatically. A compiler which, in breach
of the standard, has no integer type larger than 16 bits will not be able
to compile Inform.

An annoying issue here is that compilers vary widely in the extent to
which they give errors or warnings when they detect silent "promotions"
from one integer type to another. This makes it very hard to sort out the
types of every object in the program between int and int32 so that
everybody is happy: in practice, every time a new release of the source
code has been made, a few dozen types have had to be fiddled with until
everybody can compile it.

(b) Compilers seem to divide about fifty-fifty on this. Again, the
original standard is vague: the issue is really about how the value
(char) 253, say, should be interpreted when it is cast to (int). Should
the answer be 253, or -3? ANSI did not specify this because compiler
writers wanted to be able to choose whichever could be done instantly on
the CPUs they were working with.
(If you store a char in the bottom 8 bits of a 32 bit register, then
casting the value (char) 253 to (int) -3 means setting all 24 of the upper
bits, which requires code to be compiled.)

Inform uses a typedef'd type called "uchar" when it needs an unsigned char
type: it uses plain "char" when it doesn't mind. It never needs a signed
char type. In theory ANSI C compilers must recognise the keywords "signed"
and "unsigned", but some don't:

    typedef unsigned char uchar;

actually produces an error on some compilers. So the typedef can only be
made with your help. (On other compilers, "unsigned" is legal but "signed"
is illegal.)

(c) Many people think that the minimal 8.3 convention will work on any
operating system, but this is not true (it won't work under Acorn's
RISC OS). Much of each OS specification in "header.h" is therefore to do
with filenaming.

(d) For instance,

    sizeof(char *)   sizeof(int *)   sizeof(int32 *)   sizeof(int)

may all be different numbers on machines with segmented memory maps. This
being so, casting between pointer types may lose information, and a few
arrays in the source have surprising types to ensure safety.

One thing Inform does need to be able to do is to subtract one pointer (of
the same type) from another: it defines the macro

    subtract_pointers(X, Y)

to do this. X and Y are normally of type "uchar *"; there seems to have
been no problem with this in practice.

(e) The ANSI standard is quite good on the command line, and Inform
expects to read parameters by the standard argc, argv mechanism.
Unfortunately the Macintosh, for instance, has no orthodox command line.
Such a port probably wants to have an "outer shell" which displays a
window, allows options to be set and then calls the Inform 6 core as
needed.

The section "inform.c" normally compiles a "main" routine which makes a
few machine-dependent changes and then passes its arguments straight on to
"sub_main".
For instance, here's the v6.10 source:

    int main(int argc, char **argv)
    {   int rcode;
    #ifdef MAC_MPW
        InitCursorCtl((acurHandle)NULL); Show_Cursor(WATCH_CURSOR);
    #endif
        rcode = sub_main(argc, argv);
    #ifdef ARC_THROWBACK
        throwback_end();
    #endif
        return rcode;
    }

The Macintosh Programmer's Workshop port is making multi-tasking work
before embarking on compilation; the Acorn Desktop Debugging Environment
port is tidying up after any error throwbacks, at the end of compilation.
The point is that here is the place for such minor machine quirks.

However, if you want an entirely new front end (such as Robert Pelak's
Macintosh port of Inform has), then you need to define

    #define EXTERNAL_SHELL

in your machine definition block (see later). This will mean that no
"main" routine is compiled at all from "inform.c" (so you can simply link
the Inform source into your own code, which will contain its own
"main.c"): Inform should be run by calling

    extern int sub_main(int argc, char **argv);

having set up argc and argv suitably. For instance, the outer shell might
take names typed into dialogue boxes, and various ticked options on a
window, and make these into a series of ICL commands, which are then
handed over textually to sub_main. I suggest that the most efficient way
to do this is to write them as an ICL file somewhere and to pass sub_main
a single parameter telling it to run this ICL file.


2.3 The character set and the format of text files
    ----------------------------------------------

The Inform source code assumes that the compiler is running on a machine
whose character set agrees with ASCII in the range $20 to $7e. (This
allows both plain ASCII and any of the ISO 8859 extensions to ASCII.)

ASCII is now universal, but there is no common format for plain text
files, and in particular no agreement on how lines of text are ended.
For example:

    MS-DOS, Windows, etc.:   $0d $0a
    Mac OS:                  $0d
    RISC OS:                 $0a

Inform 6 can read source code files in all these formats, and which
further use any of the character sets above: plain ASCII or ISO 8859-1 to
-9. (This is configurable using the -C switch.)


2.4 The OS definitions block in "header.h"
    --------------------------------------

Each Inform port makes a block of definitions in the header file. These
blocks take a standard format.

Firstly, the block is put in #ifdef's so that it will only be processed in
this one port. The block is divided into 6 sections, as follows.

/* 1 */  MACHINE_STRING should be set to the name of the machine or OS.

/* 2 */  Section 2 contains some miscellaneous options, all of which are
on/off: they are by default off unless defined. The possibilities are:

    USE_TEMPORARY_FILES - use scratch files for workspace, not memory,
                          by default
    EXTERNAL_SHELL      - this port is providing an entire external front
                          end, with its own "main" routine: see above
    PROMPT_INPUT        - prompt input: ignore argc and argv, instead
                          asking for parameters at the keyboard. (I hope
                          people will write front-ends rather than resort
                          to this, but it may be a useful staging post.)
    TIME_UNAVAILABLE    - if the ANSI library routines for working out
                          today's date are not available
    CHAR_IS_SIGNED      - if on your compiler the type "char" is signed
                          by default

Note that defining USE_TEMPORARY_FILES does not make a mandatory choice
(as it did under Inform 5): whether to use allocated memory or temporary
files is selectable with -F0 (files off) or -F1 (files on) in ICL. All
that this option does is to define the default setting for this -F switch.

Running -F0 is faster (possibly, depending on whether your C library
provides buffering or not, much faster) but consumes 100 to 300K more
memory (it does so flexibly, allocating only what it needs, unlike the
Inform 5 option). Most users will not want to understand the issues
involved here, so please make a sensible default choice for them.
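
How a compile-time default and a run-time switch fit together can be
sketched as follows. This is an illustration only: the names
DEFAULT_TEMPORARY_FILES, temporary_files_switch and apply_F_switch are
invented here and are not claimed to match the Inform source.

```c
#include <assert.h>
#include <string.h>

/* The port's definition block decides only the default value. */
#ifdef USE_TEMPORARY_FILES
#define DEFAULT_TEMPORARY_FILES 1
#else
#define DEFAULT_TEMPORARY_FILES 0
#endif

/* Hypothetical run-time setting, initialised from the default. */
static int temporary_files_switch = DEFAULT_TEMPORARY_FILES;

/* Hypothetical switch handler: "-F0" turns temporary files off and "-F1"
   turns them on.  Returns 1 if the argument was recognised, 0 if not
   (in which case the setting is left alone). */
static int apply_F_switch(const char *arg)
{
    assert(arg != NULL);
    if (strcmp(arg, "-F0") == 0) { temporary_files_switch = 0; return 1; }
    if (strcmp(arg, "-F1") == 0) { temporary_files_switch = 1; return 1; }
    return 0;
}
```

The user's choice always wins: the #define affects only what happens when
no -F switch is given.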

Once again, note that CHAR_IS_SIGNED must be defined if "char" is signed:
otherwise "uchar" will be typedef'd wrongly.

/* 3 */  An estimate of the typical amount of memory likely to be free
should be given in DEFAULT_MEMORY_SIZE. (This is only a default setting.)
There are three settings: HUGE_SIZE, LARGE_SIZE and SMALL_SIZE. (I think
it was Andrew Plotkin, though, who remarked that HUGE_SIZE might sensibly
be renamed "not-bad-by-1980s-standards-size": these all allocate quite
small amounts of memory compared to, say, the 8M of workspace that Windows
appears to need just to keep breathing.)

For most modern machines, LARGE_SIZE is the appropriate setting, but some
older micros may benefit from SMALL_SIZE.

/* 4 */  This section specifies the filenaming conventions used by the
host OS. It's assumed that the host OS has the concept of subdirectories
and has "pathnames", that is, filenames giving a chain of subdirectories
divided by the FN_SEP (filename separator) character: e.g. for Unix FN_SEP
is defined below as '/' and a typical name is

    users/graham/jigsaw.z5

Normally the comma ',' character is used to separate pathnames in a list
of pathnames, but this can be overridden by defining FN_ALT as some other
character. Obviously it should be a character which never occurs in normal
pathnames.

If FILE_EXTENSIONS is defined then the OS allows "file extensions" of 1 to
3 alphanumeric characters like ".txt" (for text files), ".z5" (for game
files), etc., to indicate the file's type (and, crucially, regards the
same filename but with different extensions -- e.g., "frog.amp" and
"frog.lil" -- as being different names).

If FILE_EXTENSIONS is defined, then Inform uses the following standard set
of extensions unless they are overridden by other definitions at this
point. (Please don't override these definitions without reason.)

    Source_Extension    ".inf"   Source code file
    Include_Extension   ".h"     Include file (e.g.
                                 library file)
    Code_Extension      ".z3"    Version 3 story file
    V4Code_Extension    ".z4"            4
    V5Code_Extension    ".z5"            5
    V6Code_Extension    ".z6"            6
    V7Code_Extension    ".z7"            7
    V8Code_Extension    ".z8"            8
    Module_Extension    ".m5"    Linkable module file (version 5, which
                                 is all that Inform 6 supports yet)
    ICL_Extension       ".icl"   ICL file

The debugging information file and the transcript file also have defined
default-names which can be over-ridden in this section if desired:

    Transcript_File     "gametext.txt" or "gametext"
    Debugging_File      "gameinfo.dbg" or "gamedebug"

If you do not define FILE_EXTENSIONS, then it is essential to define
STANDARD_DIRECTORIES instead. (You can also define both, if you so
choose.) The STANDARD_DIRECTORIES option causes Inform to put all files of
a particular kind into a standard directory for them: e.g., a "games"
directory might hold the story files compiled, etc.

All that happens when a standard directory is defined is that Inform sets
the default value of the relevant pathname variable to that standard
directory: otherwise, its pathname variable starts out as "".

The standard directories are, once again, defined by default as follows:
once again you can define these settings yourself, but please don't do so
without a good reason.

    Source_Directory       "source"
    Include_Directory      "library"
    Code_Directory         "games"
    Module_Directory       "modules"
    Temporary_Directory    ""
    ICL_Directory          ""

Note that the actual user of Inform can still override anything you choose
by setting the pathname with an ICL command.

A good way to test all this is to run inform -h1, which does some
experimental filename translations and prints the outcome.

/* 5 */  Section 5 contains information on how to choose the filenames for
the three temporary files. (Note that this needs to be done even if
USE_TEMPORARY_FILES is not defined.)

On many machines, you only need to give a suitable name. (As usual, if you
don't bother, something fairly sensible happens.)
Temporary_Name is the body of a filename to use (if you don't set this, it
becomes "Inftemp") and Temporary_Directory is the directory path for the
files to go in (which can be altered with an ICL command).

However, under some multi-tasking OSs it is desirable for multiple Inform
tasks to work simultaneously without clashes, and this means giving the
temporary files filenames which include some number uniquely identifying
the task which is running. If you want to provide this, define
INCLUDE_TASK_ID and provide some code...

    #define INCLUDE_TASK_ID
    #ifdef INFORM_FILE
    static int32 unique_task_id(void)
    {   ...some code returning your task ID...
    }
    #endif

/* 6 */  Finally, section 6 is "anything else". In particular this is
where DEFAULT_ERROR_FORMAT should be set. This switches between different
styles of error message. (This is not a matter of aesthetics: some
error-throwback debugging tools are very fussy about what format error
messages are printed out in.)

For example, here is a typical OS definition block:

    #ifdef UNIX
    /* 1 */
    #define MACHINE_STRING   "Unix"
    /* 2 */
    #define CHAR_IS_SIGNED
    /* 3 */
    #define DEFAULT_MEMORY_SIZE LARGE_SIZE
    /* 4 */
    #define FN_SEP '/'
    #define FILE_EXTENSIONS
    /* 5 */
    #define Temporary_Directory "/tmp"
    #define INCLUDE_TASK_ID
    #ifdef INFORM_FILE
    static int32 unique_task_id(void)
    {   return (int32)getpid();
    }
    #endif
    #endif


2.5 Running Inform in a multi-tasking OS
    ------------------------------------

As mentioned above, if Inform is being used in a multi-tasking environment
then temporary file-naming will need a little attention.

Another issue is that under some systems the other tasks may all freeze up
while Inform is working, because tasks only voluntarily hand control back
to the OS (allowing it to poll the other tasks and share out the processor
time). This means that some call to an OS primitive routine may have to be
inserted into Inform somewhere: a good place to do this is in the routine
reached_new_line() of the section "lexer.c".
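
A call of this kind might be arranged along the following lines. This is
only a sketch under stated assumptions: os_yield() is a hypothetical
stand-in for whatever primitive the host OS provides, note_new_line()
stands in for the relevant part of reached_new_line(), and yielding once
every so many lines (rather than on every line) is a design choice made
here, not something the Inform source is known to do.

```c
#include <assert.h>

/* Hypothetical stand-in for the host OS's "let other tasks run" call.
   A real port would call the OS primitive here; this stub only counts
   how often it has been called. */
static int yield_calls = 0;
static void os_yield(void)
{   yield_calls++;
}

/* Hypothetical hook, to be called once for each source line read.
   Hands control back to the OS once every lines_per_yield lines. */
static void note_new_line(int lines_per_yield)
{   static int lines_since_yield = 0;
    assert(lines_per_yield > 0);
    if (++lines_since_yield >= lines_per_yield)
    {   lines_since_yield = 0;
        os_yield();
    }
}
```

A port would choose the interval by experiment: too large and the other
tasks stutter, too small and compilation slows down.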