august/august.hint
Best of Show: <augustss@carlstedt.se> Lennart Augustsson

    Lennart Augustsson
    CR&T
    Stora Badhusgatan 18-20
    S-411 21 Goteborg
    Sweden


Judges' comments:

    To use:
	make august
	cat august.c test.oc | ./august > test.oo
	./august < test.oo

    NOTE: Some compilers will compile this code into infinite loops!
	  If the above cat line does not execute in a very short amount
	  of time, then you may need to fix your compiler or use gcc.

    Try: 
	cat august.c fac.oc | ./august > fac.oo
	./august < fac.oo

    Or try:
	cat august.c fac.oc | ./august | ./august

    And try:
	cat august.c parse.oc | ./august > parse.oo
	cat parse.oo fac.oc | ./august | ./august

    This entry can feed on itself.  If you C preprocess the source,
    you compile the interpreter:

	make august.oc

    We can now compile the interpreter as follows:

	cat august.c august.oc | ./august > august.oo

    and use the compiled interpreter to execute some previously
    compiled code:

	cat august.oo test.oo | ./august

    And we can have the compiled interpreter interpret itself which
    inturn compiles the test program:

	cat august.oo august.oo fac.oo | ./august

    And if you have lots of spare time, you can recurse one level deeper:

	cat august.oo august.oo august.oo fac.oo | ./august

    We (the judges) recommend that you spend some time studying this
    entry as it surely will make the 'best of the IOCCC' list.  It was
    very well received by those who attended the IOCCC BOF.


Selected notes from the author:

    This program is a bytecode interpreter.  This fact is not
    particularly obfuscated; any experienced C programmer can see that
    in about a minute.

    The interpreter reads the bytecode from the standard input.  What
    remains of the standard input after this is left as standard input for
    the interpreted program.

    Having an interpreter but no code isn't any fun.  But within a comment
    in the beginning of the interpreter is a sequence of bytes that is
    code that the interpreter understands.  It is in fact the code for a
    compiler that compiles from OC (Obfuscated C, a subset of C) into the
    byte code that the interpreter understands.

    The language the compiler understands is a subset of C (the grammar
    is in grammar.small) with the functions getchar() and putchar()
    available.  The subset has the types `int', `char', and pointers to
    those (it has function pointers too, but not with the proper
    syntax).  Arrays can only be global.  There is a rather limited set
    of operators, and control constructs.

    The compiler has absolutely no error checking, so things
    can go wrong.  But this is what seasoned C progammers are used to.

    Assume that the interpreter has been compiled (source in prog.c and
    binary in prog) and that we have this in a file named test.oc:

	main() { putchar('!'); putchar('\n'); exit(0); }

    We can then compile by

	cat august.c test.oc | ./august > test.oo

    And run the compiled program by

	./august < test.oo

    Or even simpler:

	cat august.c test.oc | ./august | ./august
	    
    A larger example is included fac.oc.

    The compiler for OC is naturally written in OC (where did you think
    the byte code came from?).  It is in the file parse.oc.

    Hmm...  So now we have an interpreter written in OC and a compiler
    for OC, can we compile the interpreter?  Yes, we can!  The OC
    compiler does not have a preprocessor (hey, what do you expect from
    a program that is this small), so we need to use an external one.
    To preprocess august.c use the command from the build file, but add
    the -E flag and change 60000 to 40000, i.e.:

       cc -E -DZ=40000 .... august.c > prog.oc

    If prog.oc contain some junk line starting with a `#' (most likely
    it does) then remove it.

    OK, we can now compile the interpreter:

       cat august.c prog.oc | ./august > prog.oo

    And we can run it:

       cat prog.oo test.oo | ./august

    Here we have the interpreter interpreting another interpreter that runs
    the program.  Did you think that was too fast?  Just throw in another
    level of interpretation:

       cat prog.oo prog.oo test.oo | ./august

    And another...

       cat prog.oo prog.oo prog.oo test.oo | ./august

    *****

    Fitting the byte code for the compiler into the initial comment
    was a challenge.  I had to strip the compiler/interpreter of
    a lot of functionality to bring down their sizes.  The initial
    version had a full set of binary operators as well as prefix &.
    Given a handful more bytes it would be easy to put them back.

    Anyway, the bytecode from the compiler is first optimized in
    an external optimizer (not supplied here); this brings down the
    size of it by about 30%.  The remaining code is then LZ-compressed
    (roughly) with a sliding window of 255 bytes.  The most common
    occuring bytes are then encoded using " \n\t;{}" to avoid having
    them counted as significant.  About 200 bytes of the interpreter
    handles the unpacking of the byte code (this is a lot, but it pays
    off).

    Despite all this work I still had to put a lot of code in the
    compilation command.  This disturbs me.  It is possible to compress
    the code further with a better entropy coder (e.g. an arithmetic
    coder), but decompression then takes up too much code.  Maybe this
    would be a way to fit it all into the 1536 bytes, but I'm don't have
    time to try it.

    *****

    Execution starts at main().  Nothing can be used before it is
    defined.  getchar() and putchar() are predefined functions.

    The following is a description of the OC grammar:

	OC grammar
	==========
	Terminals are in quotes, () is used for bracketing.

	program:	decl*

	decl:		vardecl
			fundecl

	vardecl:	type NAME ;
			type NAME "[" INT "]" ;

	fundecl:	type NAME "(" args ")" "{" body "}"

	args:		/*empty*/
			( arg "," )* arg

	arg:		type NAME

	body:		vardecl* stmt*

	stmt:		ifstmt
			whilestmt
			dowhilestmt
			"return" expr ";"
			expr ";"
			"{" stmt* "}"
			";"

	ifstmt:		"if" "(" expr ")" stmt
			"if" "(" expr ")" stmt "else" stmt

	whilestmt:	"while" "(" expr ")" stmt

	dowhilestmt:	"do" stmt "while" "(" expr ")" ";"

	expr:		expr binop expr
			unop expr
			expr "[" expr "]"
			"(" expr ")"
			expr "(" exprs ")"
			NAME
			INT
			CHAR
			STRING

	exprs:		/*empty*/
			(expr ",")* expr

	binop:		"+" | "-" | "*" | "/" | "%" |
			"=" |
			"<" | "==" | "!="

	unop:		"!" | "-" | "*"

	type:		"int" stars
			"char" stars

	stars:		"*"*

    *****

    Well, that is enough ranting for now.