scheme自举

2015-02-16

编译器自举是什么意思?以scheme为例,最初的scheme编译器肯定没法用scheme语言写,可能用c或者其它写成。等有了第一个编译器以后,就可以用scheme写了。如果一门语言的编译器用这门语言自身编写,并且能够跑起来,就算能够自举了(Go语言的1.5版正在搞自举呢,题外话)。

无聊又在写scheme编译器玩。这次我想用scheme语言写编译器部分,生成字节码输出。再用C语言写一个字节码的虚拟机。等这两部分都弄好了,编译scheme的代码,生成字节码跟虚拟机拼起来,就可以得到一个不依赖外部scheme的编译器了。也就是,可以实现自举。

关于写编译器、解释器,以前也各种折腾过,总是没有做出令自己比较满意的结果。究其原因,写一个很基本的demo并不难,但是处理复杂的语言规范的细枝未节,实现一些高级的语言特征,自己的精力和水平都不行。

这次把目标定低,只玩自举,不求高大上。首先是虚拟机这边,暂时先不要搞垃圾回收,这样可以少死很多脑细胞。JIT什么的了,想都不要去想,性能不是现在考虑的。还有就是虚拟机的体系结构也不必要按标准的来,因为scheme的闭包以及continuation,若是采用c的那套体系结构,要在虚拟机这边做trampoline一类的技术,编译器那边也要对自由变量做lambda lifting,还有CPS变换,会很伤脑细胞。不按标准硬件结构来,设置一个环境寄存器,简单很多。

然后是scheme写编译器,尽量用简单语法写,为了自举而采用一个折中。代码不要用宏语法,因为暂时不打算实现宏。call/cc也是暂时用不到并且暂时不会实现的。有些语法会很好用,但实现略难,那么写代码时就尽量避开这种语法,总之,使实现自举的代价最低。

接下来看看其中一些关键字的语法。

对于define,暂时不支持内部define。因为实现内部define要在编译的第一步做一个重写,类似下面的变换:

(define a 3)
=>
((lambda (a) (set! a 3)) #f)

我太懒了,宁愿不要用到内部define。全局define还是必须要的,否则写代码太痛苦了。

let是要的。let的变种很多,从最基本的let,循环的let语法,以及let*和letrec等。都可以手动做一个类似宏的重写。比如基本的let变换:

(let ((a 1)
      (b 3))
  (+ a b))
=>
((lambda (a b)
    (+ a b))
  1 3)

至于命名let循环,尽管没有做内部define,但还是可以用Y combinator重写,虽然性能不高:

(let loop ((n 0)
           (sum 0))
  (if (> n 100)
      sum
      (loop (+ n 1) (+ sum n))))
=>
((Y (lambda (loop)
     (lambda (n sum)
       (if (> n 100)
      sum
      (loop (+ n 1) (+ sum n))))))
 0 0)

写到这里的时候发现了一个很蛋疼的问题:call-by-value方式的Y的实现,教材中通常是这样子。

(define Y
  (lambda (F)
    ((lambda (u) (u u))
     (lambda (x)
       (F (lambda (v) ((x x) v)))))))

这样写Y是有问题的,它只支持一个参数。虽然可以写成下面这样支持两个参数,但始终不是支持任意参数。

(define Y
  (lambda (F)
    ((lambda (u) (u u))
     (lambda (x)
       (F (lambda (v1 v2) ((x x) v1 v2)))))))

尝试利用(lambda x . y)这种变长参数,但是没成功。只好先自己手写几个处理不同参数的Y1,Y2,Y3...

case也暂时不实现,因为没有define-syntax宏不好实现case,暂时代码都用cond写吧。

看一下效果,这段是主结构的代码:

(define meaning
  (lambda (e r tail?)
    (if (atom? e)
        (if (symbol? e) (meaning-reference e r tail?)
            (meaning-quotation e r tail?))
        (let ((syntax (car e)))
          (cond
            ((eq? syntax 'quote)  (meaning-quotation (cadr e) r tail?))
            ((eq? syntax 'lambda) (meaning-abstraction (cadr e) (cddr e) r tail?))
            ((eq? syntax 'if)     (meaning-alternative (cadr e) (caddr e) (cadddr e) r tail?))
            ((eq? syntax 'begin)  (meaning-sequence (cdr e) r tail?))
            ((eq? syntax 'set!)   (meaning-assignment (cadr e) (caddr e) r tail?))
            ((eq? syntax 'define) (meaning-define (cadr e) (caddr e) r tail?))
            ((eq? syntax 'let)    (meaning (rewrite-let (cdr e)) r tail?))
            ((eq? syntax 'let*)   (meaning (rewrite-let* (reverse (cadr e)) (caddr e)) r tail?))
            ((eq? syntax 'cond)   (meaning (rewrite-cond (cdr e)) r tail?))
            (else     (meaning-application syntax (cdr e) r tail?)))))))

编译出来字节码是:

(CREATE-CLOSURE
2
GOTO
673
ARITY=?
4
EXTEND-ENV
SHALLOW-ARGUMENT-REF
0
INVOKE1
atom?
JUMP-FALSE
54
SHALLOW-ARGUMENT-REF
0
INVOKE1
symbol?
JUMP-FALSE
24
CHECKED-GLOBAL-REF
1
PUSH-VALUE
SHALLOW-ARGUMENT-REF
0
PUSH-VALUE
SHALLOW-ARGUMENT-REF
1
PUSH-VALUE
SHALLOW-ARGUMENT-REF
2
PUSH-VALUE
ALLOCATE-FRAME
3
POP-FRAME!
2
POP-FRAME!
1
POP-FRAME!
0
POP-FUNCTION
FUNCTION-INVOKE
GOTO
22
CHECKED-GLOBAL-REF
2
PUSH-VALUE
SHALLOW-ARGUMENT-REF
0
PUSH-VALUE
SHALLOW-ARGUMENT-REF
1
PUSH-VALUE
SHALLOW-ARGUMENT-REF
2
PUSH-VALUE
ALLOCATE-FRAME
3
POP-FRAME!
2
POP-FRAME!
1
POP-FRAME!
0
POP-FUNCTION
FUNCTION-INVOKE
GOTO
609
SHALLOW-ARGUMENT-REF
0
INVOKE1
car
PUSH-VALUE
ALLOCATE-FRAME
1
POP-FRAME!
0
EXTEND-ENV
SHALLOW-ARGUMENT-REF
0
PUSH-VALUE
CONSTANT
quote
POP-ARG1
INVOKE2
eq?
JUMP-FALSE
39
CHECKED-GLOBAL-REF
2
PUSH-VALUE
CHECKED-GLOBAL-REF
3
PUSH-VALUE
DEEP-ARGUMENT-REF
1
0
PUSH-VALUE
ALLOCATE-FRAME
1
POP-FRAME!
0
POP-FUNCTION
PRESERVE-ENV
FUNCTION-INVOKE
RESTORE-ENV
PUSH-VALUE
DEEP-ARGUMENT-REF
1
1
PUSH-VALUE
DEEP-ARGUMENT-REF
1
2
PUSH-VALUE
ALLOCATE-FRAME
3
POP-FRAME!
2
POP-FRAME!
1
POP-FRAME!
0
POP-FUNCTION
FUNCTION-INVOKE
GOTO
550
SHALLOW-ARGUMENT-REF
0
PUSH-VALUE
CONSTANT
lambda
POP-ARG1
INVOKE2
eq?
JUMP-FALSE
57
CHECKED-GLOBAL-REF
4
PUSH-VALUE
CHECKED-GLOBAL-REF
3
PUSH-VALUE
DEEP-ARGUMENT-REF
1
0
PUSH-VALUE
ALLOCATE-FRAME
1
POP-FRAME!
0
POP-FUNCTION
PRESERVE-ENV
FUNCTION-INVOKE
RESTORE-ENV
PUSH-VALUE
CHECKED-GLOBAL-REF
5
PUSH-VALUE
DEEP-ARGUMENT-REF
1
0
PUSH-VALUE
ALLOCATE-FRAME
1
POP-FRAME!
0
POP-FUNCTION
PRESERVE-ENV
FUNCTION-INVOKE
RESTORE-ENV
PUSH-VALUE
DEEP-ARGUMENT-REF
1
1
PUSH-VALUE
DEEP-ARGUMENT-REF
1
2
PUSH-VALUE
ALLOCATE-FRAME
4
POP-FRAME!
3
POP-FRAME!
2
POP-FRAME!
1
POP-FRAME!
0
POP-FUNCTION
FUNCTION-INVOKE
GOTO
483
SHALLOW-ARGUMENT-REF
0
PUSH-VALUE
CONSTANT
if
POP-ARG1
INVOKE2
eq?
JUMP-FALSE
75
CHECKED-GLOBAL-REF
6
PUSH-VALUE
CHECKED-GLOBAL-REF
3
PUSH-VALUE
DEEP-ARGUMENT-REF
1
0
PUSH-VALUE
ALLOCATE-FRAME
1
POP-FRAME!
0
POP-FUNCTION
PRESERVE-ENV
FUNCTION-INVOKE
RESTORE-ENV
PUSH-VALUE
CHECKED-GLOBAL-REF
7
PUSH-VALUE
DEEP-ARGUMENT-REF
1
0
PUSH-VALUE
ALLOCATE-FRAME
1
POP-FRAME!
0
POP-FUNCTION
PRESERVE-ENV
FUNCTION-INVOKE
RESTORE-ENV
PUSH-VALUE
CHECKED-GLOBAL-REF
8
PUSH-VALUE
DEEP-ARGUMENT-REF
1
0
PUSH-VALUE
ALLOCATE-FRAME
1
POP-FRAME!
0
POP-FUNCTION
PRESERVE-ENV
FUNCTION-INVOKE
RESTORE-ENV
PUSH-VALUE
DEEP-ARGUMENT-REF
1
1
PUSH-VALUE
DEEP-ARGUMENT-REF
1
2
PUSH-VALUE
ALLOCATE-FRAME
5
POP-FRAME!
4
POP-FRAME!
3
POP-FRAME!
2
POP-FRAME!
1
POP-FRAME!
0
POP-FUNCTION
FUNCTION-INVOKE
GOTO
398
SHALLOW-ARGUMENT-REF
0
PUSH-VALUE
CONSTANT
begin
POP-ARG1
INVOKE2
eq?
JUMP-FALSE
29
CHECKED-GLOBAL-REF
9
PUSH-VALUE
DEEP-ARGUMENT-REF
1
0
INVOKE1
cdr
PUSH-VALUE
DEEP-ARGUMENT-REF
1
1
PUSH-VALUE
DEEP-ARGUMENT-REF
1
2
PUSH-VALUE
ALLOCATE-FRAME
3
POP-FRAME!
2
POP-FRAME!
1
POP-FRAME!
0
POP-FUNCTION
FUNCTION-INVOKE
GOTO
359
SHALLOW-ARGUMENT-REF
0
PUSH-VALUE
CONSTANT
set!
POP-ARG1
INVOKE2
eq?
JUMP-FALSE
57
CHECKED-GLOBAL-REF
10
PUSH-VALUE
CHECKED-GLOBAL-REF
3
PUSH-VALUE
DEEP-ARGUMENT-REF
1
0
PUSH-VALUE
ALLOCATE-FRAME
1
POP-FRAME!
0
POP-FUNCTION
PRESERVE-ENV
FUNCTION-INVOKE
RESTORE-ENV
PUSH-VALUE
CHECKED-GLOBAL-REF
7
PUSH-VALUE
DEEP-ARGUMENT-REF
1
0
PUSH-VALUE
ALLOCATE-FRAME
1
POP-FRAME!
0
POP-FUNCTION
PRESERVE-ENV
FUNCTION-INVOKE
RESTORE-ENV
PUSH-VALUE
DEEP-ARGUMENT-REF
1
1
PUSH-VALUE
DEEP-ARGUMENT-REF
1
2
PUSH-VALUE
ALLOCATE-FRAME
4
POP-FRAME!
3
POP-FRAME!
2
POP-FRAME!
1
POP-FRAME!
0
POP-FUNCTION
FUNCTION-INVOKE
GOTO
292
SHALLOW-ARGUMENT-REF
0
PUSH-VALUE
CONSTANT
define
POP-ARG1
INVOKE2
eq?
JUMP-FALSE
57
CHECKED-GLOBAL-REF
11
PUSH-VALUE
CHECKED-GLOBAL-REF
3
PUSH-VALUE
DEEP-ARGUMENT-REF
1
0
PUSH-VALUE
ALLOCATE-FRAME
1
POP-FRAME!
0
POP-FUNCTION
PRESERVE-ENV
FUNCTION-INVOKE
RESTORE-ENV
PUSH-VALUE
CHECKED-GLOBAL-REF
7
PUSH-VALUE
DEEP-ARGUMENT-REF
1
0
PUSH-VALUE
ALLOCATE-FRAME
1
POP-FRAME!
0
POP-FUNCTION
PRESERVE-ENV
FUNCTION-INVOKE
RESTORE-ENV
PUSH-VALUE
DEEP-ARGUMENT-REF
1
1
PUSH-VALUE
DEEP-ARGUMENT-REF
1
2
PUSH-VALUE
ALLOCATE-FRAME
4
POP-FRAME!
3
POP-FRAME!
2
POP-FRAME!
1
POP-FRAME!
0
POP-FUNCTION
FUNCTION-INVOKE
GOTO
225
SHALLOW-ARGUMENT-REF
0
PUSH-VALUE
CONSTANT
let
POP-ARG1
INVOKE2
eq?
JUMP-FALSE
41
CHECKED-GLOBAL-REF
0
PUSH-VALUE
CHECKED-GLOBAL-REF
12
PUSH-VALUE
DEEP-ARGUMENT-REF
1
0
INVOKE1
cdr
PUSH-VALUE
ALLOCATE-FRAME
1
POP-FRAME!
0
POP-FUNCTION
PRESERVE-ENV
FUNCTION-INVOKE
RESTORE-ENV
PUSH-VALUE
DEEP-ARGUMENT-REF
1
1
PUSH-VALUE
DEEP-ARGUMENT-REF
1
2
PUSH-VALUE
ALLOCATE-FRAME
3
POP-FRAME!
2
POP-FRAME!
1
POP-FRAME!
0
POP-FUNCTION
FUNCTION-INVOKE
GOTO
174
SHALLOW-ARGUMENT-REF
0
PUSH-VALUE
CONSTANT
let*
POP-ARG1
INVOKE2
eq?
JUMP-FALSE
81
CHECKED-GLOBAL-REF
0
PUSH-VALUE
CHECKED-GLOBAL-REF
13
PUSH-VALUE
CHECKED-GLOBAL-REF
14
PUSH-VALUE
CHECKED-GLOBAL-REF
3
PUSH-VALUE
DEEP-ARGUMENT-REF
1
0
PUSH-VALUE
ALLOCATE-FRAME
1
POP-FRAME!
0
POP-FUNCTION
PRESERVE-ENV
FUNCTION-INVOKE
RESTORE-ENV
PUSH-VALUE
ALLOCATE-FRAME
1
POP-FRAME!
0
POP-FUNCTION
PRESERVE-ENV
FUNCTION-INVOKE
RESTORE-ENV
PUSH-VALUE
CHECKED-GLOBAL-REF
7
PUSH-VALUE
DEEP-ARGUMENT-REF
1
0
PUSH-VALUE
ALLOCATE-FRAME
1
POP-FRAME!
0
POP-FUNCTION
PRESERVE-ENV
FUNCTION-INVOKE
RESTORE-ENV
PUSH-VALUE
ALLOCATE-FRAME
2
POP-FRAME!
1
POP-FRAME!
0
POP-FUNCTION
PRESERVE-ENV
FUNCTION-INVOKE
RESTORE-ENV
PUSH-VALUE
DEEP-ARGUMENT-REF
1
1
PUSH-VALUE
DEEP-ARGUMENT-REF
1
2
PUSH-VALUE
ALLOCATE-FRAME
3
POP-FRAME!
2
POP-FRAME!
1
POP-FRAME!
0
POP-FUNCTION
FUNCTION-INVOKE
GOTO
83
SHALLOW-ARGUMENT-REF
0
PUSH-VALUE
CONSTANT
cond
POP-ARG1
INVOKE2
eq?
JUMP-FALSE
41
CHECKED-GLOBAL-REF
0
PUSH-VALUE
CHECKED-GLOBAL-REF
15
PUSH-VALUE
DEEP-ARGUMENT-REF
1
0
INVOKE1
cdr
PUSH-VALUE
ALLOCATE-FRAME
1
POP-FRAME!
0
POP-FUNCTION
PRESERVE-ENV
FUNCTION-INVOKE
RESTORE-ENV
PUSH-VALUE
DEEP-ARGUMENT-REF
1
1
PUSH-VALUE
DEEP-ARGUMENT-REF
1
2
PUSH-VALUE
ALLOCATE-FRAME
3
POP-FRAME!
2
POP-FRAME!
1
POP-FRAME!
0
POP-FUNCTION
FUNCTION-INVOKE
GOTO
32
CHECKED-GLOBAL-REF
16
PUSH-VALUE
SHALLOW-ARGUMENT-REF
0
PUSH-VALUE
DEEP-ARGUMENT-REF
1
0
INVOKE1
cdr
PUSH-VALUE
DEEP-ARGUMENT-REF
1
1
PUSH-VALUE
DEEP-ARGUMENT-REF
1
2
PUSH-VALUE
ALLOCATE-FRAME
4
POP-FRAME!
3
POP-FRAME!
2
POP-FRAME!
1
POP-FRAME!
0
POP-FUNCTION
FUNCTION-INVOKE
RETURN
SET-GLOBAL!
0)

写到这里突然发现这篇文章毫无意义,看得懂的人很容易看得懂我说什么(都是废话),看不懂的人始终无法理解我在讲什么。

好吧,如果读者看不懂,又想要看懂,这些资料可能有帮助:

  • 《Lisp in small pieces》这本书不错,主要参考了里面很多东西,看个大概就知道怎么写scheme编译器了
  • urscheme, 这个例子很好,2000多行代码就可以写一个scheme到x86汇编的编译器哦
  • nanopass 可以拆成很细的许多步骤,每步只完成一点点事情,最终实现一个编译器。

------------2015.3.28 更新----------

支持变长参数的Y可以这么写:

(define Y
  (lambda (f)
    ((lambda (u) (u u))
     (lambda (x) (f (lambda v (apply (x x) v)))))))

lambda v ...这种形式的lambda表达式中v可以绑定整个参数列表。

scheme