Data types
==========

All data types are encoded as one or more consecutive 16-bit words (cells).
The most significant bit of each word is reserved for the memory manager, and
the remaining 15 bits identify the data type and encode the data value. The
most significant data bits of the first word identify the data type. The most
significant bit is referred to as bit 15 and the least significant bit as
bit 0.

The implementation mostly uses static inline functions defined in a header
file, instead of preprocessor definitions, to provide an API more suitable
for a foreign function interface.

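
As a minimal sketch of the inline-function style described above, the memory
management bit can be separated from the 15 data bits as follows. The names
``cell_t``, ``cell_get_mark``, ``cell_set_mark`` and ``cell_get_data`` are
hypothetical and are not taken from ``cell.h``:

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical cell type: one 16-bit word. */
    typedef uint16_t cell_t;

    /* Bit 15 is reserved for the memory manager. */
    static inline int cell_get_mark(cell_t cell) {
        return (cell >> 15) & 1;
    }

    /* Return the cell with bit 15 set or cleared, data bits untouched. */
    static inline cell_t cell_set_mark(cell_t cell, int mark) {
        return (cell_t)((cell & 0x7FFF) | (mark ? 0x8000 : 0));
    }

    /* Return the 15 data bits (type identifier and value). */
    static inline cell_t cell_get_data(cell_t cell) {
        return (cell_t)(cell & 0x7FFF);
    }

Plain functions like these can be bound from other languages, which is not
possible with preprocessor macros.
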
Number
------

Data type representing signed integer values of arbitrary length. Although
the encoding itself doesn't limit value size, values are limited to signed
integers representable in 32-bit two's complement encoding, which keeps the
data manipulation interface simple. A single number is encoded with one or
more words:

+---------+---------------------------------+
| address | data                            |
+=========+=================================+
| n       | m 0 1 s v v v v v v v v v v v v |
+---------+---------------------------------+
| n + 1   | m 1 v v v v v v v v v v v v v v |
+---------+---------------------------------+
| ...     | ...                             |
+---------+---------------------------------+
| n + i   | m 1 v v v v v v v v v v v v v v |
+---------+---------------------------------+
| ...     | ...                             |
+---------+---------------------------------+
| n + m   | m 0 v v v v v v v v v v v v v v |
+---------+---------------------------------+

where:

* Bit 15 of each word (`m`) is reserved for memory management.
* Bit 14 of the first word (``0``) identifies the number type.
* Bit 13 of the first word and bit 14 of the other words represent "more
  follows". This bit is set to ``0`` only in the last word (`n + m`).
* Bit 12 of the first word (`s`) identifies the sign.
* The rest of the bits hold the two's complement encoded integer value, where
  word `n` contains the most significant bits and word `n + m` contains the
  least significant bits.

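
The layout above can be exercised with a small encoder and decoder. This is
only a sketch under stated assumptions: the function names are hypothetical,
the `s` bit is treated as the top bit of the two's complement value (so the
first word carries 13 value bits and every following word carries 14), and
the encoder always picks the shortest encoding. Bit 15 is left cleared for
the memory manager.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Encode value into cells (room for 3 words), return words written. */
    size_t number_encode(int32_t value, uint16_t *cells) {
        size_t count = 3;
        if (value >= -4096 && value <= 4095)
            count = 1;                      /* fits in 13 bits */
        else if (value >= INT32_C(-67108864) && value <= INT32_C(67108863))
            count = 2;                      /* fits in 13 + 14 bits */

        /* Sign-extended two's complement bit pattern of the value. */
        uint64_t bits = (uint64_t)(int64_t)value;
        unsigned shift = 14 * (unsigned)(count - 1);

        cells[0] = (uint16_t)((bits >> shift) & 0x1FFF);
        if (count > 1)
            cells[0] |= 0x2000;             /* "more follows", bit 13 */
        for (size_t i = 1; i < count; i++) {
            shift -= 14;
            cells[i] = (uint16_t)((bits >> shift) & 0x3FFF);
            if (i + 1 < count)
                cells[i] |= 0x4000;         /* "more follows", bit 14 */
        }
        return count;
    }

    /* Decode a number that fits in 32 bits, starting at cells[0]. */
    int32_t number_decode(const uint16_t *cells) {
        uint16_t word = cells[0];
        int32_t value = word & 0x1FFF;
        if (word & 0x1000)
            value -= 0x2000;                /* sign-extend 13 bits */
        int more = (word >> 13) & 1;
        while (more) {
            word = *++cells;
            /* Multiply instead of shifting a possibly negative value. */
            value = value * INT32_C(16384) + (word & 0x3FFF);
            more = (word >> 14) & 1;
        }
        return value;
    }

With these assumptions, values from -4096 to 4095 occupy a single word, and
any 32-bit value fits in at most three words.
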
Pair
----

Data type representing two addresses referencing word locations (usually
known as a cons cell). Address values are limited to 14-bit unsigned
integers, which allows this type to be encoded with two words:

+---------+---------------------------------+
| address | data                            |
+=========+=================================+
| n       | m 1 0 a a a a a a a a a a a a a |
+---------+---------------------------------+
| n + 1   | m a b b b b b b b b b b b b b b |
+---------+---------------------------------+

where:

* Bit 15 of each word (`m`) is reserved for memory management.
* Bits 14 and 13 of the first word (``10``) identify the pair type.
* 14 `a` bits encode the first address value.
* 14 `b` bits encode the second address value.

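
A sketch of how the two 14-bit addresses could be packed and unpacked is
given here. The function names are hypothetical, and the code assumes the `a`
bits are stored most significant bit first, so that the upper 13 bits sit in
word `n` and the lowest bit sits in bit 14 of word `n + 1`:

.. code-block:: c

    #include <stdint.h>

    /* Pack two 14-bit addresses into two words (bit 15 left cleared). */
    void pair_encode(uint16_t first, uint16_t second, uint16_t cells[2]) {
        /* Type prefix 10 in bits 14..13, upper 13 bits of first address. */
        cells[0] = (uint16_t)(0x4000 | ((first >> 1) & 0x1FFF));
        /* Lowest bit of first address in bit 14, then second address. */
        cells[1] = (uint16_t)(((first & 1) << 14) | (second & 0x3FFF));
    }

    /* Extract the first address. */
    uint16_t pair_first(const uint16_t cells[2]) {
        return (uint16_t)(((cells[0] & 0x1FFF) << 1) | ((cells[1] >> 14) & 1));
    }

    /* Extract the second address. */
    uint16_t pair_second(const uint16_t cells[2]) {
        return (uint16_t)(cells[1] & 0x3FFF);
    }
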
String
------

Data type representing zero or more 8-bit values. A single string is
represented with one or more words:

+---------+---------------------------------+
| address | data                            |
+=========+=================================+
| n       | m 1 1 0 0 s s s s s s s s s s s |
+---------+---------------------------------+
| n + 1   | m a a a a a a a a b b b b b b b |
+---------+---------------------------------+
| n + 2   | m b c c c c c c c c d d d d d d |
+---------+---------------------------------+
| ...     | ...                             |
+---------+---------------------------------+

where:

* Bit 15 of each word (`m`) is reserved for memory management.
* Bits 14, 13, 12 and 11 of the first word (``1100``) identify the string
  type.
* 11 `s` bits represent the string length (maximum string length is 2047).
* Bits `a`, `b`, `c`, ... represent 8-bit values.

This encoding scheme optimizes memory usage, but because 8-bit values are
packed into 15-bit payloads and span word boundaries, it introduces
significant overhead when manipulating string data.

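
The packing of 8-bit values into 15-bit payloads can be sketched bit by bit.
The function names below are hypothetical, and the code assumes each value is
stored most significant bit first, as the table layout suggests. It favours
clarity over speed, which also illustrates the manipulation overhead:

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Encode len 8-bit values into cells, return number of words written. */
    size_t string_encode(const uint8_t *data, uint16_t len, uint16_t *cells) {
        /* One header word plus ceil(len * 8 / 15) payload words. */
        size_t count = 1 + ((size_t)len * 8 + 14) / 15;

        /* Type prefix 1100 in bits 14..11, length in bits 10..0. */
        cells[0] = (uint16_t)(0x6000 | (len & 0x07FF));
        for (size_t i = 1; i < count; i++)
            cells[i] = 0;

        for (uint32_t bit = 0; bit < (uint32_t)len * 8; bit++) {
            if (data[bit / 8] & (0x80 >> (bit % 8)))
                cells[1 + bit / 15] |= (uint16_t)(1u << (14 - bit % 15));
        }
        return count;
    }

    /* Length stored in the header word. */
    uint16_t string_length(const uint16_t *cells) {
        return (uint16_t)(cells[0] & 0x07FF);
    }

    /* Read the 8-bit value at index by collecting its bits one at a time. */
    uint8_t string_get(const uint16_t *cells, uint16_t index) {
        uint8_t value = 0;
        uint32_t bit = (uint32_t)index * 8;
        for (int k = 0; k < 8; k++, bit++) {
            uint16_t word = cells[1 + bit / 15];
            unsigned pos = 14 - (unsigned)(bit % 15);
            value = (uint8_t)((value << 1) | ((word >> pos) & 1));
        }
        return value;
    }

Under these assumptions a two-value string needs three words, because the
last bit of the second value spills into word `n + 2`.
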
Symbol
------

Symbols are used as human-readable labels associated with data values. They
are encoded as 8-bit characters in the same way as string data:

+---------+---------------------------------+
| address | data                            |
+=========+=================================+
| n       | m 1 1 0 1 s s s s s s s s s s s |
+---------+---------------------------------+
| n + 1   | m a a a a a a a a b b b b b b b |
+---------+---------------------------------+
| n + 2   | m b c c c c c c c c d d d d d d |
+---------+---------------------------------+
| ...     | ...                             |
+---------+---------------------------------+

where:

* Bit 15 of each word (`m`) is reserved for memory management.
* Bits 14, 13, 12 and 11 of the first word (``1101``) identify the symbol
  type.
* 11 `s` bits represent the symbol length (maximum symbol length is 2047).
* Bits `a`, `b`, `c`, ... represent 8-bit character values.

Builtin function
----------------

Builtin functions are referenced by the function's index and encoded with a
single word:

+---------+---------------------------------+
| address | data                            |
+=========+=================================+
| n       | m 1 1 1 0 0 i i i i i i i i i i |
+---------+---------------------------------+

where:

* Bit 15 (`m`) is reserved for memory management.
* Bits 14, 13, 12, 11 and 10 (``11100``) identify the builtin function type.
* 10 `i` bits represent the builtin function index.

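
Since the whole value fits in one word, encoding and decoding reduce to a
mask and a constant. A minimal sketch, with hypothetical function names:

.. code-block:: c

    #include <stdint.h>

    /* Type prefix 11100 in bits 14..10, index in bits 9..0. */
    static inline uint16_t builtin_function_encode(uint16_t index) {
        return (uint16_t)(0x7000 | (index & 0x03FF));
    }

    /* Non-zero if the word (ignoring bit 15) is a builtin function. */
    static inline int builtin_function_check(uint16_t cell) {
        return (cell & 0x7C00) == 0x7000;
    }

    /* Index of the builtin function, 0 to 1023. */
    static inline uint16_t builtin_function_index(uint16_t cell) {
        return (uint16_t)(cell & 0x03FF);
    }
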
Builtin syntax
--------------

Builtin syntaxes are referenced by the syntax's index and encoded with a
single word:

+---------+---------------------------------+
| address | data                            |
+=========+=================================+
| n       | m 1 1 1 0 1 i i i i i i i i i i |
+---------+---------------------------------+

where:

* Bit 15 (`m`) is reserved for memory management.
* Bits 14, 13, 12, 11 and 10 (``11101``) identify the builtin syntax type.
* 10 `i` bits represent the builtin syntax index.

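
The type prefixes described so far form a prefix code, so the type of a value
can be determined from its first word by testing progressively longer
prefixes. The enum and function names in this sketch are hypothetical. Words
starting with ``1111`` are reported as ``CELL_OTHER`` here, and the function
must only be applied to the first word of a value:

.. code-block:: c

    #include <stdint.h>

    typedef enum {
        CELL_NUMBER,
        CELL_PAIR,
        CELL_STRING,
        CELL_SYMBOL,
        CELL_BUILTIN_FUNCTION,
        CELL_BUILTIN_SYNTAX,
        CELL_OTHER               /* prefix 1111, not decoded here */
    } cell_type_t;

    /* Classify the first word of a value, ignoring bit 15. */
    cell_type_t cell_type(uint16_t first_word) {
        if ((first_word & 0x4000) == 0x0000)
            return CELL_NUMBER;             /* 0     */
        if ((first_word & 0x6000) == 0x4000)
            return CELL_PAIR;               /* 10    */
        if ((first_word & 0x7800) == 0x6000)
            return CELL_STRING;             /* 1100  */
        if ((first_word & 0x7800) == 0x6800)
            return CELL_SYMBOL;             /* 1101  */
        if ((first_word & 0x7C00) == 0x7000)
            return CELL_BUILTIN_FUNCTION;   /* 11100 */
        if ((first_word & 0x7C00) == 0x7400)
            return CELL_BUILTIN_SYNTAX;     /* 11101 */
        return CELL_OTHER;
    }

Shorter prefixes are assigned to the types with the largest payloads, which
leaves more bits for number values and pair addresses.
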
Function
--------

Functions are defined by a parent context, a list of argument names and a
function body. The type identifier, together with the 14-bit addresses of the
associated values, is encoded within 4 words:

+---------+---------------------------------+
| address | data                            |
+=========+=================================+
| n       | m 1 1 1 1 0 x x x x x x x x x x |
+---------+---------------------------------+
| n + 1   | m x c c c c c c c c c c c c c c |
+---------+---------------------------------+
| n + 2   | m x a a a a a a a a a a a a a a |
+---------+---------------------------------+
| n + 3   | m x b b b b b b b b b b b b b b |
+---------+---------------------------------+

where:

* Bit 15 of each word (`m`) is reserved for memory management.
* Bits 14, 13, 12, 11 and 10 of the first word (``11110``) identify the
  function type.
* 14 `c` bits represent the parent context address.
* 14 `a` bits represent the argument name list address.
* 14 `b` bits represent the body definition address.
* `x` bits are not used.

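
A sketch of the four-word function layout follows. The function names are
hypothetical, and writing the unused `x` bits as zero is an assumption, since
the text only states that they are not used:

.. code-block:: c

    #include <stdint.h>

    /* Store a function value in four words (bit 15 left cleared). */
    void function_encode(uint16_t context, uint16_t args, uint16_t body,
                         uint16_t cells[4]) {
        cells[0] = 0x7800;                          /* type prefix 11110 */
        cells[1] = (uint16_t)(context & 0x3FFF);    /* parent context */
        cells[2] = (uint16_t)(args & 0x3FFF);       /* argument name list */
        cells[3] = (uint16_t)(body & 0x3FFF);       /* body definition */
    }

    /* Accessors mask off bit 15 and the unused bit 14. */
    uint16_t function_context(const uint16_t cells[4]) {
        return (uint16_t)(cells[1] & 0x3FFF);
    }

    uint16_t function_args(const uint16_t cells[4]) {
        return (uint16_t)(cells[2] & 0x3FFF);
    }

    uint16_t function_body(const uint16_t cells[4]) {
        return (uint16_t)(cells[3] & 0x3FFF);
    }
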
Syntax
------

Syntaxes are defined by a parent context, a list of argument names and a
syntax body. The type identifier, together with the 14-bit addresses of the
associated values, is encoded within 4 words:

+---------+---------------------------------+
| address | data                            |
+=========+=================================+
| n       | m 1 1 1 1 1 x x x x x x x x x x |
+---------+---------------------------------+
| n + 1   | m x c c c c c c c c c c c c c c |
+---------+---------------------------------+
| n + 2   | m x a a a a a a a a a a a a a a |
+---------+---------------------------------+
| n + 3   | m x b b b b b b b b b b b b b b |
+---------+---------------------------------+

where:

* Bit 15 of each word (`m`) is reserved for memory management.
* Bits 14, 13, 12, 11 and 10 of the first word (``11111``) identify the
  syntax type.
* 14 `c` bits represent the parent context address.
* 14 `a` bits represent the argument name list address.
* 14 `b` bits represent the body definition address.
* `x` bits are not used.

Source code
-----------

cell.h
''''''

.. literalinclude:: ../src_c/cell.h
    :language: c

cell.c
''''''

.. literalinclude:: ../src_c/cell.c
    :language: c