From 288727f09a1b3458c268497d111349e608c3f9fa Mon Sep 17 00:00:00 2001 From: "bozo.kopic" Date: Tue, 2 Aug 2022 01:20:12 +0200 Subject: init --- docs/apply.rst | 63 ++++++++++++ docs/arch.rst | 155 ++++++++++++++++++++++++++++ docs/buff.rst | 29 ++++++ docs/builtin.rst | 30 ++++++ docs/cell.rst | 270 +++++++++++++++++++++++++++++++++++++++++++++++++ docs/conf.py | 23 +++++ docs/ctx.rst | 53 ++++++++++ docs/env.rst | 34 +++++++ docs/eval.rst | 48 +++++++++ docs/examples.rst | 20 ++++ docs/extensions.rst | 28 +++++ docs/function.rst | 83 +++++++++++++++ docs/index.rst | 29 ++++++ docs/introduction.rst | 69 +++++++++++++ docs/main.rst | 15 +++ docs/mem.rst | 126 +++++++++++++++++++++++ docs/read.rst | 160 +++++++++++++++++++++++++++++ docs/repl.rst | 40 ++++++++ docs/static/custom.css | 22 ++++ docs/status.rst | 76 ++++++++++++++ docs/stream.rst | 54 ++++++++++ docs/syntax.rst | 116 +++++++++++++++++++++ docs/write.rst | 27 +++++ 23 files changed, 1570 insertions(+) create mode 100644 docs/apply.rst create mode 100644 docs/arch.rst create mode 100644 docs/buff.rst create mode 100644 docs/builtin.rst create mode 100644 docs/cell.rst create mode 100644 docs/conf.py create mode 100644 docs/ctx.rst create mode 100644 docs/env.rst create mode 100644 docs/eval.rst create mode 100644 docs/examples.rst create mode 100644 docs/extensions.rst create mode 100644 docs/function.rst create mode 100644 docs/index.rst create mode 100644 docs/introduction.rst create mode 100644 docs/main.rst create mode 100644 docs/mem.rst create mode 100644 docs/read.rst create mode 100644 docs/repl.rst create mode 100644 docs/static/custom.css create mode 100644 docs/status.rst create mode 100644 docs/stream.rst create mode 100644 docs/syntax.rst create mode 100644 docs/write.rst (limited to 'docs') diff --git a/docs/apply.rst b/docs/apply.rst new file mode 100644 index 0000000..e45c31b --- /dev/null +++ b/docs/apply.rst @@ -0,0 +1,63 @@ +Function/syntax application +=========================== + +Builtin function/syntax application +----------------------------------- + +Data representing builtin function/syntax, together with type identification, +contains number representing index of builtin entry. As described in previous +chapters, each builtin entry contains function pointer: + +.. code-block:: c + + typedef lsp_status_t (*lsp_builtin_cb_t)(lsp_env_t *e, lsp_addr_t ctx, lsp_addr_t args); + +Application of builtin function/syntax calls provided native function +with arguments provided to function/syntax application. + +Only difference between builtin function and syntax application is in arguments +provided to application itself. In case of function application, these +arguments are previously evaluated and in case of syntax, they are provided +without previous evaluation. + + +User defined function/syntax application +---------------------------------------- + +Data representation of function/syntax references: + + * parent context + * argument name list + * function/syntax body + +During function/syntax application, new context is created as copy of +associated parent context. That context is used for evaluation of +function/syntax content. Prior to content evaluation, provided arguments +are added as new entries in context and are associated with symbols +from argument name list based on their position in list. Once context +is created and argument values are associated, function/syntax body is +evaluated. Evaluation of function/syntax body is sequential evaluation +of all available expression (from first to last) where value of last expression +is used as resulting value. + +In case of function application, result of last expression evaluation is +also value of function application itself. In case of syntax evaluation, +result of last expression is data that is additionally evaluated in context +from which syntax application was called. + + +Source code +----------- + +apply.h +''''''' + +.. literalinclude:: ../src_c/apply.h + :language: c + + +apply.c +''''''' + +.. literalinclude:: ../src_c/apply.c + :language: c diff --git a/docs/arch.rst b/docs/arch.rst new file mode 100644 index 0000000..9eededf --- /dev/null +++ b/docs/arch.rst @@ -0,0 +1,155 @@ +Architecture abstraction layer +============================== + +To enable easier targeting of different execution platforms and provide +decoupling from standard C library (or other dependencies), thin abstraction +layer is required. + + +Supported target platforms +-------------------------- + +Constant definition ``LSP_ARCH`` is used as identifier of desired target +platform. This definition should be set during C source preprocessing to +one of supported values: + + * ``LSP_ARCH_POSIX`` (POSIX target) + * ``LSP_ARCH_AVR8`` (8bit AVR target) + +Depending on value of this constant, different platform specific +implementations will be included. + + +C data types +------------ + +`arch.h` defines fixed length integers (``lsp_int8_t``, ``lsp_int16_t``, +``lsp_int32_t``, ``lsp_uint8_t``, ``lsp_uint16_t``, ``lsp_uint32_t``) +independent of target platform. If target has available standard C library, +this types are aliases to types defined by `stdint.h`. In case target doesn't +have standard C library available, these types should be defined by +appropriate C integer types (``char``, ``short``, ``int``, ``long``, ...). + +Additionally, ``lsp_bool_t`` is defined as alias to ``_Bool``. If ``_Bool`` +type is not available, ``lsp_uint8_t`` can be used. Care is taken not to depend +on specifics of integer to ``_Bool`` conversions so that other integer types +could be used as ``lsp_bool_t``. + + +Platform specific functions +--------------------------- + +Each platform abstraction layer implementation (`arch/*.c`) is responsible +for functions aliased with following names: + + * ``LSP_ARCH_INIT`` + + Function, called before any other function, responsible for + initialization of platform specific state. + + * ``LSP_ARCH_CREATE_MEM`` + + Function responsible for allocating memory that will represent + ``lsp_mem_t`` structure (described in following chapters) and it's + initialization. + + * ``LSP_ARCH_FREE_MEM`` + + Function responsible for freeing previously allocated memory. + + * ``LSP_ARCH_CREATE_IN_STREAM`` + + Function responsible for allocating memory that will represent + ``lsp_in_stream_t`` structure (described in following chapters) and + it's initialization. + + * ``LSP_ARCH_FREE_IN_STREAM`` + + Function responsible for freeing previously allocated memory. + + * ``LSP_ARCH_CREATE_OUT_STREAM`` + + Function responsible for allocating memory that will represent + ``lsp_out_stream_t`` structure (described in following chapters) and + it's initialization. + + * ``LSP_ARCH_FREE_OUT_STREAM`` + + Function responsible for freeing previously allocated memory. + +In case of memory constrained targets (e.g. ``LSP_ARCH_AVR8``), functions +responsible for allocating memory can return pointers to statically +preallocated memory instead of dynamically allocated. For this target +platforms, associated freeing functions don't implement any functionality. + + +Input stream +------------ + +During initialization of input stream, platform specific implementation +is responsible for providing function pointer with signature: + +.. code-block:: c + + typedef lsp_int16_t (*lsp_stream_getchar_t)(lsp_in_stream_t *s); + +Provided function is responsible for reading single character from input +stream (used by REPL). In case of POSIX, it's behavior corresponds to +``getchar`` provided by standard C library. + +Details of input stream implementation are available in following chapters. + + +Output stream +------------- + +During initialization of output stream, platform specific implementation +is responsible for providing function pointer with signature: + +.. code-block:: c + + typedef lsp_int16_t (*lsp_stream_putchar_t)(lsp_out_stream_t *s, lsp_int16_t v); + +Provided function is responsible for writing single character to output +stream (used by REPL). In case of POSIX, it's behavior corresponds to +``putchar`` provided by standard C library. + +Details of output stream implementation are available in following chapters. + + +Source code +----------- + +arch.h +'''''' + +.. literalinclude:: ../src_c/arch.h + :language: c + + +arch/avr8.h +''''''''''' + +.. literalinclude:: ../src_c/arch/avr8.h + :language: c + + +arch/avr8.c +''''''''''' + +.. literalinclude:: ../src_c/arch/avr8.c + :language: c + + +arch/posix.h +'''''''''''' + +.. literalinclude:: ../src_c/arch/posix.h + :language: c + + +arch/posix.c +'''''''''''' + +.. literalinclude:: ../src_c/arch/posix.c + :language: c diff --git a/docs/buff.rst b/docs/buff.rst new file mode 100644 index 0000000..b052483 --- /dev/null +++ b/docs/buff.rst @@ -0,0 +1,29 @@ +String buffer +============= + +While reading character data from input stream, buffer used for storing +string of arbitrary length is required. `lsp_buff_t` provides preallocated +limited temporary storage which is used as intermediary buffer in construction +of `string` data with arbitrary size. + +After initialization with `lsp_buff_init`, characters are added to buffer +with `lsp_buff_push` function. Once all characters are appended, single +`string` data containing all previously appended characters, can be obtained +with `lsp_buff_pop` function. + + +Source code +----------- + +buff.h +'''''' + +.. literalinclude:: ../src_c/buff.h + :language: c + + +buff.c +'''''' + +.. literalinclude:: ../src_c/buff.c + :language: c diff --git a/docs/builtin.rst b/docs/builtin.rst new file mode 100644 index 0000000..bb87f8c --- /dev/null +++ b/docs/builtin.rst @@ -0,0 +1,30 @@ +Builtin +======= + +Builtin functions and builtin syntaxes are represented with +`lsp_builtin_entry_t` structures. Each entry is defined with unique +name and function pointer: + +.. code-block:: c + + typedef lsp_status_t (*lsp_builtin_cb_t)(lsp_env_t *e, lsp_addr_t ctx, lsp_addr_t args); + +Associated function pointer is called during function/syntax application +(described in following chapters). + + +Source code +----------- + +builtin.h +''''''''' + +.. literalinclude:: ../src_c/builtin.h + :language: c + + +builtin.c +''''''''' + +.. literalinclude:: ../src_c/builtin.c + :language: c diff --git a/docs/cell.rst b/docs/cell.rst new file mode 100644 index 0000000..7e66897 --- /dev/null +++ b/docs/cell.rst @@ -0,0 +1,270 @@ +Data types +========== + +All data types are encoded as one or more consecutive 16bit words (cells). +Most significant bit of each 16bit word is reserved for memory manager usage +and remaining 15bits are used for identifying data types and encoding +data values. Most significant data bits of first word identifies +data type. Most significant bit is referenced as bit 15 and least significant +bit as bit 0. + +Implementation mostly uses static inline functions defined in header file, +instead of preprocessor definitions, to provide API more suitable for +foreign function interface. + + +Number +------ + +Data type representing signed integer values of arbitrary length. Although +encoding itself doesn't limit value size, to provide easier interface for +data manipulation, values are limited to signed integers represented with +32bit dual complement encoding. Single number is encoded with one or more +words: + + +---------+---------------------------------+ + | address | data | + +=========+=================================+ + | n | m 0 1 s v v v v v v v v v v v v | + +---------+---------------------------------+ + | n + 1 | m 1 v v v v v v v v v v v v v v | + +---------+---------------------------------+ + | ... | ... | + +---------+---------------------------------+ + | n + i | m 1 v v v v v v v v v v v v v v | + +---------+---------------------------------+ + | ... | ... | + +---------+---------------------------------+ + | n + m | m 0 v v v v v v v v v v v v v v | + +---------+---------------------------------+ + +where: + + * Bit 15 of each word (`m`) is reserved for memory management. + + * Bit 14 of first word (``0``) identifies number type. + + * Bit 13 of first word and bit 14 of other words represents "more follows". + Only in last word (`n + m`) is this bit set to ``0``. + + * Bit 12 of first word (`s`) identifies sign. + + * Rest of bits are used as dual complement encoded integer value where + word `n` contains most significant bits and word `n + m` contains + least significant bits. + + +Pair +---- + +Data type representing two addresses referencing word locations (usually +known as cons cell). Address values are limited to 14bit unsigned integers +which enables encoding of this type with two words: + + +---------+---------------------------------+ + | address | data | + +=========+=================================+ + | n | m 1 0 a a a a a a a a a a a a a | + +---------+---------------------------------+ + | n + 1 | m a b b b b b b b b b b b b b b | + +---------+---------------------------------+ + +where: + + * Bit 15 of each word (`m`) is reserved for memory management. + + * Bits 14 and 13 of first word (``10``) identify pair type. + + * 14 `a` bits encode first address value. + + * 14 `b` bits encode second address value. + + +String +------ + +Data type representing zero of more 8bit values. Single string is represented +with one or more words: + + +---------+---------------------------------+ + | address | data | + +=========+=================================+ + | n | m 1 1 0 0 s s s s s s s s s s s | + +---------+---------------------------------+ + | n + 1 | m a a a a a a a a b b b b b b b | + +---------+---------------------------------+ + | n + 2 | m b c c c c c c c c d d d d d d | + +---------+---------------------------------+ + | ... | ... | + +---------+---------------------------------+ + +where: + + * Bit 15 of each word (`m`) is reserved for memory management. + + * Bits 14, 13, 12 and 11 of first word (``1100``) identify string type. + + * 11 `s` bits represent string length (maximum string length is 2047). + + * Bits `a`, `b`, `c`, ... represent 8bit values. + +This encoding schema tries to optimize memory usage but at the same time +introduces significant overhead in manipulating string data. + + +Symbol +------ + +Symbols are used as human readable labels associated with data values. They +are encoded as 8bit characters similarly as string data: + + +---------+---------------------------------+ + | address | data | + +=========+=================================+ + | n | m 1 1 0 1 s s s s s s s s s s s | + +---------+---------------------------------+ + | n + 1 | m a a a a a a a a b b b b b b b | + +---------+---------------------------------+ + | n + 2 | m b c c c c c c c c d d d d d d | + +---------+---------------------------------+ + | ... | ... | + +---------+---------------------------------+ + +where: + + * Bit 15 of each word (`m`) is reserved for memory management. + + * Bits 14, 13, 12 and 11 of first word (``1101``) identify symbol type. + + * 11 `s` bits represent symbol length (maximum symbol length is 2047). + + * Bits `a`, `b`, `c`, ... represent 8bit character values. + + +Builtin function +---------------- + +Builtin functions are referenced by function's index and encoded with +single word: + + +---------+---------------------------------+ + | address | data | + +=========+=================================+ + | n | m 1 1 1 0 0 i i i i i i i i i i | + +---------+---------------------------------+ + +where: + + * Bit 15 (`m`) is reserved for memory management. + + * Bits 14, 13, 12, 11 and 10 (``11100``) identify builtin function type. + + * 10 `i` bits represent builtin function index. + + +Builtin syntax +-------------- + +Builtin syntaxes are referenced by syntax's index and encoded with +single word: + + +---------+---------------------------------+ + | address | data | + +=========+=================================+ + | n | m 1 1 1 0 1 i i i i i i i i i i | + +---------+---------------------------------+ + +where: + + * Bit 15 (`m`) is reserved for memory management. + + * Bits 14, 13, 12, 11 and 10 (``11101``) identify builtin syntax type. + + * 10 `i` bits represent builtin syntax index. + + +Function +-------- + +Functions are defined by parent context, list of argument names and function +body. Type identifier together with 14bit addressees of associated values are +encoded within 4 words: + + +---------+---------------------------------+ + | address | data | + +=========+=================================+ + | n | m 1 1 1 1 0 x x x x x x x x x x | + +---------+---------------------------------+ + | n + 1 | m x c c c c c c c c c c c c c c | + +---------+---------------------------------+ + | n + 2 | m x a a a a a a a a a a a a a a | + +---------+---------------------------------+ + | n + 3 | m x b b b b b b b b b b b b b b | + +---------+---------------------------------+ + +where: + + * Bit 15 of each word (`m`) is reserved for memory management. + + * Bits 14, 13, 12, 11 and 10 of first word (``11110``) identify function + type. + + * 14 `c` bits represent parent context address. + + * 14 `a` bits represent argument name list address. + + * 14 `b` bits represent body definition address. + + * `x` bits are not used. + + +Syntax +------ + +Syntaxes are defined by parent context, list of argument names and syntax +body. Type identifier together with 14bit addressees of associated values are +encoded within 4 words: + + +---------+---------------------------------+ + | address | data | + +=========+=================================+ + | n | m 1 1 1 1 0 x x x x x x x x x x | + +---------+---------------------------------+ + | n + 1 | m x c c c c c c c c c c c c c c | + +---------+---------------------------------+ + | n + 2 | m x a a a a a a a a a a a a a a | + +---------+---------------------------------+ + | n + 3 | m x b b b b b b b b b b b b b b | + +---------+---------------------------------+ + +where: + + * Bit 15 of each word (`m`) is reserved for memory management. + + * Bits 14, 13, 12, 11 and 10 of first word (``11110``) identify syntax + type. + + * 14 `c` bits represent parent context address. + + * 14 `a` bits represent argument name list address. + + * 14 `b` bits represent body definition address. + + * `x` bits are not used. + + +Source code +----------- + +cell.h +'''''' + +.. literalinclude:: ../src_c/cell.h + :language: c + + +cell.c +'''''' + +.. literalinclude:: ../src_c/cell.c + :language: c diff --git a/docs/conf.py b/docs/conf.py new file mode 100644 index 0000000..d9d0dc9 --- /dev/null +++ b/docs/conf.py @@ -0,0 +1,23 @@ +from pathlib import Path + +root_path = Path(__file__).parent.parent.resolve() + +extensions = ['sphinx.ext.todo'] + +version = (root_path / 'VERSION').read_text(encoding='utf-8').strip() +project = 'lisp16' +copyright = '2022, Bozo Kopic' +master_doc = 'index' + +html_theme = 'furo' +html_static_path = ['static'] +html_css_files = ['custom.css'] +html_use_index = False +html_show_sourcelink = False +html_show_sphinx = False +html_sidebars = {'**': ["sidebar/brand.html", + "sidebar/scroll-start.html", + "sidebar/navigation.html", + "sidebar/scroll-end.html"]} + +todo_include_todos = True diff --git a/docs/ctx.rst b/docs/ctx.rst new file mode 100644 index 0000000..7eaa6d2 --- /dev/null +++ b/docs/ctx.rst @@ -0,0 +1,53 @@ +Context +======= + +Context (also known as environment in some of Lisp implementations) +is association between symbols and their data values that apply to +specific evaluation scope. This implementation provides single +lexical context for evaluation of all names regarding of associated data +type (also known as Lisp-1 namespaces). + +Basic method of introducing new scope is function definition and application. +During function definition, new function is associated with scope in which +function itself is defined. Exact copy of current context as is in moment +of definition is used as functions parent scope. During function application +new context is created which inherit all associations that were available +in parent scope. + +Initial context, which is used as starting context, is initialized with +associations to builtin functions and syntaxes. + +Each context contains arbitrary number of mutable entries. Single entry +defines association where any kind of data is referenced by symbol. +Together with functions that add new entries (`lsp_ctx_add`) or obtain +data associated with provided symbol (`lsp_ctx_get`), context enables +modification of data referenced by provided symbol (`lsp_ctx_set`). +During entry modification, previous data instance itself is not changed, only +entry reference is modified to point to newly provided data instance. +If child context modifies existing entry in parent context, this modifications +will also be visible in parent context. + +Because of support for +`tail call optimization `_, +implementation of context hierarchy relies on `lsp_ctx_copy` operation instead +of referencing parent context from child context. This method induces +additional overhead in context operation. Never the less, additional memory +allocation overhead is mostly neutralized by usage of immutable linked list +as basis for entry storage. + + +Source code +----------- + +ctx.h +''''' + +.. literalinclude:: ../src_c/ctx.h + :language: c + + +ctx.c +''''' + +.. literalinclude:: ../src_c/ctx.c + :language: c diff --git a/docs/env.rst b/docs/env.rst new file mode 100644 index 0000000..5a4a243 --- /dev/null +++ b/docs/env.rst @@ -0,0 +1,34 @@ +Environment +=========== + +Environment (not to be mixed with context) represent current state of +interpreter instance. It contains reference to memory, input stream and +output stream. To enable +`tail call optimization `_, +environment is also used as storage for next expression evaluation. + +Main method responsible for environment evaluation is `lsp_env_resolve`. +This function implements evaluation loop (also known as trampoline), which +iteratively evaluates sequence of expressions. Evaluation of single expression +can result in direct data value (which is registered with +`lsp_env_set_result_value` function) or can be delegated to execution +of another expression (which is registered with `lsp_env_set_result_eval` +function). Evaluation loop (trampoline) repeats expression evaluation +until resulting data value is fully resolved. + + +Source code +----------- + +env.h +''''' + +.. literalinclude:: ../src_c/env.h + :language: c + + +env.c +''''' + +.. literalinclude:: ../src_c/env.c + :language: c diff --git a/docs/eval.rst b/docs/eval.rst new file mode 100644 index 0000000..1e66698 --- /dev/null +++ b/docs/eval.rst @@ -0,0 +1,48 @@ +Evaluation +========== + +Because Lisp code is represented with data structures, each data structure +can be used as interpreter instruction. Evaluation of data, when used as +interpreter expression, is defined according to data type: + + * number, string, builtin function, builtin syntax, function, syntax + + Data of this type evaluate to itself (expression consisting of + single instance of data evaluates to provided data instance). + + * symbol + + Symbols evaluate to data associated to provided symbol in + current evaluation context. If context entry associated with the symbol + is not available, evaluation error is signaled. + + * pair/list + + Lists evaluate to function/syntax application. As first step, + first element of list is evaluated. If first element evaluates + to function or builtin function, all remaining elements are also + evaluated and used as provided arguments. If first element evaluates + to syntax or builtin syntax, remaining list elements are not evaluated + and are used as provided arguments. After evaluation of first element + and possible argument evaluation (in case of functions), evaluation + is delegated to function/syntax application (described in following + chapters). If first list element doesn't evaluate to function/syntax, + evaluation error is signaled. Exception to this rule is empty + list which evaluates to itself. + + +Source code +----------- + +eval.h +'''''' + +.. literalinclude:: ../src_c/eval.h + :language: c + + +eval.c +'''''' + +.. literalinclude:: ../src_c/eval.c + :language: c diff --git a/docs/examples.rst b/docs/examples.rst new file mode 100644 index 0000000..7905ef4 --- /dev/null +++ b/docs/examples.rst @@ -0,0 +1,20 @@ +Examples +======== + +Examples are run using interpreter build for POSIX platform. Before +executing examples, interpreter is bootstrapped with `base-large.lsp` +extensions. + + +Factorial +--------- + +Simple implementation of factorials calculation. + +.. literalinclude:: ../examples/factorial.lsp + :language: scheme + +Example execution:: + + $ cat src_lsp/base-large.lsp examples/factorial.lsp | build/posix/lisp16 + 3628800 diff --git a/docs/extensions.rst b/docs/extensions.rst new file mode 100644 index 0000000..97eab1e --- /dev/null +++ b/docs/extensions.rst @@ -0,0 +1,28 @@ +Extensions +========== + +Base Lisp language, provided by interpreter with builtin functions/syntaxes, +is deliberately designed with limited set of functionality. + +To provide more usable development environment, additional extensions +written in Lisp are available. These extensions are implemented as set of +instructions that should be evaluated by interpreter immediately after +startup. After this bootstrapping evaluation finished, additional +functionalities are available as part of active context. + + +Source code +----------- + +base-small.lsp +'''''''''''''' + +.. literalinclude:: ../src_lsp/base-small.lsp + :language: scheme + + +base-large.lsp +'''''''''''''' + +.. literalinclude:: ../src_lsp/base-large.lsp + :language: scheme diff --git a/docs/function.rst b/docs/function.rst new file mode 100644 index 0000000..4f895a5 --- /dev/null +++ b/docs/function.rst @@ -0,0 +1,83 @@ +Builtin functions +================= + +In following examples, lines starting with ``>`` represent characters +provided to input stream. Lines without starting ``>`` character represent +evaluation results written to output stream. + +Available builtin functions: + + * `eval` + + * `apply` + + * `error` + + * `cons` + + * `set-car!` + + * `set-cdr!` + + * `number?` + + * `pair?` + + * `string?` + + * `symbol?` + + * `function?` + + * `syntax?` + + * `eq?` + + * `equal?` + + * `>` + + * `<` + + * `+` + + * `-` + + * `*` + + * `/` + + * `read` + + * `read-u8` + + * `peek-u8` + + * `write` + + * `write-u8` + + * `make-string` + + * `string-length` + + * `string-ref` + + * `string-set!` + + +Source code +----------- + +function.h +'''''''''' + +.. literalinclude:: ../src_c/function.h + :language: c + + +function.c +'''''''''' + +.. literalinclude:: ../src_c/function.c + :language: c diff --git a/docs/index.rst b/docs/index.rst new file mode 100644 index 0000000..11ca931 --- /dev/null +++ b/docs/index.rst @@ -0,0 +1,29 @@ +.. include:: ../README.rst + + +Content +------- + +.. toctree:: + :maxdepth: 1 + + introduction + arch + status + cell + mem + stream + buff + read + write + builtin + ctx + env + eval + apply + syntax + function + repl + main + extensions + examples diff --git a/docs/introduction.rst b/docs/introduction.rst new file mode 100644 index 0000000..e4d57be --- /dev/null +++ b/docs/introduction.rst @@ -0,0 +1,69 @@ +Introduction +============ + +Modern `microcontrollers `_ +provide powerful computational platforms in form of +affordable and widely usable integrated circuit packages. Although, execution +performance is often more than enough for execution of complex algorithms, +some of constraints can represent challenge in implementing certain kind of +applications. In case of interpreters for high level languages, amount of +available RAM is significant factor which should be taken into account during +design of interpreter and interpreted applications. This project explores +these and similar impacts by implementing Lisp interpreter capable of +interactive execution on 8bit/32bit microcontrollers. + +`Lisp `_ is family +of high level programming languages/dialects characterized with +concise `homoiconic `_ syntax. +`Scheme `_, as one +of Lisp dialects, provides few powerful core functionalities which can be used +to extend language itself with wide variety of derived constructs applicable to +different programming paradigms and domains. Therefor, reduced core of Scheme, +with only few data types and builtin functions/syntaxes, can provide good base +system for execution of high level applications on embedded/constrain +platforms. + +One of characteristics of most Lisp implementations is support for interactive +programming based on +`REPL `_. +If we take into account that microcontrollers usually provide wide range of +different I/O peripherals, REPL executing instructions on microcontroller can +represent exploratory environment for testing interaction with low level +peripheries not usually available on more powerful general purpose computing +platforms. Although, this kind of environment can be provided by splitting +functionality between microcontroller and more powerful general purpose +computer which communicates with microcontroller, that approach relies on +availability of general purpose computer. By executing interpreter as a whole +on microcontroller, more self sufficient interactive platform is available in +form of a single microcontroller. Interaction with interpreter running on +microcontroller can be based on full duplex +`UART `_ +communication. In this way, wide range of simple terminals can be used (even +those utilizing other microcontrollers). + +Because implementation of Lisp interpreter in C programming language doesn't +require a lot of dependencies specific to single target platform, +implementation can be based on thin abstraction layer that will provide +necessary interaction with host platform. This enables running implementation +based on same source code on different platforms, including more powerful +general purpose computers. Implementation targeting POSIX systems enables +easier testing and provides more developer friendly environment for exploring +interpreter characteristics. Usability of single code base for different +execution environments impacts some of API design decisions. Most notably, +interpreter uses memory locations referenced by additional pointer indirection +instead of statically allocated locations. This approach induces some penalties +in form of execution speed but at the same time provides API more suitable +for interaction with +`foreign function interface `_ +and usage of multiple independent interpreter instances as part of single POSIX +process. + +Existence of this project is mostly motivated with educational and research +reasons. Therefore, significant emphasis is given on this documentation as +integrated part of project. Rest of documentation tries to explain and document +implementation in gradual bottom-up approach. It is advised to read this +documentation sequential in order because each chapter depends on explanations +available in previous chapters. Source code is organized to closely follow +documentation structure and its listing is available as part of each associated +chapter. Reader's knowledge of C programming language and related +platform/memory model is assumed. diff --git a/docs/main.rst b/docs/main.rst new file mode 100644 index 0000000..075cc1c --- /dev/null +++ b/docs/main.rst @@ -0,0 +1,15 @@ +Main +==== + +Implementation of `main` function is responsible for initialization of +required structures and calling REPL function. + + +Source code +----------- + +main.c +'''''' + +.. literalinclude:: ../src_c/main.c + :language: c diff --git a/docs/mem.rst b/docs/mem.rst new file mode 100644 index 0000000..20353db --- /dev/null +++ b/docs/mem.rst @@ -0,0 +1,126 @@ +Memory management +================= + +Functions declared in `mem.h` provide interface for usage of dynamically +allocated data. All other modules interact with dynamic data only through +these functions. + + +Memory layout +------------- + +State of memory management is represented with structure `lsp_mem_t`. It +contains continuous memory block of platform specific word count. +Addresses used for referencing each data correspond to this memory block +starting word index. Because addresses are represented with 14bit unsigned +integer values, usable memory for allocation of dynamic data is limited to +16384 words (32768 bytes). In case of memory constrained systems, this +size is even smaller. + + +Data allocation +--------------- + +Each word (event those which are not single data starting words) have +single bit dedicated for memory management usage. This most significant bit +represent words current usage state where ``0`` represents used word (word is +used for representing data content) and ``1`` represent unused word (word is +available for allocation of new data instance). + +During allocation of new data, all available words are searched for +continuous block of unused words which could be used to represent newly +allocated data content. If such word block could not be found, garbage +collection procedure is triggered, after which search for free block is +repeated. If search is not successful for the second time, allocation +of new data fails. If search is successful, address of newly allocated data +is remembered and used as starting address for future allocation searches. + +All `lsp_mem_create_*` functions, used for data allocation, add allocated +data to list of accessible root data. Management of accessible data is +possible with `lsp_mem_inc_ref` and `lsp_mem_dec_ref` which increase and +decrease references to root data. Once data is not part of root list or is not +referenced by other data which are part of root list, it is considered +not accessible and can be reclaimed by garbage collector. + +During memory initialization, often used data instances are preallocated and +available as part of `lsp_mem_t` structure (`nil`, `zero`, `one`, `quote`, +`quasiquote`, `unquote` and `unquote_splicing`). + + +Garbage collector +----------------- + +Garbage collector is based on variant of simple mark and sweep design. +Initially, all memory words are marked as free. Once words are used for data +representation, used words are immediately marked as non free. After +repeated allocation of new data instances, pool of available free words +is depleted and garbage procedure is started. + +As first step of garbage collection, all memory words are marked as free. +Then, words used for representation of root list are marked as non free by +usage of recursive function which, together with immediately used words, marks +all other referenced data words. Data types which reference other data are +`pair`, `function` and `syntax`. After all data directly and indirectly +referenced by root list is marked as used, garbage collection finishes +(all other words remain marked as unused). + + +Data usage +---------- + +Together with data allocation function, memory management functions include +interface for manipulation and usage of allocated data. These include +`lsp_mem_is_*` function for data type assertion and `lsp_mem_get_*` functions +for data content retrieval. This functions provide thin wrapper for `cell.h` +functions with mapping of data addresses to data words. + +All data types, except for `string` and `pair` are considered immutable - +data content is initialized during allocation and is not mutated afterwards. +In case of `string` and `pair` data, `lsp_mem_set_*` functions enable +in place modification of data instance content. In case of `string` data, +only content of preallocated size can be modified (size of string can not +change after allocation). + +All other parts of interpreter use only `lsp_mem_*` wrappers instead of +direct `lsp_cell_*` function usage. + + +Symbol reusability +------------------ + +Because symbols are immutable, memory usage optimization can be done during +symbol allocation. Each time new symbol should be allocated, content of +all memory words is searched for already allocated symbol with same content. +If such instance is found, reference to already existing symbol is returned +together with incrementing this reference in root list. This optimization +comes with cost of additional memory search during each symbol allocation. + + +Ownership conventions +--------------------- + +Since data availability is controlled with usage create function and +reference incrementation/decrementation, convention for memory ownership is +required. In case of this project, if not explicitly specified otherwise, +caller of function is owner of all input arguments and remains their owner +after function execution finished (function temporary borrows ownership +of input arguments). If function returns data, ownership of returned data +is passed to function caller. It is responsibility of function caller to +release data returned as result of function execution. + + +Source code +----------- + +mem.h +''''' + +.. literalinclude:: ../src_c/mem.h + :language: c + + +mem.c +''''' + +.. literalinclude:: ../src_c/mem.c + :language: c diff --git a/docs/read.rst b/docs/read.rst new file mode 100644 index 0000000..9d973f4 --- /dev/null +++ b/docs/read.rst @@ -0,0 +1,160 @@ +Data reader +=========== + +Data reader is responsible for reading from input stream and creating +data instances represented with input character sequences. In case of +Lisp, interpreter instructions are represented with data structures. Therefor, +parser for string representation of data is also parser form programming +language itself. Some of data types don't have string representation +(`builtin function`, `builtin syntax`, `function` and `syntax`) and +can not be produced by reader. + +Character ``;`` represents beginning of comment which spawns to the end of line +(until ``\n`` character is read). White space characters (`` ``, ``\n``, +``\r``, ``\t``) and comments are ignored by reader. Only significance of +white space characters is as a data delimiter. + + +Number +------ + +Numbers as encoded as sequence of decimal characters (``0`` to ``9``). +Start of number is detected by starting decimal character. + +Example of valid number representations:: + + 0 + 1 + 42 + + +String +------ + +String is represented with arbitrary number of characters enclosed between +``"`` (maximum string length is 2047). ``\`` is used as escape character +in representation of: + + ``\n``, ``\r``, ``\t``, ``\\``, ``\"`` + +Start of string is detected by ``"`` character. + +Example of valid string representations:: + + "" + "abc" + "\"" + + +Symbol +------ + +Symbol is represented with arbitrary sequence of characters. Sequence can not +include white space characters, ``(`` or ``)``. Sequence can not start with +decimal number character, ``"``, ``'``, ````` or ``,``. + +Example of valid symbol representations:: + + abc + - + +=/@ + symbol-123 + + +Pair/list +--------- + +Pair is represented with form:: + + ( . ) + +where ```` and ```` are arbitrary data representations +(including other pars/lists). + +Nested pairs of form:: + + ( . ( . (.... . ))) + +where ````, ````, ..., ```` is arbitrary data, can be +written as:: + + ( ... . ) + +Special case of pair is empty list represented as:: + + () + +which can be recursively specified as:: + + (() . ()) + +Sequence of nested pairs with empty list as last element is called list:: + + ( . ( . (.... . ( . ())))) + +and can be written as:: + + ( ... ) + +Example of valid pairs/lists representations:: + + () + (1 . 2) + (a . (1 . "b")) + (1 2 3 4) + ("abc" abc . 123) + + +Reader macros +------------- + +To enable more concise representation of complex forms, reader recognize +few builtin reader macros. These do not introduce new data types and are used +only as more convenient representation of other standard data forms: + + * quote + + Recognized by starting character ``'``. Forms:: + + ' + + are equivalent to:: + + (quote ) + + * quasiquote + + Recognized by starting character `````. Forms:: + + ` + + are equivalent to:: + + (quasiquote ) + + * unquote + + Recognized by starting character ``,``. Forms:: + + , + + are equivalent to:: + + (unquote ) + + +Source code +----------- + +read.h +'''''' + +.. literalinclude:: ../src_c/read.h + :language: c + + +read.c +'''''' + +.. literalinclude:: ../src_c/read.c + :language: c diff --git a/docs/repl.rst b/docs/repl.rst new file mode 100644 index 0000000..9046781 --- /dev/null +++ b/docs/repl.rst @@ -0,0 +1,40 @@ +REPL +==== + +REPL, as it's name suggest, is function implementing endless loop with +following actions: + + * read + + First step is reading data from input stream. + + * evaluate + + Data that was read from input stream represent expression that + should be evaluated by interpreter. + + * print + + Once evaluation finishes, result of evaluation is written to + output stream. In case resulting data is ``()``, print step is + skipped. + +This loop is stopped only in case closing of input or output stream is +detected. + + +Source code +----------- + +repl.h +'''''' + +.. literalinclude:: ../src_c/repl.h + :language: c + + +repl.c +'''''' + +.. literalinclude:: ../src_c/repl.c + :language: c diff --git a/docs/static/custom.css b/docs/static/custom.css new file mode 100644 index 0000000..dc6a004 --- /dev/null +++ b/docs/static/custom.css @@ -0,0 +1,22 @@ + +body > div > nav > div > div.wy-side-nav-search > div.version { + margin-bottom: 0px; +} + +dl { + padding: 5px; +} + +blockquote { + border-left: none; + font-style: normal; + padding: 0 0 0.5rem 0; + margin-block-start: 0.5em; + margin-block-end: 0.5em; + background: transparent; +} + +ol, ul { + margin-top: 0.5rem; + margin-bottom: 0.5rem; +} diff --git a/docs/status.rst b/docs/status.rst new file mode 100644 index 0000000..8ac3628 --- /dev/null +++ b/docs/status.rst @@ -0,0 +1,76 @@ +Result status codes +=================== + +This project adopts widely used convention of returning integer encoded status +codes as function results. + +Each function, that needs to notify it's execution status (success or error), +returns ``lsp_status_t`` (alias for ``lsp_int8_t``). + +Available status codes are: + + * ``LSP_SUCCESS`` + + Execution successful. + + * ``LSP_EOF`` + + End of file encountered during reading/writing. + + * ``LSP_ERR`` + + Generic error (unknown error). + + * ``LSP_ERR_MEM`` + + Memory error (usually out of memory). + + * ``LSP_ERR_CTX`` + + Context error (usually symbol resolution error). + + * ``LSP_ERR_READ`` + + Reader error. + + * ``LSP_ERR_WRITE`` + + Writer error. + + * ``LSP_ERR_EVAL`` + + Evaluation error. + + * ``LSP_ERR_APPLY`` + + Application error. + + * ``LSP_ERR_ARG_COUNT`` + + Invalid argument count. + + * ``LSP_ERR_ARG_TYPE`` + + Invalid argument type. + + * ``LSP_ERR_ARG_VALUE`` + + Invalid argument value. + + * ``LSP_ERR_USER`` + + Special status value representing beginning of user status codes. + +User has ability to raise user error with ``error`` builtin function. This +error is additionally described with integer value in range [0, 126] and +encoded as status code. + + +Source code +----------- + +status.h +'''''''' + +.. literalinclude:: ../src_c/status.h + :language: c diff --git a/docs/stream.rst b/docs/stream.rst new file mode 100644 index 0000000..7b90cec --- /dev/null +++ b/docs/stream.rst @@ -0,0 +1,54 @@ +Input/output stream +=================== + +To enable interaction with interpreter, basic input/output stream abstraction +is needed. This implementation uses platform specific functions defined +by architecture abstraction layer. + + +Input stream +------------ + +Input stream provides functionality of reading unsigned 8bit integers +representing input characters. Implementation utilized `lsp_stream_getchar_t` +function pointer provided during input stream initialization. Together with +`lsp_in_stream_read`, used for reading next available input character, input +stream contains single character buffer used for implementation of +`lsp_in_stream_peek` functionality. + + +Output stream +------------- + +Output stream provides functionality regarding writing character data. +It uses `lsp_stream_putchar_t` function pointer provided during output stream +initialization. Available functions include: + + * `lsp_out_stream_write` + + Write single character to output stream. + + * `lsp_out_stream_write_str` + + Write null terminated character sequence. + + * `lsp_out_stream_write_int` + + Write string representation of signed integer. + + +Source code +----------- + +stream.h +'''''''' + +.. literalinclude:: ../src_c/stream.h + :language: c + + +stream.c +'''''''' + +.. literalinclude:: ../src_c/stream.c + :language: c diff --git a/docs/syntax.rst b/docs/syntax.rst new file mode 100644 index 0000000..71bd35b --- /dev/null +++ b/docs/syntax.rst @@ -0,0 +1,116 @@ +Builtin syntaxes +================ + +This implementation includes minimal number of builtin syntaxes. All other +constructs should be defined as user defined syntaxes in Lisp itself. + +In following examples, lines starting with ``>`` represent characters +provided to input stream. Lines without starting ``>`` character represent +evaluation results written to output stream. + +Available builtin syntaxes are: + + * `lambda` + + Definition of new user defined function. + + Examples:: + + > ((lambda x x) 1 2 3) + (1 2 3) + + > ((lambda (x) x) 1) + 1 + + > ((lambda (x . y) y) 1 2 3) + (2 3) + + * `syntax` + + Definition of new user defined syntax. + + Examples:: + + > ((syntax x x) (lambda x x) 1 2 3) + (1 2 3) + + * `define` + + Add new symbol binding to current context. + + Examples:: + + > (define xyz 42) + > xyz + 42 + + * `set!` + + Change previously defined context entry. + + Examples:: + + > (define xyz 42) + > xyz + 42 + > (set! xyz 24) + > xyz + 24 + + * `begin` + + Evaluate multiple expressions and return result of last expression + evaluation. + + Examples:: + + > (begin 1 2 3) + 3 + + * `quote` + + Evaluates to provided argument. + + Examples:: + + > (quote (1 2 3)) + (1 2 3) + + > '(3 2 1) + (3 2 1) + + * `if` + + If first argument evaluates to `thruthy` value, `if` syntax returns + result of second argument evaluation. If first argument evaluates to + `falsy` value, result of third argument evaluation is returned or + ``()`` if third argument is not available. + + `Falsy` values are ``0``, ``()``, ``""`` and empty symbol. + + `Thruthy` values are all that are not `falsy`. + + Examples:: + + > (if 0 1 2) + 2 + + > (if "0" 1 2) + 1 + + +Source code +----------- + +syntax.h +'''''''' + +.. literalinclude:: ../src_c/syntax.h + :language: c + + +syntax.c +'''''''' + +.. literalinclude:: ../src_c/syntax.c + :language: c diff --git a/docs/write.rst b/docs/write.rst new file mode 100644 index 0000000..b90e2e1 --- /dev/null +++ b/docs/write.rst @@ -0,0 +1,27 @@ +Data writer +=========== + +Data writer enables serialization of data as string characters written +to output stream. Same encoding rules that apply to data reader also apply to +data writer. + +Additionally, data writer will provide informative representation of data +types which can not be parsed by data reader (`builtin function`, +`builtin syntax`, `function` and `syntax`). + + +Source code +----------- + +write.h +''''''' + +.. literalinclude:: ../src_c/write.h + :language: c + + +write.c +''''''' + +.. literalinclude:: ../src_c/write.c + :language: c -- cgit v1.2.3-70-g09d2