initHEAD master

author: bozo.kopic <bozo@kopic.xyz> 2022-08-02 01:20:12 +0200
committer: bozo.kopic <bozo@kopic.xyz> 2022-09-25 02:40:23 +0200
commit: 288727f09a1b3458c268497d111349e608c3f9fa (patch)
tree: d62565249fa3c7127856c65405752572fc41aca9 /docs
23 files changed, 1570 insertions, 0 deletions
diff --git a/docs/apply.rst b/docs/apply.rst
new file mode 100644
index 0000000..e45c31b
--- /dev/null
+++ b/docs/apply.rst
@@ -0,0 +1,63 @@
+Function/syntax application
+===========================
+
+Builtin function/syntax application
+-----------------------------------
+
+Data representing builtin function/syntax, together with type identification,
+contains number representing index of builtin entry. As described in previous
+chapters, each builtin entry contains function pointer:
+
+.. code-block:: c
+
+    typedef lsp_status_t (*lsp_builtin_cb_t)(lsp_env_t *e, lsp_addr_t ctx, lsp_addr_t args);
+
+Application of builtin function/syntax calls provided native function
+with arguments provided to function/syntax application.
+
+Only difference between builtin function and syntax application is in arguments
+provided to application itself. In case of function application, these
+arguments are previously evaluated and in case of syntax, they are provided
+without previous evaluation.
+
+
+User defined function/syntax application
+----------------------------------------
+
+Data representation of function/syntax references:
+
+    * parent context
+    * argument name list
+    * function/syntax body
+
+During function/syntax application, new context is created as copy of
+associated parent context. That context is used for evaluation of
+function/syntax content. Prior to content evaluation, provided arguments
+are added as new entries in context and are associated with symbols
+from argument name list based on their position in list. Once context
+is created and argument values are associated, function/syntax body is
+evaluated. Evaluation of function/syntax body is sequential evaluation
+of all available expression (from first to last) where value of last expression
+is used as resulting value.
+
+In case of function application, result of last expression evaluation is
+also value of function application itself. In case of syntax evaluation,
+result of last expression is data that is additionally evaluated in context
+from which syntax application was called.
+
+
+Source code
+-----------
+
+apply.h
+'''''''
+
+.. literalinclude:: ../src_c/apply.h
+    :language: c
+
+
+apply.c
+'''''''
+
+.. literalinclude:: ../src_c/apply.c
+    :language: c
diff --git a/docs/arch.rst b/docs/arch.rst
new file mode 100644
index 0000000..9eededf
--- /dev/null
+++ b/docs/arch.rst
@@ -0,0 +1,155 @@
+Architecture abstraction layer
+==============================
+
+To enable easier targeting of different execution platforms and provide
+decoupling from standard C library (or other dependencies), thin abstraction
+layer is required.
+
+
+Supported target platforms
+--------------------------
+
+Constant definition ``LSP_ARCH`` is used as identifier of desired target
+platform. This definition should be set during C source preprocessing to
+one of supported values:
+
+    * ``LSP_ARCH_POSIX`` (POSIX target)
+    * ``LSP_ARCH_AVR8`` (8bit AVR target)
+
+Depending on value of this constant, different platform specific
+implementations will be included.
+
+
+C data types
+------------
+
+`arch.h` defines fixed length integers (``lsp_int8_t``, ``lsp_int16_t``,
+``lsp_int32_t``, ``lsp_uint8_t``, ``lsp_uint16_t``, ``lsp_uint32_t``)
+independent of target platform. If target has available standard C library,
+this types are aliases to types defined by `stdint.h`. In case target doesn't
+have standard C library available, these types should be defined by
+appropriate C integer types (``char``, ``short``, ``int``, ``long``, ...).
+
+Additionally, ``lsp_bool_t`` is defined as alias to ``_Bool``. If ``_Bool``
+type is not available, ``lsp_uint8_t`` can be used. Care is taken not to depend
+on specifics of integer to ``_Bool`` conversions so that other integer types
+could be used as ``lsp_bool_t``.
+
+
+Platform specific functions
+---------------------------
+
+Each platform abstraction layer implementation (`arch/*.c`) is responsible
+for functions aliased with following names:
+
+    * ``LSP_ARCH_INIT``
+
+        Function, called before any other function, responsible for
+        initialization of platform specific state.
+
+    * ``LSP_ARCH_CREATE_MEM``
+
+        Function responsible for allocating memory that will represent
+        ``lsp_mem_t`` structure (described in following chapters) and it's
+        initialization.
+
+    * ``LSP_ARCH_FREE_MEM``
+
+        Function responsible for freeing previously allocated memory.
+
+    * ``LSP_ARCH_CREATE_IN_STREAM``
+
+        Function responsible for allocating memory that will represent
+        ``lsp_in_stream_t`` structure (described in following chapters) and
+        it's initialization.
+
+    * ``LSP_ARCH_FREE_IN_STREAM``
+
+        Function responsible for freeing previously allocated memory.
+
+    * ``LSP_ARCH_CREATE_OUT_STREAM``
+
+        Function responsible for allocating memory that will represent
+        ``lsp_out_stream_t`` structure (described in following chapters) and
+        it's initialization.
+
+    * ``LSP_ARCH_FREE_OUT_STREAM``
+
+        Function responsible for freeing previously allocated memory.
+
+In case of memory constrained targets (e.g. ``LSP_ARCH_AVR8``), functions 
+responsible for allocating memory can return pointers to statically
+preallocated memory instead of dynamically allocated. For this target
+platforms, associated freeing functions don't implement any functionality.
+
+
+Input stream
+------------
+
+During initialization of input stream, platform specific implementation
+is responsible for providing function pointer with signature:
+
+.. code-block:: c
+
+    typedef lsp_int16_t (*lsp_stream_getchar_t)(lsp_in_stream_t *s);
+
+Provided function is responsible for reading single character from input
+stream (used by REPL). In case of POSIX, it's behavior corresponds to
+``getchar`` provided by standard C library.
+
+Details of input stream implementation are available in following chapters.
+
+
+Output stream
+-------------
+
+During initialization of output stream, platform specific implementation
+is responsible for providing function pointer with signature:
+
+.. code-block:: c
+
+    typedef lsp_int16_t (*lsp_stream_putchar_t)(lsp_out_stream_t *s, lsp_int16_t v);
+
+Provided function is responsible for writing single character to output
+stream (used by REPL). In case of POSIX, it's behavior corresponds to
+``putchar`` provided by standard C library.
+
+Details of output stream implementation are available in following chapters.
+
+
+Source code
+-----------
+
+arch.h
+''''''
+
+.. literalinclude:: ../src_c/arch.h
+    :language: c
+
+
+arch/avr8.h
+'''''''''''
+
+.. literalinclude:: ../src_c/arch/avr8.h
+    :language: c
+
+
+arch/avr8.c
+'''''''''''
+
+.. literalinclude:: ../src_c/arch/avr8.c
+    :language: c
+
+
+arch/posix.h
+''''''''''''
+
+.. literalinclude:: ../src_c/arch/posix.h
+    :language: c
+
+
+arch/posix.c
+''''''''''''
+
+.. literalinclude:: ../src_c/arch/posix.c
+    :language: c
diff --git a/docs/buff.rst b/docs/buff.rst
new file mode 100644
index 0000000..b052483
--- /dev/null
+++ b/docs/buff.rst
@@ -0,0 +1,29 @@
+String buffer
+=============
+
+While reading character data from input stream, buffer used for storing
+string of arbitrary length is required. `lsp_buff_t` provides preallocated
+limited temporary storage which is used as intermediary buffer in construction
+of `string` data with arbitrary size.
+
+After initialization with `lsp_buff_init`, characters are added to buffer
+with `lsp_buff_push` function. Once all characters are appended, single
+`string` data containing all previously appended characters, can be obtained
+with `lsp_buff_pop` function.
+
+
+Source code
+-----------
+
+buff.h
+''''''
+
+.. literalinclude:: ../src_c/buff.h
+    :language: c
+
+
+buff.c
+''''''
+
+.. literalinclude:: ../src_c/buff.c
+    :language: c
diff --git a/docs/builtin.rst b/docs/builtin.rst
new file mode 100644
index 0000000..bb87f8c
--- /dev/null
+++ b/docs/builtin.rst
@@ -0,0 +1,30 @@
+Builtin
+=======
+
+Builtin functions and builtin syntaxes are represented with
+`lsp_builtin_entry_t` structures. Each entry is defined with unique
+name and function pointer:
+
+.. code-block:: c
+
+    typedef lsp_status_t (*lsp_builtin_cb_t)(lsp_env_t *e, lsp_addr_t ctx, lsp_addr_t args);
+
+Associated function pointer is called during function/syntax application
+(described in following chapters).
+
+
+Source code
+-----------
+
+builtin.h
+'''''''''
+
+.. literalinclude:: ../src_c/builtin.h
+    :language: c
+
+
+builtin.c
+'''''''''
+
+.. literalinclude:: ../src_c/builtin.c
+    :language: c
diff --git a/docs/cell.rst b/docs/cell.rst
new file mode 100644
index 0000000..7e66897
--- /dev/null
+++ b/docs/cell.rst
@@ -0,0 +1,270 @@
+Data types
+==========
+
+All data types are encoded as one or more consecutive 16bit words (cells).
+Most significant bit of each 16bit word is reserved for memory manager usage
+and remaining 15bits are used for identifying data types and encoding
+data values. Most significant data bits of first word identifies
+data type. Most significant bit is referenced as bit 15 and least significant
+bit as bit 0.
+
+Implementation mostly uses static inline functions defined in header file,
+instead of preprocessor definitions, to provide API more suitable for
+foreign function interface.
+
+
+Number
+------
+
+Data type representing signed integer values of arbitrary length. Although
+encoding itself doesn't limit value size, to provide easier interface for
+data manipulation, values are limited to signed integers represented with
+32bit dual complement encoding. Single number is encoded with one or more
+words:
+
+    +---------+---------------------------------+
+    | address | data                            |
+    +=========+=================================+
+    | n       | m 0 1 s v v v v v v v v v v v v |
+    +---------+---------------------------------+
+    | n + 1   | m 1 v v v v v v v v v v v v v v |
+    +---------+---------------------------------+
+    | ...     | ...                             |
+    +---------+---------------------------------+
+    | n + i   | m 1 v v v v v v v v v v v v v v |
+    +---------+---------------------------------+
+    | ...     | ...                             |
+    +---------+---------------------------------+
+    | n + m   | m 0 v v v v v v v v v v v v v v |
+    +---------+---------------------------------+
+
+where:
+
+    * Bit 15 of each word (`m`) is reserved for memory management.
+
+    * Bit 14 of first word (``0``) identifies number type.
+
+    * Bit 13 of first word and bit 14 of other words represents "more follows".
+      Only in last word (`n + m`) is this bit set to ``0``.
+
+    * Bit 12 of first word (`s`) identifies sign.
+
+    * Rest of bits are used as dual complement encoded integer value where
+      word `n` contains most significant bits and word `n + m` contains
+      least significant bits.
+
+
+Pair
+----
+
+Data type representing two addresses referencing word locations (usually
+known as cons cell). Address values are limited to 14bit unsigned integers
+which enables encoding of this type with two words:
+
+    +---------+---------------------------------+
+    | address | data                            |
+    +=========+=================================+
+    | n       | m 1 0 a a a a a a a a a a a a a |
+    +---------+---------------------------------+
+    | n + 1   | m a b b b b b b b b b b b b b b |
+    +---------+---------------------------------+
+
+where:
+
+    * Bit 15 of each word (`m`) is reserved for memory management.
+
+    * Bits 14 and 13 of first word (``10``) identify pair type.
+
+    * 14 `a` bits encode first address value.
+
+    * 14 `b` bits encode second address value.
+
+
+String
+------
+
+Data type representing zero of more 8bit values. Single string is represented
+with one or more words:
+
+    +---------+---------------------------------+
+    | address | data                            |
+    +=========+=================================+
+    | n       | m 1 1 0 0 s s s s s s s s s s s |
+    +---------+---------------------------------+
+    | n + 1   | m a a a a a a a a b b b b b b b |
+    +---------+---------------------------------+
+    | n + 2   | m b c c c c c c c c d d d d d d |
+    +---------+---------------------------------+
+    | ...     | ...                             |
+    +---------+---------------------------------+
+
+where:
+
+    * Bit 15 of each word (`m`) is reserved for memory management.
+
+    * Bits 14, 13, 12 and 11 of first word (``1100``) identify string type.
+
+    * 11 `s` bits represent string length (maximum string length is 2047).
+
+    * Bits `a`, `b`, `c`, ... represent 8bit values.
+
+This encoding schema tries to optimize memory usage but at the same time
+introduces significant overhead in manipulating string data.
+
+
+Symbol
+------
+
+Symbols are used as human readable labels associated with data values. They
+are encoded as 8bit characters similarly as string data:
+
+    +---------+---------------------------------+
+    | address | data                            |
+    +=========+=================================+
+    | n       | m 1 1 0 1 s s s s s s s s s s s |
+    +---------+---------------------------------+
+    | n + 1   | m a a a a a a a a b b b b b b b |
+    +---------+---------------------------------+
+    | n + 2   | m b c c c c c c c c d d d d d d |
+    +---------+---------------------------------+
+    | ...     | ...                             |
+    +---------+---------------------------------+
+
+where:
+
+    * Bit 15 of each word (`m`) is reserved for memory management.
+
+    * Bits 14, 13, 12 and 11 of first word (``1101``) identify symbol type.
+
+    * 11 `s` bits represent symbol length (maximum symbol length is 2047).
+
+    * Bits `a`, `b`, `c`, ... represent 8bit character values.
+
+
+Builtin function
+----------------
+
+Builtin functions are referenced by function's index and encoded with
+single word:
+
+    +---------+---------------------------------+
+    | address | data                            |
+    +=========+=================================+
+    | n       | m 1 1 1 0 0 i i i i i i i i i i |
+    +---------+---------------------------------+
+
+where:
+
+    * Bit 15 (`m`) is reserved for memory management.
+
+    * Bits 14, 13, 12, 11 and 10 (``11100``) identify builtin function type.
+
+    * 10 `i` bits represent builtin function index.
+
+
+Builtin syntax
+--------------
+
+Builtin syntaxes are referenced by syntax's index and encoded with
+single word:
+
+    +---------+---------------------------------+
+    | address | data                            |
+    +=========+=================================+
+    | n       | m 1 1 1 0 1 i i i i i i i i i i |
+    +---------+---------------------------------+
+
+where:
+
+    * Bit 15 (`m`) is reserved for memory management.
+
+    * Bits 14, 13, 12, 11 and 10 (``11101``) identify builtin syntax type.
+
+    * 10 `i` bits represent builtin syntax index.
+
+
+Function
+--------
+
+Functions are defined by parent context, list of argument names and function
+body. Type identifier together with 14bit addressees of associated values are
+encoded within 4 words:
+
+    +---------+---------------------------------+
+    | address | data                            |
+    +=========+=================================+
+    | n       | m 1 1 1 1 0 x x x x x x x x x x |
+    +---------+---------------------------------+
+    | n + 1   | m x c c c c c c c c c c c c c c |
+    +---------+---------------------------------+
+    | n + 2   | m x a a a a a a a a a a a a a a |
+    +---------+---------------------------------+
+    | n + 3   | m x b b b b b b b b b b b b b b |
+    +---------+---------------------------------+
+
+where:
+
+    * Bit 15 of each word (`m`) is reserved for memory management.
+
+    * Bits 14, 13, 12, 11 and 10 of first word (``11110``) identify function
+      type.
+
+    * 14 `c` bits represent parent context address.
+
+    * 14 `a` bits represent argument name list address.
+
+    * 14 `b` bits represent body definition address.
+
+    * `x` bits are not used.
+
+
+Syntax
+------
+
+Syntaxes are defined by parent context, list of argument names and syntax
+body. Type identifier together with 14bit addressees of associated values are
+encoded within 4 words:
+
+    +---------+---------------------------------+
+    | address | data                            |
+    +=========+=================================+
+    | n       | m 1 1 1 1 0 x x x x x x x x x x |
+    +---------+---------------------------------+
+    | n + 1   | m x c c c c c c c c c c c c c c |
+    +---------+---------------------------------+
+    | n + 2   | m x a a a a a a a a a a a a a a |
+    +---------+---------------------------------+
+    | n + 3   | m x b b b b b b b b b b b b b b |
+    +---------+---------------------------------+
+
+where:
+
+    * Bit 15 of each word (`m`) is reserved for memory management.
+
+    * Bits 14, 13, 12, 11 and 10 of first word (``11110``) identify syntax
+      type.
+
+    * 14 `c` bits represent parent context address.
+
+    * 14 `a` bits represent argument name list address.
+
+    * 14 `b` bits represent body definition address.
+
+    * `x` bits are not used.
+
+
+Source code
+-----------
+
+cell.h
+''''''
+
+.. literalinclude:: ../src_c/cell.h
+    :language: c
+
+
+cell.c
+''''''
+
+.. literalinclude:: ../src_c/cell.c
+    :language: c
diff --git a/docs/conf.py b/docs/conf.py
new file mode 100644
index 0000000..d9d0dc9
--- /dev/null
+++ b/docs/conf.py
@@ -0,0 +1,23 @@
+from pathlib import Path
+
+root_path = Path(__file__).parent.parent.resolve()
+
+extensions = ['sphinx.ext.todo']
+
+version = (root_path / 'VERSION').read_text(encoding='utf-8').strip()
+project = 'lisp16'
+copyright = '2022, Bozo Kopic'
+master_doc = 'index'
+
+html_theme = 'furo'
+html_static_path = ['static']
+html_css_files = ['custom.css']
+html_use_index = False
+html_show_sourcelink = False
+html_show_sphinx = False
+html_sidebars = {'**': ["sidebar/brand.html",
+                        "sidebar/scroll-start.html",
+                        "sidebar/navigation.html",
+                        "sidebar/scroll-end.html"]}
+
+todo_include_todos = True
diff --git a/docs/ctx.rst b/docs/ctx.rst
new file mode 100644
index 0000000..7eaa6d2
--- /dev/null
+++ b/docs/ctx.rst
@@ -0,0 +1,53 @@
+Context
+=======
+
+Context (also known as environment in some of Lisp implementations)
+is association between symbols and their data values that apply to
+specific evaluation scope. This implementation provides single
+lexical context for evaluation of all names regarding of associated data
+type (also known as Lisp-1 namespaces).
+
+Basic method of introducing new scope is function definition and application.
+During function definition, new function is associated with scope in which
+function itself is defined. Exact copy of current context as is in moment
+of definition is used as functions parent scope. During function application
+new context is created which inherit all associations that were available
+in parent scope.
+
+Initial context, which is used as starting context, is initialized with
+associations to builtin functions and syntaxes.
+
+Each context contains arbitrary number of mutable entries. Single entry
+defines association where any kind of data is referenced by symbol.
+Together with functions that add new entries (`lsp_ctx_add`) or obtain
+data associated with provided symbol (`lsp_ctx_get`), context enables
+modification of data referenced by provided symbol (`lsp_ctx_set`).
+During entry modification, previous data instance itself is not changed, only
+entry reference is modified to point to newly provided data instance.
+If child context modifies existing entry in parent context, this modifications
+will also be visible in parent context.
+
+Because of support for
+`tail call optimization <https://en.wikipedia.org/wiki/Tail_call>`_,
+implementation of context hierarchy relies on `lsp_ctx_copy` operation instead
+of referencing parent context from child context. This method induces
+additional overhead in context operation. Never the less, additional memory
+allocation overhead is mostly neutralized by usage of immutable linked list
+as basis for entry storage.
+
+
+Source code
+-----------
+
+ctx.h
+'''''
+
+.. literalinclude:: ../src_c/ctx.h
+    :language: c
+
+
+ctx.c
+'''''
+
+.. literalinclude:: ../src_c/ctx.c
+    :language: c
diff --git a/docs/env.rst b/docs/env.rst
new file mode 100644
index 0000000..5a4a243
--- /dev/null
+++ b/docs/env.rst
@@ -0,0 +1,34 @@
+Environment
+===========
+
+Environment (not to be mixed with context) represent current state of
+interpreter instance. It contains reference to memory, input stream and
+output stream. To enable
+`tail call optimization <https://en.wikipedia.org/wiki/Tail_call>`_,
+environment is also used as storage for next expression evaluation.
+
+Main method responsible for environment evaluation is `lsp_env_resolve`.
+This function implements evaluation loop (also known as trampoline), which
+iteratively evaluates sequence of expressions. Evaluation of single expression
+can result in direct data value (which is registered with
+`lsp_env_set_result_value` function) or can be delegated to execution
+of another expression (which is registered with `lsp_env_set_result_eval`
+function). Evaluation loop (trampoline) repeats expression evaluation
+until resulting data value is fully resolved.
+
+
+Source code
+-----------
+
+env.h
+'''''
+
+.. literalinclude:: ../src_c/env.h
+    :language: c
+
+
+env.c
+'''''
+
+.. literalinclude:: ../src_c/env.c
+    :language: c
diff --git a/docs/eval.rst b/docs/eval.rst
new file mode 100644
index 0000000..1e66698
--- /dev/null
+++ b/docs/eval.rst
@@ -0,0 +1,48 @@
+Evaluation
+==========
+
+Because Lisp code is represented with data structures, each data structure
+can be used as interpreter instruction. Evaluation of data, when used as
+interpreter expression, is defined according to data type:
+
+    * number, string, builtin function, builtin syntax, function, syntax
+
+        Data of this type evaluate to itself (expression consisting of
+        single instance of data evaluates to provided data instance).
+
+    * symbol
+
+        Symbols evaluate to data associated to provided symbol in
+        current evaluation context. If context entry associated with the symbol
+        is not available, evaluation error is signaled.
+
+    * pair/list
+
+        Lists evaluate to function/syntax application. As first step,
+        first element of list is evaluated. If first element evaluates
+        to function or builtin function, all remaining elements are also
+        evaluated and used as provided arguments. If first element evaluates
+        to syntax or builtin syntax, remaining list elements are not evaluated
+        and are used as provided arguments. After evaluation of first element
+        and possible argument evaluation (in case of functions), evaluation
+        is delegated to function/syntax application (described in following
+        chapters). If first list element doesn't evaluate to function/syntax,
+        evaluation error is signaled. Exception to this rule is empty
+        list which evaluates to itself.
+
+
+Source code
+-----------
+
+eval.h
+''''''
+
+.. literalinclude:: ../src_c/eval.h
+    :language: c
+
+
+eval.c
+''''''
+
+.. literalinclude:: ../src_c/eval.c
+    :language: c
diff --git a/docs/examples.rst b/docs/examples.rst
new file mode 100644
index 0000000..7905ef4
--- /dev/null
+++ b/docs/examples.rst
@@ -0,0 +1,20 @@
+Examples
+========
+
+Examples are run using interpreter build for POSIX platform. Before
+executing examples, interpreter is bootstrapped with `base-large.lsp`
+extensions.
+
+
+Factorial
+---------
+
+Simple implementation of factorials calculation.
+
+.. literalinclude:: ../examples/factorial.lsp
+    :language: scheme
+
+Example execution::
+
+    $ cat src_lsp/base-large.lsp examples/factorial.lsp | build/posix/lisp16
+    3628800
diff --git a/docs/extensions.rst b/docs/extensions.rst
new file mode 100644
index 0000000..97eab1e
--- /dev/null
+++ b/docs/extensions.rst
@@ -0,0 +1,28 @@
+Extensions
+==========
+
+Base Lisp language, provided by interpreter with builtin functions/syntaxes,
+is deliberately designed with limited set of functionality.
+
+To provide more usable development environment, additional extensions
+written in Lisp are available. These extensions are implemented as set of
+instructions that should be evaluated by interpreter immediately after
+startup. After this bootstrapping evaluation finished, additional
+functionalities are available as part of active context.
+
+
+Source code
+-----------
+
+base-small.lsp
+''''''''''''''
+
+.. literalinclude:: ../src_lsp/base-small.lsp
+    :language: scheme
+
+
+base-large.lsp
+''''''''''''''
+
+.. literalinclude:: ../src_lsp/base-large.lsp
+    :language: scheme
diff --git a/docs/function.rst b/docs/function.rst
new file mode 100644
index 0000000..4f895a5
--- /dev/null
+++ b/docs/function.rst
@@ -0,0 +1,83 @@
+Builtin functions
+=================
+
+In following examples, lines starting with ``>`` represent characters
+provided to input stream. Lines without starting ``>`` character represent
+evaluation results written to output stream.
+
+Available builtin functions:
+
+    * `eval`
+
+    * `apply`
+
+    * `error`
+
+    * `cons`
+
+    * `set-car!`
+
+    * `set-cdr!`
+
+    * `number?`
+
+    * `pair?`
+
+    * `string?`
+
+    * `symbol?`
+
+    * `function?`
+
+    * `syntax?`
+
+    * `eq?`
+
+    * `equal?`
+
+    * `>`
+
+    * `<`
+
+    * `+`
+
+    * `-`
+
+    * `*`
+
+    * `/`
+
+    * `read`
+
+    * `read-u8`
+
+    * `peek-u8`
+
+    * `write`
+
+    * `write-u8`
+
+    * `make-string`
+
+    * `string-length`
+
+    * `string-ref`
+
+    * `string-set!`
+
+
+Source code
+-----------
+
+function.h
+''''''''''
+
+.. literalinclude:: ../src_c/function.h
+    :language: c
+
+
+function.c
+''''''''''
+
+.. literalinclude:: ../src_c/function.c
+    :language: c
diff --git a/docs/index.rst b/docs/index.rst
new file mode 100644
index 0000000..11ca931
--- /dev/null
+++ b/docs/index.rst
@@ -0,0 +1,29 @@
+.. include:: ../README.rst
+
+
+Content
+-------
+
+.. toctree::
+    :maxdepth: 1
+
+    introduction
+    arch
+    status
+    cell
+    mem
+    stream
+    buff
+    read
+    write
+    builtin
+    ctx
+    env
+    eval
+    apply
+    syntax
+    function
+    repl
+    main
+    extensions
+    examples
diff --git a/docs/introduction.rst b/docs/introduction.rst
new file mode 100644
index 0000000..e4d57be
--- /dev/null
+++ b/docs/introduction.rst
@@ -0,0 +1,69 @@
+Introduction
+============
+
+Modern `microcontrollers <https://en.wikipedia.org/wiki/Microcontroller>`_
+provide powerful computational platforms in form of
+affordable and widely usable integrated circuit packages. Although, execution
+performance is often more than enough for execution of complex algorithms,
+some of constraints can represent challenge in implementing certain kind of
+applications. In case of interpreters for high level languages, amount of
+available RAM is significant factor which should be taken into account during
+design of interpreter and interpreted applications. This project explores
+these and similar impacts by implementing Lisp interpreter capable of
+interactive execution on 8bit/32bit microcontrollers.
+
+`Lisp <https://en.wikipedia.org/wiki/Lisp_(programming_language)>`_ is family
+of high level programming languages/dialects characterized with
+concise `homoiconic <https://en.wikipedia.org/wiki/Homoiconicity>`_ syntax.
+`Scheme <https://en.wikipedia.org/wiki/Scheme_(programming_language)>`_, as one
+of Lisp dialects, provides few powerful core functionalities which can be used
+to extend language itself with wide variety of derived constructs applicable to
+different programming paradigms and domains. Therefor, reduced core of Scheme,
+with only few data types and builtin functions/syntaxes, can provide good base
+system for execution of high level applications on embedded/constrain
+platforms.
+
+One of characteristics of most Lisp implementations is support for interactive
+programming based on
+`REPL <https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop>`_.
+If we take into account that microcontrollers usually provide wide range of
+different I/O peripherals, REPL executing instructions on microcontroller can
+represent exploratory environment for testing interaction with low level
+peripheries not usually available on more powerful general purpose computing
+platforms. Although, this kind of environment can be provided by splitting
+functionality between microcontroller and more powerful general purpose
+computer which communicates with microcontroller, that approach relies on
+availability of general purpose computer. By executing interpreter as a whole
+on microcontroller, more self sufficient interactive platform is available in
+form of a single microcontroller. Interaction with interpreter running on
+microcontroller can be based on full duplex
+`UART <https://en.wikipedia.org/wiki/Universal_asynchronous_receiver-transmitter>`_
+communication. In this way, wide range of simple terminals can be used (even
+those utilizing other microcontrollers).
+
+Because implementation of Lisp interpreter in C programming language doesn't
+require a lot of dependencies specific to single target platform,
+implementation can be based on thin abstraction layer that will provide
+necessary interaction with host platform. This enables running implementation
+based on same source code on different platforms, including more powerful
+general purpose computers. Implementation targeting POSIX systems enables
+easier testing and provides more developer friendly environment for exploring
+interpreter characteristics. Usability of single code base for different
+execution environments impacts some of API design decisions. Most notably,
+interpreter uses memory locations referenced by additional pointer indirection
+instead of statically allocated locations. This approach induces some penalties
+in form of execution speed but at the same time provides API more suitable
+for interaction with
+`foreign function interface <https://en.wikipedia.org/wiki/Foreign_function_interface>`_
+and usage of multiple independent interpreter instances as part of single POSIX
+process.
+
+Existence of this project is mostly motivated with educational and research
+reasons. Therefore, significant emphasis is given on this documentation as
+integrated part of project. Rest of documentation tries to explain and document
+implementation in gradual bottom-up approach. It is advised to read this
+documentation sequential in order because each chapter depends on explanations
+available in previous chapters. Source code is organized to closely follow
+documentation structure and its listing is available as part of each associated
+chapter. Reader's knowledge of C programming language and related
+platform/memory model is assumed.
diff --git a/docs/main.rst b/docs/main.rst
new file mode 100644
index 0000000..075cc1c
--- /dev/null
+++ b/docs/main.rst
@@ -0,0 +1,15 @@
+Main
+====
+
+Implementation of `main` function is responsible for initialization of
+required structures and calling REPL function.
+
+
+Source code
+-----------
+
+main.c
+''''''
+
+.. literalinclude:: ../src_c/main.c
+    :language: c
diff --git a/docs/mem.rst b/docs/mem.rst
new file mode 100644
index 0000000..20353db
--- /dev/null
+++ b/docs/mem.rst
@@ -0,0 +1,126 @@
+Memory management
+=================
+
+Functions declared in `mem.h` provide interface for usage of dynamically
+allocated data. All other modules interact with dynamic data only through
+these functions.
+
+
+Memory layout
+-------------
+
+State of memory management is represented with structure `lsp_mem_t`. It
+contains continuous memory block of platform specific word count.
+Addresses used for referencing each data correspond to this memory block
+starting word index. Because addresses are represented with 14bit unsigned
+integer values, usable memory for allocation of dynamic data is limited to
+16384 words (32768 bytes). In case of memory constrained systems, this
+size is even smaller.
+
+
+Data allocation
+---------------
+
+Each word (event those which are not single data starting words) have
+single bit dedicated for memory management usage. This most significant bit
+represent words current usage state where ``0`` represents used word (word is
+used for representing data content) and ``1`` represent unused word (word is
+available for allocation of new data instance).
+
+During allocation of new data, all available words are searched for
+continuous block of unused words which could be used to represent newly
+allocated data content. If such word block could not be found, garbage
+collection procedure is triggered, after which search for free block is
+repeated. If search is not successful for the second time, allocation
+of new data fails. If search is successful, address of newly allocated data
+is remembered and used as starting address for future allocation searches.
+
+All `lsp_mem_create_*` functions, used for data allocation, add allocated
+data to list of accessible root data. Management of accessible data is
+possible with `lsp_mem_inc_ref` and `lsp_mem_dec_ref` which increase and
+decrease references to root data. Once data is not part of root list or is not
+referenced by other data which are part of root list, it is considered
+not accessible and can be reclaimed by garbage collector.
+
+During memory initialization, often used data instances are preallocated and
+available as part of `lsp_mem_t` structure (`nil`, `zero`, `one`, `quote`,
+`quasiquote`, `unquote` and `unquote_splicing`).
+
+
+Garbage collector
+-----------------
+
+Garbage collector is based on variant of simple mark and sweep design.
+Initially, all memory words are marked as free. Once words are used for data
+representation, used words are immediately marked as non free. After
+repeated allocation of new data instances, pool of available free words
+is depleted and garbage procedure is started.
+
+As first step of garbage collection, all memory words are marked as free.
+Then, words used for representation of root list are marked as non free by
+usage of recursive function which, together with immediately used words, marks
+all other referenced data words. Data types which reference other data are
+`pair`, `function` and `syntax`. After all data directly and indirectly
+referenced by root list is marked as used, garbage collection finishes
+(all other words remain marked as unused).
+
+
+Data usage
+----------
+
+Together with data allocation function, memory management functions include
+interface for manipulation and usage of allocated data. These include
+`lsp_mem_is_*` function for data type assertion and `lsp_mem_get_*` functions
+for data content retrieval. This functions provide thin wrapper for `cell.h`
+functions with mapping of data addresses to data words.
+
+All data types, except for `string` and `pair` are considered immutable -
+data content is initialized during allocation and is not mutated afterwards.
+In case of `string` and `pair` data, `lsp_mem_set_*` functions enable
+in place modification of data instance content. In case of `string` data,
+only content of preallocated size can be modified (size of string can not
+change after allocation).
+
+All other parts of interpreter use only `lsp_mem_*` wrappers instead of
+direct `lsp_cell_*` function usage.
+
+
+Symbol reusability
+------------------
+
+Because symbols are immutable, memory usage optimization can be done during
+symbol allocation. Each time new symbol should be allocated, content of
+all memory words is searched for already allocated symbol with same content.
+If such instance is found, reference to already existing symbol is returned
+together with incrementing this reference in root list. This optimization
+comes with cost of additional memory search during each symbol allocation.
+
+
+Ownership conventions
+---------------------
+
+Since data availability is controlled with usage create function and
+reference incrementation/decrementation, convention for memory ownership is
+required. In case of this project, if not explicitly specified otherwise,
+caller of function is owner of all input arguments and remains their owner
+after function execution finished (function temporary borrows ownership
+of input arguments). If function returns data, ownership of returned data
+is passed to function caller. It is responsibility of function caller to
+release data returned as result of function execution.
+
+
+Source code
+-----------
+
+mem.h
+'''''
+
+.. literalinclude:: ../src_c/mem.h
+    :language: c
+
+
+mem.c
+'''''
+
+.. literalinclude:: ../src_c/mem.c
+    :language: c
diff --git a/docs/read.rst b/docs/read.rst
new file mode 100644
index 0000000..9d973f4
--- /dev/null
+++ b/docs/read.rst
@@ -0,0 +1,160 @@
+Data reader
+===========
+
+Data reader is responsible for reading from input stream and creating
+data instances represented with input character sequences. In case of
+Lisp, interpreter instructions are represented with data structures. Therefor,
+parser for string representation of data is also parser form programming
+language itself. Some of data types don't have string representation
+(`builtin function`, `builtin syntax`, `function` and `syntax`) and
+can not be produced by reader.
+
+Character ``;`` represents beginning of comment which spawns to the end of line
+(until ``\n`` character is read). White space characters (`` ``, ``\n``,
+``\r``, ``\t``) and comments are ignored by reader. Only significance of
+white space characters is as a data delimiter.
+
+
+Number
+------
+
+Numbers as encoded as sequence of decimal characters (``0`` to ``9``).
+Start of number is detected by starting decimal character.
+
+Example of valid number representations::
+
+    0
+    1
+    42
+
+
+String
+------
+
+String is represented with arbitrary number of characters enclosed between
+``"`` (maximum string length is 2047). ``\`` is used as escape character
+in representation of:
+
+    ``\n``, ``\r``, ``\t``, ``\\``, ``\"``
+
+Start of string is detected by ``"`` character.
+
+Example of valid string representations::
+
+    ""
+    "abc"
+    "\""
+
+
+Symbol
+------
+
+Symbol is represented with arbitrary sequence of characters. Sequence can not
+include white space characters, ``(`` or ``)``. Sequence can not start with
+decimal number character, ``"``, ``'``, ````` or ``,``.
+
+Example of valid symbol representations::
+
+    abc
+    -
+    +=/@
+    symbol-123
+
+
+Pair/list
+---------
+
+Pair is represented with form::
+
+    (<first> . <second>)
+
+where ``<first>`` and ``<second>`` are arbitrary data representations
+(including other pars/lists).
+
+Nested pairs of form::
+
+    (<el_1> . (<el_2> . (.... . <el_n>)))
+
+where ``<el_1>``, ``<el_2>``, ..., ``<el_n>`` is arbitrary data, can be
+written as::
+
+    (<el_1> <el_2> ... . <el_n>)
+
+Special case of pair is empty list represented as::
+
+    ()
+
+which can be recursively specified as::
+
+    (() . ())
+
+Sequence of nested pairs with empty list as last element is called list::
+
+    (<el_1> . (<el_2> . (.... . (<el_n> . ()))))
+
+and can be written as::
+
+    (<el_1> <el_2> ... <el_n>)
+
+Example of valid pairs/lists representations::
+
+    ()
+    (1 . 2)
+    (a . (1 . "b"))
+    (1 2 3 4)
+    ("abc" abc . 123)
+
+
+Reader macros
+-------------
+
+To enable more concise representation of complex forms, reader recognize
+few builtin reader macros. These do not introduce new data types and are used
+only as more convenient representation of other standard data forms:
+
+    * quote
+
+        Recognized by starting character ``'``. Forms::
+
+            '<data>
+
+        are equivalent to::
+
+            (quote <data>)
+
+    * quasiquote
+
+        Recognized by starting character `````. Forms::
+
+            `<data>
+
+        are equivalent to::
+
+            (quasiquote <data>)
+
+    * unquote
+
+        Recognized by starting character ``,``. Forms::
+
+            ,<data>
+
+        are equivalent to::
+
+            (unquote <data>)
+
+
+Source code
+-----------
+
+read.h
+''''''
+
+.. literalinclude:: ../src_c/read.h
+    :language: c
+
+
+read.c
+''''''
+
+.. literalinclude:: ../src_c/read.c
+    :language: c
diff --git a/docs/repl.rst b/docs/repl.rst
new file mode 100644
index 0000000..9046781
--- /dev/null
+++ b/docs/repl.rst
@@ -0,0 +1,40 @@
+REPL
+====
+
+REPL, as it's name suggest, is function implementing endless loop with
+following actions:
+
+    * read
+
+        First step is reading data from input stream.
+
+    * evaluate
+
+        Data that was read from input stream represent expression that
+        should be evaluated by interpreter.
+
+    * print
+
+        Once evaluation finishes, result of evaluation is written to
+        output stream. In case resulting data is ``()``, print step is
+        skipped.
+
+This loop is stopped only in case closing of input or output stream is
+detected.
+
+
+Source code
+-----------
+
+repl.h
+''''''
+
+.. literalinclude:: ../src_c/repl.h
+    :language: c
+
+
+repl.c
+''''''
+
+.. literalinclude:: ../src_c/repl.c
+    :language: c
diff --git a/docs/static/custom.css b/docs/static/custom.css
new file mode 100644
index 0000000..dc6a004
--- /dev/null
+++ b/docs/static/custom.css
@@ -0,0 +1,22 @@
+
+body > div > nav > div > div.wy-side-nav-search > div.version {
+    margin-bottom: 0px;
+}
+
+dl {
+    padding: 5px;
+}
+
+blockquote {
+    border-left: none;
+    font-style: normal;
+    padding: 0 0 0.5rem 0;
+    margin-block-start: 0.5em;
+    margin-block-end: 0.5em;
+    background: transparent;
+}
+
+ol, ul {
+    margin-top: 0.5rem;
+    margin-bottom: 0.5rem;
+}
diff --git a/docs/status.rst b/docs/status.rst
new file mode 100644
index 0000000..8ac3628
--- /dev/null
+++ b/docs/status.rst
@@ -0,0 +1,76 @@
+Result status codes
+===================
+
+This project adopts widely used convention of returning integer encoded status
+codes as function results.
+
+Each function, that needs to notify it's execution status (success or error),
+returns ``lsp_status_t`` (alias for ``lsp_int8_t``).
+
+Available status codes are:
+
+    * ``LSP_SUCCESS``
+
+        Execution successful.
+
+    * ``LSP_EOF``
+
+        End of file encountered during reading/writing.
+
+    * ``LSP_ERR``
+
+        Generic error (unknown error).
+
+    * ``LSP_ERR_MEM``
+
+        Memory error (usually out of memory).
+
+    * ``LSP_ERR_CTX``
+
+        Context error (usually symbol resolution error).
+
+    * ``LSP_ERR_READ``
+
+        Reader error.
+
+    * ``LSP_ERR_WRITE``
+
+        Writer error.
+
+    * ``LSP_ERR_EVAL``
+
+        Evaluation error.
+
+    * ``LSP_ERR_APPLY``
+
+        Application error.
+
+    * ``LSP_ERR_ARG_COUNT``
+
+        Invalid argument count.
+
+    * ``LSP_ERR_ARG_TYPE``
+
+        Invalid argument type.
+
+    * ``LSP_ERR_ARG_VALUE``
+
+        Invalid argument value.
+
+    * ``LSP_ERR_USER``
+
+        Special status value representing beginning of user status codes.
+
+User has ability to raise user error with ``error`` builtin function. This
+error is additionally described with integer value in range [0, 126] and
+encoded as status code.
+
+
+Source code
+-----------
+
+status.h
+''''''''
+
+.. literalinclude:: ../src_c/status.h
+    :language: c
diff --git a/docs/stream.rst b/docs/stream.rst
new file mode 100644
index 0000000..7b90cec
--- /dev/null
+++ b/docs/stream.rst
@@ -0,0 +1,54 @@
+Input/output stream
+===================
+
+To enable interaction with interpreter, basic input/output stream abstraction
+is needed. This implementation uses platform specific functions defined
+by architecture abstraction layer.
+
+
+Input stream
+------------
+
+Input stream provides functionality of reading unsigned 8bit integers
+representing input characters. Implementation utilized `lsp_stream_getchar_t`
+function pointer provided during input stream initialization. Together with
+`lsp_in_stream_read`, used for reading next available input character, input
+stream contains single character buffer used for implementation of
+`lsp_in_stream_peek` functionality.
+
+
+Output stream
+-------------
+
+Output stream provides functionality regarding writing character data.
+It uses `lsp_stream_putchar_t` function pointer provided during output stream
+initialization. Available functions include:
+
+    * `lsp_out_stream_write`
+
+        Write single character to output stream.
+
+    * `lsp_out_stream_write_str`
+
+        Write null terminated character sequence.
+
+    * `lsp_out_stream_write_int`
+
+        Write string representation of signed integer.
+
+
+Source code
+-----------
+
+stream.h
+''''''''
+
+.. literalinclude:: ../src_c/stream.h
+    :language: c
+
+
+stream.c
+''''''''
+
+.. literalinclude:: ../src_c/stream.c
+    :language: c
diff --git a/docs/syntax.rst b/docs/syntax.rst
new file mode 100644
index 0000000..71bd35b
--- /dev/null
+++ b/docs/syntax.rst
@@ -0,0 +1,116 @@
+Builtin syntaxes
+================
+
+This implementation includes minimal number of builtin syntaxes. All other
+constructs should be defined as user defined syntaxes in Lisp itself.
+
+In following examples, lines starting with ``>`` represent characters
+provided to input stream. Lines without starting ``>`` character represent
+evaluation results written to output stream.
+
+Available builtin syntaxes are:
+
+    * `lambda`
+
+        Definition of new user defined function.
+
+        Examples::
+
+            > ((lambda x x) 1 2 3)
+            (1 2 3)
+
+            > ((lambda (x) x) 1)
+            1
+
+            > ((lambda (x . y) y) 1 2 3)
+            (2 3)
+
+    * `syntax`
+
+        Definition of new user defined syntax.
+
+        Examples::
+
+            > ((syntax x x) (lambda x x) 1 2 3)
+            (1 2 3)
+
+    * `define`
+
+        Add new symbol binding to current context.
+
+        Examples::
+
+            > (define xyz 42)
+            > xyz
+            42
+
+    * `set!`
+
+        Change previously defined context entry.
+
+        Examples::
+
+            > (define xyz 42)
+            > xyz
+            42
+            > (set! xyz 24)
+            > xyz
+            24
+
+    * `begin`
+
+        Evaluate multiple expressions and return result of last expression
+        evaluation.
+
+        Examples::
+
+            > (begin 1 2 3)
+            3
+
+    * `quote`
+
+        Evaluates to provided argument.
+
+        Examples::
+
+            > (quote (1 2 3))
+            (1 2 3)
+
+            > '(3 2 1)
+            (3 2 1)
+
+    * `if`
+
+        If first argument evaluates to `thruthy` value, `if` syntax returns
+        result of second argument evaluation. If first argument evaluates to
+        `falsy` value, result of third argument evaluation is returned or
+        ``()`` if third argument is not available.
+
+        `Falsy` values are ``0``, ``()``, ``""`` and empty symbol.
+
+        `Thruthy` values are all that are not `falsy`.
+
+        Examples::
+
+            > (if 0 1 2)
+            2
+
+            > (if "0" 1 2)
+            1
+
+
+Source code
+-----------
+
+syntax.h
+''''''''
+
+.. literalinclude:: ../src_c/syntax.h
+    :language: c
+
+
+syntax.c
+''''''''
+
+.. literalinclude:: ../src_c/syntax.c
+    :language: c
diff --git a/docs/write.rst b/docs/write.rst
new file mode 100644
index 0000000..b90e2e1
--- /dev/null
+++ b/docs/write.rst
@@ -0,0 +1,27 @@
+Data writer
+===========
+
+Data writer enables serialization of data as string characters written
+to output stream. Same encoding rules that apply to data reader also apply to
+data writer.
+
+Additionally, data writer will provide informative representation of data
+types which can not be parsed by data reader (`builtin function`,
+`builtin syntax`, `function` and `syntax`).
+
+
+Source code
+-----------
+
+write.h
+'''''''
+
+.. literalinclude:: ../src_c/write.h
+    :language: c
+
+
+write.c
+'''''''
+
+.. literalinclude:: ../src_c/write.c
+    :language: c
author	bozo.kopic <bozo@kopic.xyz>	2022-08-02 01:20:12 +0200
committer	bozo.kopic <bozo@kopic.xyz>	2022-09-25 02:40:23 +0200
commit	288727f09a1b3458c268497d111349e608c3f9fa (patch)
tree	d62565249fa3c7127856c65405752572fc41aca9 /docs