pre-recording backup

notes/papers/report/parts/macros/+index.md

# Macros

Left-associative unparenthesized function calls are intuitive in the typical case of just applying functions to a limited number of arguments, but they're not very flexible. Haskell solves this problem by defining a diverse array of syntax primitives for individual use cases such as `do` blocks for monadic operations. This system is fairly rigid. In contrast, Rust and Lisp enable library developers to invent their own syntax that intuitively describes the concepts the library at hand encodes. In Orchid's codebase, I defined several macros to streamline tasks like defining functions in Rust that are visible to Orchid, or translating between various intermediate representations.

## Generalized kerning

In the referenced video essay, a proof of the Turing completeness of generalized kerning is presented. The proof involves encoding a Turing machine in a string and some kerning rules. The state of the machine is next to the read-write head, and since kerning rules must be reversible, all previous states are enumerated next to the tape. The end result looks something like this:

```
abcbcddddef|1110000110[0]a00111010011101110
```

Each rule of the machine is translated into a kerning rule. For a rule

> in state `a` seeing `0`: new state is `b`, write `1` and go `left`

the kerning rule would look like this (template instantiated for all possible characters):

```
$1 [ 0 ] a equals a < $1 ] b 0
```

Some global rules are also needed, likewise instantiated for all possible characters in the templated positions:

```
$1 $2 < equals $2 < $1 unless $1 is |
| $1 < equals $1 | >
> $1 $2 equals $1 > $2 unless $2 is ]
> $1 ] equals [ $1 ]
```

What I really appreciate in this proof is how visual it is; based on this, it's easy to imagine how one would go about encoding a pushdown automaton, lambda calculus or other interesting tree-walking procedures. This is exactly why I based my preprocessor on this system.
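The rewriting mechanism itself is easy to prototype. Below is a minimal Rust sketch of one rewrite step; it is my own illustration, not Orchid code. The `*` character stands in for the `$1` template slot and binds a single character, mirroring the state-transition rule shown above.

```rust
/// Apply one generalized-kerning rewrite: replace the leftmost occurrence
/// of `pat` in `s` with `repl`. A `*` in the pattern binds any single
/// character and can be reused in the replacement.
fn apply_rule(s: &str, pat: &str, repl: &str) -> Option<String> {
    let chars: Vec<char> = s.chars().collect();
    let p: Vec<char> = pat.chars().collect();
    if p.len() > chars.len() {
        return None;
    }
    'outer: for start in 0..=chars.len() - p.len() {
        let mut bound: Option<char> = None;
        for (i, &pc) in p.iter().enumerate() {
            let c = chars[start + i];
            if pc == '*' {
                bound = Some(c);
            } else if pc != c {
                continue 'outer;
            }
        }
        let mut out: String = chars[..start].iter().collect();
        for rc in repl.chars() {
            out.push(if rc == '*' { bound.unwrap() } else { rc });
        }
        out.extend(&chars[start + p.len()..]);
        return Some(out);
    }
    None
}

fn main() {
    // The transition rule from above: in state `a` seeing `0`, the old
    // state is recorded and a `<` carriage starts walking left.
    let next = apply_rule("x[0]a", "*[0]a", "a<*]b0");
    println!("{:?}", next); // Some("a<x]b0")
}
```

Note that each rule fires on a purely local window of characters; the global carriage rules then propagate the effect across the string, which is what makes the encoding work with nothing but kerning.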
## Namespaced tokens

Rust macros operate on bare tokens and are therefore prone to accidental aliasing. Every other item in Rust follows a rigorous namespacing scheme, but macros break this structure, probably because macro execution happens before namespace resolution. The language doesn't suffer too much from this problem, but the relativity of namespacing limits macros' potential.

Orchid's substitution rules operate on namespaced tokens. This means that the macros can hook into each other. Consider the following example, which is a modified version of a real rule included in the prelude:

in _procedural.orc_

```orchid
export do { ...$statement ; ...$rest:1 } =10_001=> (
  statement (...$statement) do { ...$rest }
)
export do { ...$return } =10_000=> (...$return)
export statement (let $_name = ...$value) ...$next =10_000=> (
  (\$_name. ...$next) (...$value)
)
```

in _cpsio.orc_

```orchid
import procedural::statement

export statement (cps $_name = ...$operation) ...$next =10_001=> (
  (...$operation) \$_name. ...$next
)
export statement (cps ...$operation) ...$next =10_000=> (
  (...$operation) (...$next)
)
```

in _main.orc_

```orchid
import procedural::(do, let, ;)
import cpsio::cps

export main := do{
  cps data = readline;
  let a = parse_float data * 2;
  cps print (data ++ " doubled is " ++ stringify a)
}
```

Notice how, despite heavy use of macros, it's never ambiguous where a particular name is coming from. Namespacing, including import statements, is entirely unaffected by the macro system. The source of names is completely invariant.

notes/papers/report/parts/macros/implementation.md

# Implementation

Optimizing this macro execution algorithm is an interesting challenge with a diverse range of possible approaches. The current solution is very far from ideal, but it scales to the small experimental workloads I've tried so far and it can accommodate future improvements without any major restructuring.

The scheduling of macros is delegated to a unit called the rule repository, while the matching of rules to a given clause sequence is delegated to a unit called the matcher. Other tasks are split out into distinct self-contained functions, but these two have well-defined interfaces and encapsulate data. Constants are processed by the repository one at a time, which means that the data processed by this subsystem typically corresponds to a single struct, function or other top-level source item.
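The repository's scheduling can be pictured roughly as follows. This is an illustrative Rust sketch with a made-up `Rule` type and literal token matching; real rules carry placeholder patterns and user-assigned priorities.

```rust
/// A grossly simplified rule: replace a literal token sequence.
#[derive(Clone)]
struct Rule {
    priority: f64,
    pattern: Vec<String>,
    template: Vec<String>,
}

/// Replace the leftmost occurrence of `rule.pattern` with `rule.template`.
fn try_apply(tokens: &[String], rule: &Rule) -> Option<Vec<String>> {
    let window = rule.pattern.len();
    (0..=tokens.len().checked_sub(window)?)
        .find(|&i| tokens[i..i + window] == rule.pattern[..])
        .map(|i| {
            let mut out = tokens[..i].to_vec();
            out.extend_from_slice(&rule.template);
            out.extend_from_slice(&tokens[i + window..]);
            out
        })
}

/// Keep applying the highest priority matching rule until nothing matches.
fn process(mut tokens: Vec<String>, rules: &mut [Rule]) -> Vec<String> {
    rules.sort_by(|a, b| b.priority.partial_cmp(&a.priority).unwrap());
    'restart: loop {
        for rule in rules.iter() {
            if let Some(next) = try_apply(&tokens, rule) {
                tokens = next;
                continue 'restart;
            }
        }
        return tokens; // no rule matched: the constant is inert
    }
}

fn main() {
    let mut rules = [
        Rule { priority: 1.0, pattern: vec!["b".into()], template: vec!["c".into()] },
        Rule { priority: 2.0, pattern: vec!["a".into()], template: vec!["b".into()] },
    ];
    let out = process(vec!["a".into()], &mut rules);
    println!("{:?}", out); // ["c"]
}
```

The important property this sketch shares with the real system is that after every successful application the scan restarts from the highest priority rule, so higher priority rewrites always see the tree first.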
## Keyword dependencies

The most straightforward optimization is to skip rules whose patterns contain tokens that don't appear in the code at all. The repository uses this check to skip entire rules, but the rules don't apply it on the level of individual slices. This is a possible path of improvement for the future.
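The check itself is trivially cheap compared to matching. A sketch of the idea, with an illustrative function name and a plain token-set representation:

```rust
use std::collections::HashSet;

/// A rule whose pattern mentions a token that never occurs in the constant
/// being processed cannot possibly match, so it can be skipped up front.
fn may_match(pattern_tokens: &[&str], code_tokens: &HashSet<&str>) -> bool {
    pattern_tokens.iter().all(|t| code_tokens.contains(t))
}

fn main() {
    let code: HashSet<&str> = ["do", "{", "let", "x", "=", "1", "}"].into_iter().collect();
    println!("{}", may_match(&["do", "{", "}"], &code)); // true
    println!("{}", may_match(&["loop"], &code));         // false
}
```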
## Matchers

There are various ways to implement matching. To keep the architecture flexible, the repository is generic over the matcher, which is bound by a very small trait.

The current implementation of the matcher builds a tree of matchers rooted in the highest priority vectorial placeholder. At each level, the specializations are defined as follows:

- `VecMatcher` corresponds to a subpattern that starts and ends with a vectorial. Each matcher also matches the scalars between its submatchers; this is not called out explicitly below.

- `Placeholder` corresponds to a vectorial placeholder with no lower priority vectorials around it.

  It may reject zero-length slices but contains no other logic.

- `Scan` corresponds to a high priority vectorial on one side of the pattern with lower priority vectorials next to it.

  It moves the boundary, which consists of scalars, from one side to the other.

- `Middle` corresponds to a high priority vectorial surrounded on both sides by lower priority vectorials.

  This requires by far the most complicated logic: it collects matches for its scalar separators on either side, sorts their pairings by the length of the gap, then applies the submatchers on either side until a match is found. This uses copious heap allocations and is generally not very efficient. Luckily, this kind of pattern almost never appears in practice.

- `ScalMatcher` tests a single token. Since vectorials in subtrees are strictly lower priority than those in the enclosing sequence, `S` and `Lambda` don't require a lot of advanced negotiation logic. They normally appear in sequence, as their operations trivially generalize to a static sequence of them.

- `AnyMatcher` tests a sequence and wraps either a sequence of `ScalMatcher` or a single `VecMatcher` surrounded by two sequences of `ScalMatcher`.
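The `Scan` strategy is the easiest to picture. The function below is my own simplified rendition over plain string tokens, not Orchid's code: the run of scalar separators is slid across the slice, and the first position where it fits splits the slice into the two vectorials' matches.

```rust
/// Sketch of the `Scan` matcher: `sep` is the run of scalars between two
/// vectorials. It is slid across `slice`; the first position where it
/// matches splits the slice between the two vectorials. `from_left`
/// selects which end the scan starts from, i.e. which vectorial is
/// preferred to stay short.
fn scan_split<'a>(
    slice: &'a [&'a str],
    sep: &[&str],
    from_left: bool,
) -> Option<(&'a [&'a str], &'a [&'a str])> {
    if sep.len() > slice.len() {
        return None;
    }
    let last = slice.len() - sep.len();
    let positions: Vec<usize> = if from_left {
        (0..=last).collect()
    } else {
        (0..=last).rev().collect()
    };
    for i in positions {
        if &slice[i..i + sep.len()] == sep {
            return Some((&slice[..i], &slice[i + sep.len()..]));
        }
    }
    None
}

fn main() {
    let toks = ["a", "+", "b", "+", "c"];
    println!("{:?}", scan_split(&toks, &["+"], true));  // Some((["a"], ["b", "+", "c"]))
    println!("{:?}", scan_split(&toks, &["+"], false)); // Some((["a", "+", "b"], ["c"]))
}
```

A real matcher also recurses into both halves with its submatchers and backtracks to the next separator position on failure; the sketch only shows the boundary movement.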

notes/papers/report/parts/macros/order.md

## Execution order

The macros describe several independent sequential programs that are expected to be able to interact with each other. To make debugging easier, the order of execution of internal steps within independent macros has to be relatively static.

The macro executor follows a manually specified priority cascade, with priorities ranging from 0 to 0xep255, exclusive. Priorities are accepted in any valid floating point format, but they are usually written in binary or hexadecimal exponential form, as this notation expresses floating point precision on the syntax level, making precision errors extremely unlikely.

The range of valid priorities is divided into bands, much like radio bands. In this case, the bands serve to establish a high level ordering between instructions.

The bands are each an even 32 orders of magnitude, with space in between for future expansion:

|               |          |             |              |
| :-----------: | :------: | :---------: | :----------: |
| 0-7           | 8-15     | 16-23       | 24-31        |
| optimizations | x        |             |              |
| 32-39         | 40-47    | 48-55       | 56-63        |
| operators     |          |             | x            |
| 64-71         | 72-79    | 80-87       | 88-95        |
|               |          | expressions |              |
| 96-103        | 104-111  | 112-119     | 120-127      |
|               | x        |             |              |
| 128-135       | 136-143  | 144-151     | 152-159      |
| bindings      |          |             | x            |
| 160-167       | 168-175  | 176-183     | 184-191      |
|               |          | x           |              |
| 192-199       | 200-207  | 208-215     | 216-223      |
|               | aliases* |             |              |
| 224-231       | 232-239  | 240-247     | 248-         |
| integrations  |          |             | transitional |
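One way to read the table: the numbers are binary exponents, so a priority written as `0x1pN` falls in the cell whose range contains `N`, and each cell spans 8 exponents. The helper below is purely illustrative, not part of Orchid.

```rust
/// Map a priority's binary exponent to the start of its 8-wide cell in the
/// table above, e.g. an exponent of 130 lands in the 128-135 "bindings" cell.
fn cell_start(exponent: u32) -> u32 {
    exponent - exponent % 8
}

fn main() {
    println!("{}", cell_start(130)); // 128 (bindings)
    println!("{}", cell_start(5));   // 0   (optimizations)
}
```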

### Transitional states

Transitional states produced and consumed by the same macro program occupy the unbounded top region of the f64 range. Nothing in this range should be written by the user or triggered by an interaction of distinct macro programs; the purpose of this high range is to prevent devices such as carriages from interacting. Any transformation sequence in this range can assume that the tree is inert apart from its own operation.

### Integrations
|
||||
|
||||
Integrations expect an inert syntax tree but at least one token in the pattern is external to the macro program that resolves the rule, so it's critical that all macro programs be in a documented state at the time of resolution.
|
||||
|
||||
### Aliases
|
||||
|
||||
Fragments of code extracted for readability are all at exactly 0x1p800. These may be written by programmers who are not comfortable with macros or metaprogramming. They must have unique single token patterns. Because their priority is higher than any entry point, they can safely contain parts of other macro invocations. They have a single priority number because they can't conceivably require internal ordering adjustments and their usage is meant to be be as straightforward as possible.
|
||||
|
||||
### Binding builders
|
||||
|
||||
Syntax elements that manipulate bindings should be executed earlier. `do` blocks and (future) `match` statements are good examples of this category. Anything with a lower priority trigger can assume that all names are correctly bound.
|
||||
|
||||
### Expressions
|
||||
|
||||
Things that essentially work like function calls just with added structure, such as `if`/`then`/`else` or `loop`. These are usually just more intuitive custom forms that are otherwise identical to a macro
|
||||
|
||||
### Operators
|
||||
|
||||
Binary and unary operators that process the chunks of text on either side. Within the band, these macros are prioritized in inverse precedence order and apply to the entire range of clauses before and after themselves, to ensure that function calls have the highest perceived priority.
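The effect of the inverse ordering can be sketched as follows (illustrative Rust over plain string tokens): because the macro for the loosest-binding operator fires first and captures everything on either side of itself, tighter-binding operators are left inside an operand and resolved later.

```rust
/// Split a token sequence at the first occurrence of `op`, the way an
/// operator macro's `...$lhs op ...$rhs` pattern captures its operands.
fn split_at_op<'a>(toks: &'a [&'a str], op: &str) -> Option<(&'a [&'a str], &'a [&'a str])> {
    let i = toks.iter().position(|t| *t == op)?;
    Some((&toks[..i], &toks[i + 1..]))
}

fn main() {
    let expr = ["a", "+", "b", "*", "c"];
    // `+` binds loosest, so its macro runs at the highest macro priority:
    let (lhs, rhs) = split_at_op(&expr, "+").unwrap();
    println!("{:?} {:?}", lhs, rhs); // ["a"] ["b", "*", "c"]
}
```

Running the `+` rule first leaves `b * c` intact inside the right operand, which is exactly the grouping that ordinary precedence would produce; anything not split by any operator is ultimately read as a function call.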
### Optimizations
|
||||
|
||||
Macros that operate on a fully resolved lambda code and look for known patterns that can be simplified. I did not manage to create a working example of this but for instance repeated string concatenation is a good example.
|
||||
Reference in New Issue
Block a user