forked from Orchid/orchid
Final commit before submission

README.md (317 lines changed)
@@ -1,314 +1,5 @@
Orchid will be a compiled functional language with a powerful macro language and optimizer.

All you need to run the project is a nightly Rust toolchain. Go to one of the folders within `examples` and run

```sh
cargo run -- -p .
```

# Examples

Hello World in Orchid

```orchid
import std::io::(println, out)

main := println out "Hello World!"
```
Basic command line calculator

```orchid
import std::io::(readln, printf, in, out)

main := (
  readln in >>= int |> \a.
  readln in >>= \op.
  readln in >>= int |> \b.
  printf out "the result is {}\n", [match op (
    "+" => a + b,
    "-" => a - b,
    "*" => a * b,
    "/" => a / b
  )]
)
```
Grep

```orchid
import std::io::(readln, println, in, out, getarg)

main := loop \r. (
  readln in >>= \line.
  if (substring (getarg 1) line)
  then (println out line >>= r)
  else r
)
```
Filter through an arbitrary collection

```orchid
filter := @C:Type -> Type. @:Map C. @T. \f:T -> Bool. \coll:C T. (
  coll >> \el. if (f el) then (Some el) else Nil
):(C T)
```
# Explanation

This explanation is not a tutorial. It follows a constructive order, gradually introducing language features to better demonstrate their purpose. It also assumes that the reader is familiar with functional programming.

## Lambda calculus recap

The language is almost entirely based on lambda calculus, so everything is immutable and evaluation is lazy. The following is an anonymous function that takes an integer argument and multiplies it by 2:

```orchid
\x:int. imul 2 x
```
Multiple parameters are represented using currying, so the above is equivalent to

```orchid
imul 2
```

Recursion is accomplished using the Y combinator (called `loop`), which is a function that takes a function as its single parameter and applies it to itself. A naive implementation of `imul` might look like this:

```orchid
\a:int.\b:int. loop \r. (\i.
  ifthenelse (ieq i 0)
    b
    (iadd b (r (isub i 1)))
) a
```

`ifthenelse` takes a boolean as its first parameter and selects one of the following two expressions (of identical type) accordingly. `ieq`, `iadd` and `isub` are self-explanatory.
## Auto parameters (generics, polymorphism)

Although I didn't specify the type of `i` in the above example, it is known at compile time because the recursion is applied to `a`, which is an integer. I could have omitted the second argument, but then I would have had to specify `i`'s type as an integer, because for plain lambda expressions all types have to be statically known at compile time.

Polymorphism is achieved using parametric constructs called auto parameters. An auto parameter is a placeholder filled in during compilation, syntactically remarkably similar to a lambda expression:

```orchid
@T. --[ body of expression referencing T ]--
```
Autos have two closely related uses. First, they represent generic type parameters. If an auto is used as the type of an argument or some other subexpression that can be trivially deduced from the calling context, it is filled in.

The second usage of autos is for constraints, when they have a type that references another auto. Because these parameters are filled in by the compiler, referencing them is equivalent to the statement that a default value assignable to the specified type exists. Default values are declared explicitly and identified by their type, where that type itself may be parametric and may specify its own constraints, which are resolved recursively. If the referenced default is itself a useful value or function, you can give it a name and use it as such; but you can also omit the name, using the default as a hint that lets the compiler call functions that also have defaults of the same types, or possibly other types whose defaults have implementations based on your defaults.
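As a rough Rust analogy (for orientation only, not how Orchid resolves autos), a type auto behaves like a generic parameter and a constraint auto like a trait bound whose implementation the compiler locates:

```rust
// `T` plays the role of a type auto; the `Ord` bound plays the role of a
// constraint auto whose "default value" (the impl) is found by the compiler.
fn max_of<T: Ord>(a: T, b: T) -> T {
    if a > b { a } else { b }
}

fn main() {
    // Both the type and the constraint are deduced from the call site,
    // much like autos filled in from the calling context.
    assert_eq!(max_of(2, 5), 5);
}
```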
For a demonstration, here's a sample implementation of the Option monad:

```orchid
--[[ The definition of Monad ]]--
define Monad $M:(Type -> Type) as (Pair
  (@T. @U. (T -> M U) -> M T -> M U) -- bind
  (@T. T -> M T) -- return
)

bind := @M:Type -> Type. @monad:Monad M. fst monad
return := @M:Type -> Type. @monad:Monad M. snd monad

--[[ The definition of Option ]]--
define Option $T as @U. U -> (T -> U) -> U
--[ Constructors ]--
export Some := @T. \data:T. categorise @(Option T) ( \default. \map. map data )
export None := @T. categorise @(Option T) ( \default. \map. default )
--[ Implement Monad ]--
impl Monad Option via (makePair
  ( @T. @U. \f:T -> Option U. \opt:Option T. opt None f ) -- bind
  Some -- return
)
--[ Sample function that works on an unknown monad to demonstrate HKTs.
  Turns (Option (M T)) into (M (Option T)), "raising" the unknown monad
  out of the Option ]--
export raise := @M:Type -> Type. @T. @:Monad M. \opt:Option (M T). (
  opt (return None) (\m. bind (\x. return (Some x)) m)
):(M (Option T))
```
Typeclasses may be implemented in any module that also defines at least one of the types in the definition, which includes both the type of the expression and the types of its auto parameters. They always have a name, which can be used to override known defaults with which your definition may overlap. For example, if addition is defined elementwise for all applicative functors, the author of List might want concatenation to take precedence in the case where all element types match. Notice how Add has three arguments: two are the types of the operands and one is the result:

```orchid
impl @T. Add (List T) (List T) (List T) by concatListAdd over elementwiseAdd via (
  ...
)
```
For completeness' sake, the original definition might look like this:

```orchid
impl
  @C:Type -> Type. @T. @U. @V. -- variables
  @:(Applicative C). @:(Add T U V). -- conditions
  Add (C T) (C U) (C V) -- target
by elementwiseAdd via (
  ...
)
```
With the use of autos, here's what the recursive multiplication implementation looks like:

```orchid
impl @T. @:(Add T T T). Multiply T int T by iterativeMultiply via (
  \a:int. \b:T. loop \r. (\i.
    ifthenelse (ieq i 0)
      b
      (add b (r (isub i 1))) -- notice how iadd is now add
  ) a
)
```
This could then be applied to any type that's closed over addition:

```orchid
aroundTheWorldLyrics := (
  mult 18 (add (mult 4 "Around the World\n") "\n")
)
```

For my notes on the declare/impl system, see [notes/type_system].
## Preprocessor

The above code samples have one notable difference from the Examples section above: they're ugly and hard to read. The solution to this is a powerful preprocessor, which is used internally to define all sorts of syntax sugar, from operators to complex syntax patterns and even pattern matching, and which can also be used to define custom syntax. The preprocessor reads the source as an S-tree while executing substitution rules which have a real-numbered priority.

In the following example, seq matches a list of arbitrary tokens and its parameter is the order of resolution. The order can be used, for example, to make sure that `if a then b else if c then d else e` becomes `(ifthenelse a b (ifthenelse c d e))` and not `(ifthenelse a b if) c then d else e`. It's worth highlighting here that preprocessing works on the typeless AST, and matchers are constructed using inclusion rather than exclusion, so it would not be possible to selectively allow the above example without enforcing that if-statements are searched back-to-front. If order is still a problem, you can always parenthesize subexpressions at the callsite.
```orchid
(..$pre:2 if ...$cond then ...$true else ...$false) =10=> (
  ..$pre
  (ifthenelse (...$cond) (...$true) (...$false))
)
...$a + ...$b =2=> (add (...$a) (...$b))
...$a = ...$b =5=> (eq (...$a) (...$b))
...$a - ...$b =2=> (sub (...$a) (...$b))
```
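To make the rewriting model concrete, here is a deliberately tiny Rust sketch of priority-driven substitution over a flat token list. It is illustrative only: the names are hypothetical, and the real preprocessor matches S-trees with placeholders like `...$a`, not literal token runs.

```rust
/// One substitution rule: a literal pattern, its replacement, and a priority.
struct Rule {
    priority: f64,
    pattern: Vec<&'static str>,
    template: Vec<&'static str>,
}

/// Repeatedly apply the highest-priority matching rule until nothing matches.
fn rewrite(mut tokens: Vec<&'static str>, rules: &mut Vec<Rule>) -> Vec<&'static str> {
    rules.sort_by(|a, b| b.priority.total_cmp(&a.priority));
    'scan: loop {
        for rule in rules.iter() {
            let hit = tokens
                .windows(rule.pattern.len())
                .position(|w| w == rule.pattern.as_slice());
            if let Some(at) = hit {
                tokens
                    .splice(at..at + rule.pattern.len(), rule.template.iter().copied())
                    .for_each(drop);
                continue 'scan; // restart so higher-priority rules run first
            }
        }
        return tokens; // inert: no rule matches anywhere
    }
}

fn main() {
    let mut rules = vec![Rule {
        priority: 2.0,
        pattern: vec!["a", "+", "b"],
        template: vec!["(", "add", "a", "b", ")"],
    }];
    assert_eq!(
        rewrite(vec!["a", "+", "b"], &mut rules),
        vec!["(", "add", "a", "b", ")"]
    );
}
```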
The recursive multiplication function now looks like this:

```orchid
impl @T. @:(Add T T T). Multiply T int T by iterativeMultiply via (
  \a:int.\b:T. loop \r. (\i.
    if (i = 0) then b
    else (b + (r (i - 1)))
  ) a
)
```
### Traversal using carriages

While it may not be immediately apparent, these substitution rules are actually Turing complete. They can be used quite intuitively to traverse the token tree with unique "carriage" symbols that move according to their environment and can carry structured data payloads.

Here's an example of a carriage being used to turn a square-bracketed list expression into a lambda expression that matches a conslist. Notice how the square brackets pair up, as all three variants of brackets are considered branches in the S-tree rather than individual tokens.
```orchid
-- Initial step, eliminates entry condition (square brackets) and constructs
-- carriage and other working symbols
[...$data:1] =1000.1=> (cons_start ...$data cons_carriage(none))
-- Shortcut with higher priority
[] =1000.5=> none
-- Step
, $item cons_carriage($tail) =1000.1=> cons_carriage((some (cons $item $tail)))
-- End, removes carriage and working symbols and leaves valid source code
cons_start $item cons_carriage($tail) =1000.1=> some (cons $item $tail)
-- Low priority rules should turn leftover symbols into errors.
cons_start =0=> cons_err
cons_carriage($data) =0=> cons_err
cons_err =0=> (macro_error "Malformed conslist expression")
-- macro_error will probably have its own rules for composition and
-- bubbling such that the output for an erratic expression would be a
-- single macro_error to be decoded by developer tooling
```
(an up-to-date version of this example can be found in the examples folder)

Another thing to note is that although it may look like cons_carriage is a global string, it's in fact namespaced to whatever file provides the macro. Symbols can be exported either by prefixing the pattern with `export`, or separately via the following syntax if no single rule is equipped to dictate the exported token set.

```orchid
export ::(some_name, other_name)
```
# Module system

Files are the smallest unit of namespacing, automatically grouped into folders and forming a tree, the leaves of which are the actual symbols. An exported symbol is a name referenced in an exported substitution rule or assigned to an exported function. Imported symbols are considered identical to the same symbol directly imported from the same module for the purposes of substitution. The module syntax is very similar to Rust's, and since each token gets its own export, with most rules comprising several local symbols, the most common import option is probably `::*` (import all).
# Optimization

This is very far away so I don't want to make promises, but I have some ideas.

- [ ] early execution of functions on any subset of their arguments where it could provide substantial speedup
- [ ] tracking copies of expressions and evaluating them only once
- [ ] many cases of single recursion converted to loops
  - [ ] tail recursion
  - [ ] 2 distinct loops where the tail doesn't use the arguments
  - [ ] reorder operations to favour this scenario
- [ ] reactive calculation of values that are deemed to be read more often than written
- [ ] automatic profiling based on performance metrics generated by debug builds
notes/papers/report/parts/abbreviations.md (new file, 4 lines)
@@ -0,0 +1,4 @@

Table of abbreviations:

- **CPS**: Continuation passing style, a technique of transferring control to a function in a lazy language by passing the rest of the current function in a lambda.
notes/papers/report/parts/ethics.md (new file, 5 lines)
@@ -0,0 +1,5 @@

# Statement of Ethics

No people other than the author, no living creatures, and no experiments on infrastructure were involved in the project, so the principles of **do no harm**, **confidentiality of data** and **informed consent** are not relevant.

As a language developer, my **social responsibility** is to build reliable languages. Orchid is a tool in service of whatever goal the programmer has in mind.
@@ -78,4 +78,3 @@ Originally, Orchid was meant to have a type system that used Orchid itself to bu

### Alternatives

During initial testing of the working version, I found that the most common kind of programming error in lambda calculus appears to be arity mismatch, or a syntax error that results in arity mismatch. Without any kind of type checking this is especially difficult to debug, as every function looks the same. This can be addressed with a much simpler type system similar to System-F. Any such type checker would have to be constructed so as to only verify user-provided information regarding the arity of functions, without attempting to find the arity of every expression, since System-F is strongly normalising and Orchid, like any general purpose language, supports potentially infinite loops.
@@ -6,14 +6,10 @@ In addition the handling of all syntax sugar is delegated to the compiler. This

**Syntax-level metaprogramming:** [Template Haskell][th1] is Haskell's tool for syntax-level macros. I learned about it after I built Orchid, and it addresses a lot of my problems.

[th1]: ./literature/macros.md

**Type system:** Haskell's type system is very powerful, but to be able to represent some really interesting structures it requires a long list of GHC extensions to be enabled, which in turn make typeclass implementation matching undecidable and the heuristic rather bad (understandably so, it was clearly not designed for that).

My plan for Orchid was to use Orchid itself as a type system as well; rather than aiming for a decidable type system and then extending it until it inevitably becomes turing-complete [1][2][3], my type-system would be undecidable from the start and progress would point towards improving the type checker to recognize more and more cases.

[1]: https://en.cppreference.com/w/cpp/language/template_metaprogramming
[2]: https://blog.rust-lang.org/2022/10/28/gats-stabilization.html
[3]: https://wiki.haskell.org/Type_SK

A description of the planned type system is available in [[type_system/+index|Appendix T]]
@@ -1,4 +1,4 @@

## Interner

To fix a very serious performance problem with the initial POC, all tokens and all namespaced names in Orchid are interned.
@@ -8,36 +8,36 @@ For the sake of simplicity in Rust it is usually done by replacing Strings with

Interning is of course not limited to strings, but one has to be careful in applying it to distinct concepts, as the lifetimes of every single interned thing are tied together, and sometimes the added constraints and complexity aren't worth the performance improvements. Orchid's interner is completely type-agnostic so that the possibility is there. The interning of Orchid string literals is on the roadmap, however.
### Initial implementation

Initially, the interner used Lasso, which is an established string interner with a wide user base.
#### Singleton

A string interner is inherently a memory leak, so making it static would have likely proven problematic in the future. At the same time, magic strings should be internable by any function with or without access to the interner, since embedders of Orchid should be able to reference concrete names in their Rust code conveniently. To get around these constraints, the [[oss#static_init|static_init]] crate was used to retain a global singleton instance of the interner and intern magic strings with it. After the first non-static instance of the interner is created, the functions used to interact with the singleton would panic. I also tried using the iconic lazy_static crate, but unfortunately it evaluates the expressions upon first dereference, which for functions that take an interner as parameter is always after the creation of the first non-static interner.
#### The Interner Trait

The interner supported exchanging strings or sequences of tokens for tokens. To avoid accidentally comparing the token for a string with the token for a string sequence, or attempting to resolve a token referring to a string sequence as a string, the tokens have a rank, encoded as a dependent type parameter. Strings are exchanged for tokens of rank 0, and sequences of tokens of rank N are exchanged for tokens of rank N+1.
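For illustration, rank-tagged tokens could be sketched in Rust with a phantom marker type standing in for the dependent parameter; the names here are illustrative, not Orchid's actual ones:

```rust
use std::marker::PhantomData;

struct Str;                    // rank 0: an interned string
struct Seq<T>(PhantomData<T>); // rank N+1: a sequence of rank-N tokens

/// A token is just an id, but the phantom parameter records its rank,
/// so string tokens and sequence tokens can never be mixed up.
#[derive(Clone, Copy, PartialEq, Eq)]
struct Tok<T> {
    id: u32,
    _rank: PhantomData<T>,
}

fn main() {
    let s: Tok<Str> = Tok { id: 0, _rank: PhantomData };
    let v: Tok<Seq<Str>> = Tok { id: 0, _rank: PhantomData };
    // `s == v` would not compile: the ranks differ even though the ids match.
    let _ = (s, v);
}
```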
#### Lasso shim

Because the type represented by a token is statically guaranteed, we can fearlessly store differently encoded values together without annotation. Thanks to this, strings can simply be forwarded to lasso without overhead. Token sequences are more problematic, because the data is ultimately a sequence of numbers and we can't easily assert that they will constitute a valid utf8 string. My temporary solution was to encode the binary data in base64.
### Revised implementation

The singleton ended up completely defunct because `static_init` apparently also evaluates init expressions on first dereference. Fixing this issue was a good occasion to come up with a better design for the interner.
#### monotype

The logic for interning itself is encapsulated by a `monotype` struct. This stores values of a single homogenous type, using a hashmap for value->token lookup and a vector for token->value lookup. It is based on, although considerably simpler than, Lasso.
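A minimal sketch of that shape (with hypothetical field and method names):

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Stores values of one type: a hashmap for value -> token lookup
/// and a vector for token -> value lookup.
struct Monotype<T: Clone + Eq + Hash> {
    tokens: HashMap<T, u32>,
    values: Vec<T>,
}

impl<T: Clone + Eq + Hash> Monotype<T> {
    fn new() -> Self {
        Monotype { tokens: HashMap::new(), values: Vec::new() }
    }
    /// Return the existing token for `value`, or mint a new one.
    fn intern(&mut self, value: &T) -> u32 {
        if let Some(&tok) = self.tokens.get(value) {
            return tok;
        }
        let tok = self.values.len() as u32;
        self.values.push(value.clone());
        self.tokens.insert(value.clone(), tok);
        tok
    }
    fn resolve(&self, tok: u32) -> &T {
        &self.values[tok as usize]
    }
}

fn main() {
    let mut strings = Monotype::<String>::new();
    let tok = strings.intern(&"main".to_string());
    assert_eq!(strings.resolve(tok), "main");
}
```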
#### polytype

The actual Interner stores a `HashMap<typeid, Box<dyn Any>>`, which is essentially a store of values of unique type keyed by the type. The values in this case are monotype interners.
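Continuing the sketch above (and reusing its `Monotype`), the polytype store might look like this, again with hypothetical names:

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::hash::Hash;

/// One `Monotype<T>` per value type, keyed by that type's `TypeId`.
struct Interner {
    stores: HashMap<TypeId, Box<dyn Any>>,
}

impl Interner {
    fn new() -> Self {
        Interner { stores: HashMap::new() }
    }
    /// Fetch (or lazily create) the store for values of type `T`.
    fn monotype<T: Clone + Eq + Hash + 'static>(&mut self) -> &mut Monotype<T> {
        self.stores
            .entry(TypeId::of::<T>())
            .or_insert_with(|| Box::new(Monotype::<T>::new()))
            .downcast_mut::<Monotype<T>>()
            .expect("entries are keyed by TypeId, so this downcast cannot fail")
    }
}

fn main() {
    let mut i = Interner::new();
    let tok = i.monotype::<String>().intern(&"std".to_string());
    assert_eq!(i.monotype::<String>().resolve(tok), "std");
}
```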
Unlike the naive initial implementation, this version also operates on references, so interning and externing values causes no unnecessary copying and heap allocations.
### The InternedDisplay Trait

For refined error reporting most structures derive `Debug` and also implement `Display`. In most cases where the structure at hand describes code of some kind, `Display` attempts to print a fragment of valid code. With every name in the codebase interned this is really difficult, because interner tokens can't be resolved from `Display` implementations. To solve this, a new trait was defined called `InternedDisplay`, which has the same surface as `Display` except for the fact that `fmt`'s mirror image also takes an additional reference to Interner. The syntax sugar for string formatting is in this way unfortunately lost, but the functionality and the division of responsibilities remains.
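The trait surface described might look roughly like the following sketch (reusing the `Interner` sketch above; the actual signature in the codebase may differ):

```rust
use std::fmt;

/// Like `Display`, except the formatter also receives the interner,
/// so tokens can be resolved back to names while printing.
trait InternedDisplay {
    fn fmt_i(&self, f: &mut fmt::Formatter<'_>, i: &Interner) -> fmt::Result;
}
```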
notes/papers/report/parts/interpreter.md (new file, 13 lines)
@@ -0,0 +1,13 @@

## Interpreter

The Orchid interpreter exposes one main function called `run`. This function takes an expression to reduce and the symbol table returned by the pipeline and processed by the macro repository. It's also possible to specify a reduction step limit to make sure the function returns in a timely manner.
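A rough sketch of the shape of that API, with stand-in types rather than Orchid's actual definitions:

```rust
/// Stand-in for a reducible Orchid expression.
struct Expr;
/// Stand-in for the symbol table produced by the pipeline and processed
/// by the macro repository.
struct SymbolTable;

enum RunResult {
    /// Reduction reached an inert expression.
    Done(Expr),
    /// The step limit ran out; the partially reduced expression is returned.
    OutOfGas(Expr),
}

fn run(expr: Expr, _symbols: &SymbolTable, _step_limit: Option<usize>) -> RunResult {
    // The reduction loop is elided in this sketch.
    RunResult::Done(expr)
}
```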
### Interfacing with an embedder

An embedding application essentially interacts with Orchid by way of queries: it invokes the interpreter with a prepared function call. The Orchid code then replies with a return value, which the embedder can either read directly or use as a component in subsequent questions, and so the conversation develops. All communication is initiated and regulated, and the conclusions executed, entirely by the embedder.

Although external functions are exposed to Orchid and can be called at any time (within a computation), they are expected to be pure, and any calls to them may be elided by the optimizer if it can deduce the return value from precedent or circumstances.

One common way to use a query API is to define a single query that is conceptually equivalent to "What would you like to do?" and a set of valid answers which each incorporate some way to pass data through to the next (identical) query. HTTP does this: historically, client state was preserved in cookies and pre-filled form inputs, later with client-side Javascript and LocalStorage.

Orchid offers a way to do this using the `Handler` trait and the `run_handler` function, which is the interpreter's second important export. Essentially, this trait offers a way to combine functions that match and process various types implementing `Atomic`. This allows embedders to specify an API where external functions return special, inert `Atomic` instances corresponding to environmental actions the code can take, each of which also carries the continuation of the logic. This is a variation of continuation passing style, a common way of encoding effects in pure languages. It is inspired by algebraic effects.
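One way to picture that pattern - with hypothetical types, not the actual `Handler`/`Atomic` API - is a set of inert command atoms that carry their continuations, dispatched by the embedder:

```rust
/// Stand-in for an Orchid expression.
struct Expr;

/// Hypothetical inert atoms an embedder might expose; each carries the
/// continuation to hand back to the interpreter after the effect runs.
enum Command {
    Print(String, Expr),                 // print, then continue with `Expr`
    ReadLn(Box<dyn Fn(String) -> Expr>), // continue with the line that was read
    Halt,
}

/// Execute one environmental action and return the continuation, if any.
fn handle(cmd: Command) -> Option<Expr> {
    match cmd {
        Command::Print(text, k) => {
            print!("{text}");
            Some(k) // resume the computation
        }
        Command::ReadLn(k) => {
            let mut line = String::new();
            std::io::stdin().read_line(&mut line).ok()?;
            Some(k(line))
        }
        Command::Halt => None,
    }
}
```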
@@ -2,7 +2,7 @@

Orchid is a lazy, pure functional programming language with an execution model inspired by Haskell and a powerful syntax-level preprocessor for encoding rich DSLs that adhere to the language's core guarantees.

## Immutability

The merits of pure functional code are well known, but I would like to highlight some that are particularly relevant in the case of Orchid:
@@ -10,11 +10,11 @@ The merits of pure functional code are well known, but I would like to highlight

- **Self-containment** Arguments to the current toplevel function are all completely self-contained expressions, which means that they can be serialized and sent over the network, provided that an equivalent for all atoms and externally defined functions exists on both sides; this makes Orchid a prime query language.

  > **note**
  > Although this is possible using Javascript's `Function` constructor, it is a catastrophic security vulnerability there, since code sent this way can access all host APIs. In the case of Orchid it is not only perfectly safe from an information access perspective, since all references are bound on the sender side and undergo explicit translation, but also from a computational resource perspective, since the recipient can apply step limits to the untrusted expression, interleave it with local tasks, and monitor its size and memory footprint.

- **Reentrancy** In low-reliability environments it is common to run multiple instances of an algorithm in parallel and regularly compare and correct their state using some form of consensus. In an impure language this must be done explicitly and mistakes can result in divergence. In a pure language the executor can be configured to check its state with others every so many steps.

## Laziness

Reactive programming is an increasingly popular paradigm for enabling systems to interact with changing state without recomputing subresults that have not been modified. It is getting popular despite the fact that enabling this kind of programming in classical languages - most notably Javascript, where it appears to be the most popular - involves lots of boilerplate and complicated constructs using many lambda functions. In a lazy language this is essentially the default.
notes/papers/report/parts/literature/effects.md (new file, 31 lines)
@@ -0,0 +1,31 @@

https://www.unison-lang.org/learn/fundamentals/abilities/

An excellent description of algebraic effects that led me to understand how they work and why they present an alternative to monads.

Algebraic effects essentially associate with a function a set of special types, representing families of requests, that it may return other than its own return type. Effects usually carry a thunk or function to enable resuming normal processing, and handlers usually respond to the requests represented by the effects by implementing them on top of other effects such as IO. The interesting part to me is that all of this is mainly just convention, so algebraic effects provide type system support for expressing arbitrary requests using CPS.

Although Orchid doesn't have a type system, CPS is a straightforward way to express side effects.

---

https://github.com/zesterer/tao

The first place where I encountered algebraic effects; otherwise a very interesting language that I definitely hope to adopt features from in the future.

Tao is made by the same person who created Chumsky, the parser combinator library used in Orchid. It demonstrates a lot of interesting concepts; its pattern matching is one of a kind. The language is focused mostly on static verification and efficiency, neither of which is a particularly strong point of Orchid, but some of its auxiliary features are interesting to an untyped, interpreted language too. One of these is generic effects.

---

https://wiki.haskell.org/All_About_Monads#A_Catalog_of_Standard_Monads
Originally, I intended to have dedicated objects for all action types, and transformations similar to Haskell's monad functions.

A monad is a container that can store any type and supports three key operations, illustrated in the Rust sketch after this list:

1. Constructing a new instance of the container around a value
2. Flattening an instance of the container that contains another instance of it into a single container of the inner nested value
3. Applying a transformation to the value inside the container that produces a different type
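Rust's `Option` type is a convenient real-world illustration of the three operations (a runnable example, though Rust spells the operations differently than Haskell):

```rust
fn main() {
    // 1. Construct the container around a value.
    let wrapped: Option<i32> = Some(1);
    // 2. Flatten a nested container into a single layer.
    let flat: Option<i32> = Some(Some(2)).flatten();
    // 3. Apply a transformation producing a different type.
    let mapped: Option<String> = wrapped.map(|x| x.to_string());
    assert_eq!(flat, Some(2));
    assert_eq!(mapped, Some(String::from("1")));
}
```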
The defining characteristic of monads is that whether and when the transformations are applied is flexible, since information can't easily leave the monad.

This system is extremely similar to effects, and at least in an untyped context they're essentially equally powerful. I opted for effects because their defaults seem more sensible.
notes/papers/report/parts/literature/macros.md (new file, 34 lines)
@@ -0,0 +1,34 @@

https://doc.rust-lang.org/reference/macros-by-example.html

Rust's macro system was both an invaluable tool and an example while defining Orchid's macros.

Rust supports declarative macros in what it calls "macros by example". These use a simplistic, state machine-like parser model to match tokens within the strictly bounded parameter tree. Most notably, Rust's declarative macros don't support any kind of backtracking. They are computationally equivalent to a finite state machine.
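For a concrete instance of the model described above, here is a small (real) macros-by-example definition; the recursive second arm is how repetition is handled without backtracking:

```rust
// A declarative macro matching one or more comma-separated expressions.
macro_rules! min_of {
    ($x:expr) => { $x };
    ($x:expr, $($rest:expr),+) => {
        std::cmp::min($x, min_of!($($rest),+))
    };
}

fn main() {
    assert_eq!(min_of!(3, 1, 2), 1);
}
```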
---

https://wiki.haskell.org/Template_Haskell

Template Haskell is Haskell's macro system, which I learned about a little bit too late.

Throughout this project I was under the impression that Haskell didn't support macros at all, as I didn't discover Template Haskell until very recently. It is a fairly powerful system, although like Rust's macros its range is bounded, so it can hardly be used to define entirely new syntax. There also seem to be a lot of technical limitations due to this feature not being a priority to GHC.

---

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0707r4.pdf
https://www.youtube.com/watch?v=4AfRAVcThyA

This paper and the corresponding CppCon talk motivated me to research more natural, integrated forms of metaprogramming.

The paper describes a way to define default behaviour for user-defined groups of types, extending the analogy of enums, structs and classes, using a compile-time evaluated function that processes a parameter describing the contents of a declaration. It is the first metaprogramming system I encountered that intended to write meta-programs entirely inline, using the same tools the value-level program uses.

This eventually led to the concept of macros over fully namespaced tokens.

---

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2392r0.pdf
https://www.youtube.com/watch?v=raB_289NxBk

This paper and the corresponding CppCon talk demonstrate a very interesting syntax extension to C++.

C++ is historically an object-oriented or procedural language, but in recent standards a significant movement towards declarative, functional patterns has manifested. This paper in particular proposes a very deep change to the syntax of the language: an entirely new class of statements that simultaneously bind an arbitrary number of names and return a boolean, and that may result in objects being constructed, partially moved and destroyed. The syntax extensions appear very fundamental and yet quite convenient, but what little C++ has in terms of local reasoning suffers. This was interesting and inspirational to me because it demonstrated that considerate syntax extensions can entirely redefine a language, while also reminding me of C++'s heritage.
@@ -1,8 +1,8 @@

## Macros

Left-associative unparenthesized function calls are intuitive in the typical case of just applying functions to a limited number of arguments, but they're not very flexible. Haskell solves this problem by defining a diverse array of syntax primitives for individual use cases, such as `do` blocks for monadic operations. This system is fairly rigid. In contrast, Rust enables library developers to invent their own syntax that intuitively describes the concepts the library at hand encodes. In Orchid's codebase, I defined several macros to streamline tasks like defining functions in Rust that are visible to Orchid, or translating between various intermediate representations.

### Generalized kerning

In the referenced video essay, a proof of the Turing completeness of generalized kerning is presented. The proof involves encoding a Turing machine in a string and some kerning rules. The state of the machine is next to the read-write head, and all previous states are enumerated next to the tape, because kerning rules are reversible. The end result looks something like this:

@@ -31,7 +31,7 @@ $1 $2 < equals $2 < $1 unless $1 is |

What I really appreciate in this proof is how visual it is; based on this, it's easy to imagine how one would go about encoding a pushdown automaton, lambda calculus or other interesting tree-walking procedures. This is exactly why I based my preprocessor on this system.

### Namespaced tokens

Rust macros operate on the bare tokens and are therefore prone to accidental aliasing. Every other item in Rust follows a rigorous namespacing scheme, but macros break this structure, probably because macro execution happens before namespace resolution. The language doesn't suffer too much from this problem, but the relativity of namespacing limits their potential.
@@ -1,14 +1,14 @@

## Implementation

The optimization of this macro execution algorithm is an interesting challenge with a diverse range of potential optimizations. The current solution is very far from ideal, but it scales to the small experimental workloads I've tried so far and it can accommodate future improvements without any major restructuring.

The scheduling of macros is delegated to a unit called the rule repository, while the matching of rules to a given clause sequence is delegated to a unit called the matcher. Other tasks are split out into distinct self-contained functions, but these two have well-defined interfaces and encapsulate data. Constants are processed by the repository one at a time, which means that the data processed by this subsystem typically corresponds to a single struct, function or other top-level source item.

### Keyword dependencies

The most straightforward optimization is to skip rules whose patterns contain tokens that don't appear in the code at all. This is done by the repository to skip entire rules, but not by the rules on the level of individual slices. This is a possible path of improvement for the future.
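As a sketch, that prefilter amounts to a subset check over interned tokens (hypothetical names):

```rust
use std::collections::HashSet;

/// A rule can only ever match if every token of its pattern occurs
/// somewhere in the constant being processed.
fn may_match(pattern_tokens: &HashSet<u32>, code_tokens: &HashSet<u32>) -> bool {
    pattern_tokens.is_subset(code_tokens)
}

fn main() {
    let pattern: HashSet<u32> = [1, 2].into();
    let code: HashSet<u32> = [1, 2, 3].into();
    assert!(may_match(&pattern, &code));
}
```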
### Matchers

There are various ways to implement matching. To keep the architecture flexible, the repository is generic over the matcher, bounded with a very small trait.
|||||||
@@ -1,4 +1,4 @@
|
|||||||
## Execution order
|
### Execution order
|
||||||
|
|
||||||
The macros describe several independent sequential programs that are expected to be able to interact with each other. To make debugging easier, the order of execution of internal steps within independent macros has to be relatively static.
|
The macros describe several independent sequential programs that are expected to be able to interact with each other. To make debugging easier, the order of execution of internal steps within independent macros has to be relatively static.
|
||||||
|
|
||||||
@@ -27,30 +27,30 @@ The bands are each an even 32 orders of magnitude, with space in between for fut

| 224-231      | 232-239 | 240-247 | 248-         |
| integrations |         |         | transitional |

#### Transitional states

Transitional states produced and consumed by the same macro program occupy the unbounded top region of the f64 field. Nothing in this range should be written by the user or triggered by an interaction of distinct macro programs; the purpose of this high range is to prevent devices such as carriages from interacting. Any transformation sequence in this range can assume that the tree is inert other than its own operation.

#### Integrations

Integrations expect an inert syntax tree, but at least one token in the pattern is external to the macro program that resolves the rule, so it's critical that all macro programs be in a documented state at the time of resolution.

#### Aliases

Fragments of code extracted for readability are all at exactly 0x1p800. These may be written by programmers who are not comfortable with macros or metaprogramming. They must have unique single-token patterns. Because their priority is higher than any entry point, they can safely contain parts of other macro invocations. They have a single priority number because they can't conceivably require internal ordering adjustments, and their usage is meant to be as straightforward as possible.

#### Binding builders

Syntax elements that manipulate bindings should be executed earlier. `do` blocks and (future) `match` statements are good examples of this category. Anything with a lower priority trigger can assume that all names are correctly bound.

#### Expressions

Things that essentially work like function calls, just with added structure, such as `if`/`then`/`else` or `loop`. These are usually just more intuitive custom forms that are otherwise identical to a macro.

#### Operators

Binary and unary operators that process the chunks of text on either side. Within the band, these macros are prioritized in inverse precedence order and apply to the entire range of clauses before and after themselves, to ensure that function calls have the highest perceived priority.

#### Optimizations

Macros that operate on fully resolved lambda code and look for known patterns that can be simplified. I did not manage to create a working example of this, but repeated string concatenation is a good candidate.
@@ -1,8 +1,8 @@

## The pipeline

The conversion of Orchid files into a collection of macro rules is a relatively complicated process that took several attempts to get right.

### Push vs pull logistics

The initial POC implementation of Orchid used pull logistics, aka lazy evaluation, everywhere. This meant that specially annotated units of computation would only be executed when other units referenced their result. This is a classic functional optimization, but its implementation in Rust had a couple of drawbacks. First, lazy evaluation conflicts with most other optimizations, because it's impossible to assert the impact of a function call. Also - although this is probably a problem with my implementation - because the caching wrapper stores a trait object of Fn, every call to a stage is equivalent to a virtual function call, which alone is sometimes an excessive penalty. Second, all values must live on the heap and have static lifetimes. Eventually nearly all fields referenced by the pipeline or its stages were wrapped in Rc.
@@ -10,13 +10,13 @@ Additionally, in a lot of cases lazy evaluation is undesirable. Most programmers

To address these issues, the second iteration only uses pull logistics for the preparsing and file collection phase, and the only errors guaranteed to be produced by this stage are imports from missing files and syntax errors regarding the structure of the S-expressions.

### Stages

As of writing, the pipeline consists of three main stages: source loading, tree-building and name resolution. These break down into multiple substages.

All stages support various ways to introduce blind spots and precomputed values into their processing. This is used to load the standard library, prelude, and possibly externally defined intermediate stages of injected code.
#### Source loading
This stage encapsulates pull logistics. It collects all source files that should be included in the compilation in a hashmap keyed by their project-relative path. All subsequent operations are executed on every element of this map unconditionally.
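Roughly, the collection step looks like the following sketch, where the hypothetical `imports_of` callback stands in for the preparser:

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical sketch of the source-loading stage: starting from an entry
/// file, keep pulling in every file referenced by an import until the set
/// is closed, then hand the finished map to the later, push-based stages.
fn collect_sources(
    root: &Path,
    entry: PathBuf,
    imports_of: impl Fn(&str) -> Vec<PathBuf>, // stand-in for the preparser
) -> std::io::Result<HashMap<PathBuf, String>> {
    let mut sources = HashMap::new();
    let mut pending = vec![entry];
    while let Some(rel) = pending.pop() {
        if sources.contains_key(&rel) {
            continue; // already loaded via another import
        }
        let text = std::fs::read_to_string(root.join(&rel))?;
        pending.extend(imports_of(&text));
        sources.insert(rel, text);
    }
    // Every subsequent stage runs on every entry of `sources`.
    Ok(sources)
}
```

In this sketch a missing file surfaces as the `io::Error`, matching the guarantee above that imports from missing files are among the only errors this stage produces.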
Parsing itself is outsourced to a Chumsky parser defined separately.
This information is compiled into a very barebones module representation and returned alongside the loaded source code.
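The exact shape of that representation isn't reproduced here; a hypothetical sketch of what such a per-file record might hold (field names are guesses, not the actual Orchid types):

```rust
/// Hypothetical shape of the barebones module record produced by
/// preparsing: just enough structure for operator collection and import
/// resolution, kept alongside the raw source text.
#[allow(dead_code)]
struct Preparsed {
    source: String,            // the loaded source text
    imports: Vec<Vec<String>>, // paths this file references
    exported_ops: Vec<String>, // operators the later re-parse will need
}

fn main() {
    let _record = Preparsed {
        source: "export main := 42".to_string(),
        imports: vec![vec!["std".into(), "io".into()]],
        exported_ops: vec!["main".into()],
    };
}
```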
#### Tree building
This stage aims to collect all modules in a single tree. To achieve this, it re-parses each file with the set of operators collected from the data structure built during preparsing. The glob imports in the resulting `FileEntry` lists are eliminated, and the names in the bodies of expressions and macro rules are prefixed with the module path in preparation for macro execution.
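The prefixing step itself is simple; a toy version, with the module layout invented for the example:

```rust
/// Toy version of the prefixing step: a name occurring in the body of a
/// module is qualified with that module's full path, so that macro
/// execution later operates on globally unambiguous names.
fn prefix_name(module_path: &[&str], name: &str) -> String {
    let mut segments = module_path.to_vec();
    segments.push(name);
    segments.join("::")
}

fn main() {
    // Hypothetical example: `main` defined in the module `app::cli`.
    assert_eq!(prefix_name(&["app", "cli"], "main"), "app::cli::main");
}
```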
Operator collection can be advised about the exports of injected modules using a callback, and a prelude in the form of a list of line objects - in the shape emitted by the parser - can be injected before the contents of every module to define universally accessible names. Since these lines are processed for every file, it's generally best to just insert a single glob import from a module that defines everything. The interpreter inserts `import prelude::*`.
#### Import resolution
This stage aims to produce a tree ready for consumption by a macro executor or any other subsystem. It replaces every name originating from imported namespaces in every module with the original name.
Injection is supported with a function which takes a path and, if it's valid in the injected tree, returns its original value even if that's the path itself. This is used both to skip resolving names in the injected modules - which are expected to have already been processed using this step - and of course to find the origin of imports from the injected tree.
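A sketch of the hook's shape with invented names - not necessarily the real signature:

```rust
/// Hypothetical shape of the injection hook: given a path, return its
/// origin if the path is valid in the injected tree (possibly the path
/// itself), or `None` if the injected tree doesn't know it.
fn resolve_name(
    path: &[String],
    injected: &dyn Fn(&[String]) -> Option<Vec<String>>,
) -> Vec<String> {
    if let Some(origin) = injected(path) {
        return origin; // defined in (or re-exported by) the injected tree
    }
    // ...otherwise consult the project tree's own import table (omitted).
    path.to_vec()
}

fn main() {
    // An injected std tree that answers for anything under `std::`.
    let injected = |p: &[String]| {
        (p.first().map(String::as_str) == Some("std")).then(|| p.to_vec())
    };
    let name = vec!["std".to_string(), "io".to_string(), "println".to_string()];
    assert_eq!(resolve_name(&name, &injected), name);
}
```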
### Layered parsing
The most important export of the pipeline is the `parse_layer` function, which acts as a façade over the complex system described above. The environment in which user code runs is bootstrapped using repeated invocations of this function, each configured by a number of options.
One of these options is the prelude; the interpreter sets it to `import prelude::*`. If the embedder defines its own prelude, it's a good idea to append it.
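The bootstrapping pattern looks roughly like this, with stand-in types rather than the real API:

```rust
/// Stand-ins for the real API: `Tree` is whatever a parsed layer produces,
/// and this `parse_layer` only records what it was given.
#[derive(Clone, Default)]
struct Tree(Vec<String>);

fn parse_layer(source: &str, env: &Tree, prelude: &str) -> Tree {
    // The real function runs the whole pipeline against `env`; this stub
    // just appends a description of the layer it "parsed".
    let mut defs = env.0.clone();
    defs.push(format!("{prelude}; {source}"));
    Tree(defs)
}

fn main() {
    let base = Tree::default(); // the extern-function layer (see below)
    let std_layer = parse_layer("std sources", &base, "");
    // User code is parsed against everything the previous layers defined.
    let user = parse_layer("user sources", &std_layer, "import prelude::*");
    assert_eq!(user.0.len(), 2);
}
```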
#### The first layer
The other important exports of the pipeline are `ConstTree` and `from_const_tree`. These are used to define a base layer that exposes extern functions. `ConstTree` implements `Add` so distinct libraries of extern functions can be intuitively combined.
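A toy `ConstTree` showing why the `Add` impl is convenient; the real type is not reproduced here, and the externs are modelled as plain function pointers for brevity:

```rust
use std::ops::Add;

/// Hypothetical stand-in for `ConstTree`: a flat list of named extern
/// functions. Implementing `Add` lets independent libraries of externs
/// be combined with `+`.
#[derive(Default)]
struct ConstTree(Vec<(String, fn(i64) -> i64)>);

impl Add for ConstTree {
    type Output = ConstTree;
    fn add(mut self, rhs: ConstTree) -> ConstTree {
        self.0.extend(rhs.0);
        self
    }
}

fn neg(x: i64) -> i64 { -x }
fn double(x: i64) -> i64 { x * 2 }

fn main() {
    let arith = ConstTree(vec![("neg".to_string(), neg as fn(i64) -> i64)]);
    let more = ConstTree(vec![("double".to_string(), double as fn(i64) -> i64)]);
    let base_layer = arith + more; // merged into one first-layer library
    assert_eq!(base_layer.0.len(), 2);
}
```

Using `+` keeps an embedder's setup declarative: the standard library, an IO library and any host-specific externs can be merged before being handed to the first `parse_layer` invocation.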
# References
[1] Various authors, "C++ Programming/Templates/Template Meta-Programming", https://en.wikibooks.org/wiki/C++_Programming/Templates/Template_Meta-Programming (accessed May 5, 2023)

[2] J. Huey on behalf of The Types Team, "Generic associated types to be stable in Rust 1.65", https://blog.rust-lang.org/2022/10/28/gats-stabilization.html (accessed May 5, 2023)

[3] K. Wansbrough, "Instance Declarations are Universal", https://www.lochan.org/keith/publications/undec.html (accessed May 5, 2023)

[4] M. Stay, "Allow classes to be parametric in other parametric classes", https://github.com/microsoft/TypeScript/issues/1213 (accessed May 5, 2023)
Having tested that my idea could work, at the start of the academic year I switched to the type system. When the project synopsis was written, I imagined that the type system would be an appropriately sized chunk of the work for a final year project; its title was "Orchid's Type System".
Around the end of November I had researched enough type theory to decide what kind of type system I would want. My choice was informed by a number of grievances I had with TypeScript, such as the lack of higher-kinded types - which comes up surprisingly often[4] in JavaScript - the lack of support for nominal types, and the difficulty of using dependent types. However, I appreciated its powerful type transformation techniques.
However, building a type system proved too difficult; on February 23 I decided to cut my losses and focus on building an interpreter. The proof-of-concept interpreter was finished on March 10, but the macro executor was still using the naive implementation completed over the summer, so it took around 15 seconds to load a 20-line example file, and a range of other issues cropped up as well, cumulatively impacting every corner of the codebase. A full rewrite was necessary.
## Type system
This is a description of the type system originally designed for Orchid; it never reached the MVP stage.
At the core the type system consists of three concepts:

- `impl` provides instances of typeclasses
- a universal parametric construct that serves as both a `forall` (or generic) and a `where` (or constraint). This was temporarily named `auto` but is probably more aptly described by the word `given`.
### Unification
The backbone of any type system is unification. Here it is an especially interesting problem, because the type expressions are built with code and nontermination is outstandingly common.
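Orchid's unifier is not reproduced here; the following toy normalizer only illustrates the standard fuel trick for keeping reduction-based unification from hanging, under invented types:

```rust
/// Minimal illustration (not Orchid's actual design): type expressions
/// that may reduce forever, handled with a fuel-bounded normalizer so
/// that unification itself always terminates.
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    Con(&'static str), // a normal-form type constructor
    Loop,              // stands in for a nonterminating type expression
}

/// One reduction step, or `None` if the expression is already normal.
fn step(ty: &Ty) -> Option<Ty> {
    match ty {
        Ty::Loop => Some(Ty::Loop), // reduces to itself forever
        Ty::Con(_) => None,
    }
}

/// Reduce by at most `fuel` steps; `None` means "possibly nonterminating".
fn normalize(mut ty: Ty, mut fuel: u32) -> Option<Ty> {
    while let Some(next) = step(&ty) {
        if fuel == 0 {
            return None; // give up instead of hanging the compiler
        }
        fuel -= 1;
        ty = next;
    }
    Some(ty)
}

fn main() {
    assert_eq!(normalize(Ty::Con("int"), 100), Some(Ty::Con("int")));
    assert_eq!(normalize(Ty::Loop, 100), None);
}
```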
### Given (formerly Auto)
`given` bindings have the form `@Name:type. body`. Either the `Name` or the `:type` part may be omitted, but at least one is required. The central idea is that wherever a binding is unwrapped by an operation, the language attempts to find a value for the name. Bindings are unwrapped in the following situations: