forked from Orchid/orchid
Final commit before submission

README.md (317 lines changed)
@@ -1,314 +1,5 @@
Orchid will be a compiled functional language with a powerful macro language and optimizer.

All you need to run the project is a nightly Rust toolchain. Go to one of the folders within `examples` and run

```sh
cargo run -- -p .
```

# Examples

Hello World in Orchid

```orchid
import std::io::(println, out)

main := println out "Hello World!"
```
Basic command line calculator

```orchid
import std::io::(readln, printf, in, out)

main := (
  readln in >>= int |> \a.
  readln in >>= \op.
  readln in >>= int |> \b.
  printf out "the result is {}\n", [match op (
    "+" => a + b,
    "-" => a - b,
    "*" => a * b,
    "/" => a / b
  )]
)
```
Grep

```orchid
import std::io::(readln, println, in, out, getarg)

main := loop \r. (
  readln in >>= \line.
  if (substring (getarg 1) line)
  then (println out line >>= r)
  else r
)
```
Filter through an arbitrary collection

```orchid
filter := @C:Type -> Type. @:Map C. @T. \f:T -> Bool. \coll:C T. (
  coll >> \el. if (f el) then (Some el) else Nil
):(C T)
```
# Explanation

This explanation is not a tutorial. It follows a constructive order, gradually introducing language features to better demonstrate their purpose. It also assumes that the reader is familiar with functional programming.

## Lambda calculus recap

The language is almost entirely based on lambda calculus, so everything is immutable and evaluation is lazy. The following is an anonymous function that takes an integer argument and multiplies it by 2:

```orchid
\x:int. imul 2 x
```
Multiple parameters are represented using currying, so the above is equivalent to

```orchid
imul 2
```

Recursion is accomplished using the Y combinator (called `loop`), which is a function that takes a function as its single parameter and applies it to itself. A naive implementation of `imul` might look like this:

```orchid
\a:int.\b:int. loop \r. (\i.
  ifthenelse (ieq i 0)
    b
    (iadd b (r (isub i 1)))
) a
```

`ifthenelse` takes a boolean as its first parameter and selects one of the following two expressions (of identical type) accordingly. `ieq`, `iadd` and `isub` are self-explanatory.
## Auto parameters (generics, polymorphism)

Although I didn't specify the type of `i` in the above example, it is known at compile time because the recursion is applied to `a`, which is an integer. I could have omitted the second argument, but then I would have had to specify `i`'s type as an integer, because for plain lambda expressions all types have to be statically known at compile time.

Polymorphism is achieved using parametric constructs called auto parameters. An auto parameter is a placeholder filled in during compilation, syntactically remarkably similar to a lambda expression:

```orchid
@T. --[ body of expression referencing T ]--
```
Autos have two closely related uses. First, they represent generic type parameters. If an auto is used as the type of an argument or some other subexpression that can be trivially deduced from the calling context, it is filled in.

The second usage of autos is for constraints, when they have a type that references another auto. Because these parameters are filled in by the compiler, referencing them is equivalent to the statement that a default value assignable to the specified type exists. Default values are declared explicitly and identified by their type, where that type itself may be parametric and may specify its own constraints, which are resolved recursively. If the referenced default is itself a useful value or function, you can give it a name and use it as such; but you can also omit the name, using the default as a hint that lets the compiler call functions that also have defaults of the same types, or possibly other types whose defaults have implementations based on your defaults.
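As a rough Rust analogy (for orientation only, not how Orchid resolves autos), a type auto behaves like a generic parameter and a constraint auto like a trait bound whose implementation the compiler locates:

```rust
// `T` plays the role of a type auto; the `Ord` bound plays the role of a
// constraint auto whose "default value" (the impl) is found by the compiler.
fn max_of<T: Ord>(a: T, b: T) -> T {
    if a > b { a } else { b }
}

fn main() {
    // Both the type and the constraint are deduced from the call site,
    // much like autos filled in from the calling context.
    assert_eq!(max_of(2, 5), 5);
}
```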
For a demonstration, here's a sample implementation of the Option monad:

```orchid
--[[ The definition of Monad ]]--
define Monad $M:(Type -> Type) as (Pair
  (@T. @U. (T -> M U) -> M T -> M U) -- bind
  (@T. T -> M T) -- return
)

bind := @M:Type -> Type. @monad:Monad M. fst monad
return := @M:Type -> Type. @monad:Monad M. snd monad

--[[ The definition of Option ]]--
define Option $T as @U. U -> (T -> U) -> U
--[ Constructors ]--
export Some := @T. \data:T. categorise @(Option T) ( \default. \map. map data )
export None := @T. categorise @(Option T) ( \default. \map. default )
--[ Implement Monad ]--
impl Monad Option via (makePair
  ( @T. @U. \f:T -> Option U. \opt:Option T. opt None f ) -- bind
  Some -- return
)
--[ Sample function that works on an unknown monad to demonstrate HKTs.
  Turns (Option (M T)) into (M (Option T)), "raising" the unknown monad
  out of the Option ]--
export raise := @M:Type -> Type. @T. @:Monad M. \opt:Option (M T). (
  opt (return None) (\m. bind (\x. return (Some x)) m)
):(M (Option T))
```
Typeclasses may be implemented in any module that also defines at least one of the types in the definition, which includes both the type of the expression and the types of its auto parameters. They always have a name, which can be used to override known defaults with which your definition may overlap. For example, if addition is defined elementwise for all applicative functors, the author of List might want concatenation to take precedence in the case where all element types match. Notice how Add has three arguments: two are the types of the operands and one is the result:

```orchid
impl @T. Add (List T) (List T) (List T) by concatListAdd over elementwiseAdd via (
  ...
)
```
For completeness' sake, the original definition might look like this:

```orchid
impl
  @C:Type -> Type. @T. @U. @V. -- variables
  @:(Applicative C). @:(Add T U V). -- conditions
  Add (C T) (C U) (C V) -- target
by elementwiseAdd via (
  ...
)
```
With the use of autos, here's what the recursive multiplication implementation looks like:

```orchid
impl @T. @:(Add T T T). Multiply T int T by iterativeMultiply via (
  \a:int. \b:T. loop \r. (\i.
    ifthenelse (ieq i 0)
      b
      (add b (r (isub i 1))) -- notice how iadd is now add
  ) a
)
```
This could then be applied to any type that's closed over addition:

```orchid
aroundTheWorldLyrics := (
  mult 18 (add (mult 4 "Around the World\n") "\n")
)
```

For my notes on the declare/impl system, see [notes/type_system].
## Preprocessor

The above code samples have one notable difference from the Examples section above: they're ugly and hard to read. The solution to this is a powerful preprocessor, which is used internally to define all sorts of syntax sugar, from operators to complex syntax patterns and even pattern matching, and which can also be used to define custom syntax. The preprocessor reads the source as an S-tree while executing substitution rules which have a real-numbered priority.

In the following example, seq matches a list of arbitrary tokens and its parameter is the order of resolution. The order can be used, for example, to make sure that `if a then b else if c then d else e` becomes `(ifthenelse a b (ifthenelse c d e))` and not `(ifthenelse a b if) c then d else e`. It's worth highlighting here that preprocessing works on the typeless AST, and matchers are constructed using inclusion rather than exclusion, so it would not be possible to selectively allow the above example without enforcing that if-statements are searched back-to-front. If order is still a problem, you can always parenthesize subexpressions at the callsite.
```orchid
(..$pre:2 if ...$cond then ...$true else ...$false) =10=> (
  ..$pre
  (ifthenelse (...$cond) (...$true) (...$false))
)
...$a + ...$b =2=> (add (...$a) (...$b))
...$a = ...$b =5=> (eq (...$a) (...$b))
...$a - ...$b =2=> (sub (...$a) (...$b))
```
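To make the rewriting model concrete, here is a deliberately tiny Rust sketch of priority-driven substitution over a flat token list. It is illustrative only: the names are hypothetical, and the real preprocessor matches S-trees with placeholders like `...$a`, not literal token runs.

```rust
/// One substitution rule: a literal pattern, its replacement, and a priority.
struct Rule {
    priority: f64,
    pattern: Vec<&'static str>,
    template: Vec<&'static str>,
}

/// Repeatedly apply the highest-priority matching rule until nothing matches.
fn rewrite(mut tokens: Vec<&'static str>, rules: &mut Vec<Rule>) -> Vec<&'static str> {
    rules.sort_by(|a, b| b.priority.total_cmp(&a.priority));
    'scan: loop {
        for rule in rules.iter() {
            let hit = tokens
                .windows(rule.pattern.len())
                .position(|w| w == rule.pattern.as_slice());
            if let Some(at) = hit {
                tokens
                    .splice(at..at + rule.pattern.len(), rule.template.iter().copied())
                    .for_each(drop);
                continue 'scan; // restart so higher-priority rules run first
            }
        }
        return tokens; // inert: no rule matches anywhere
    }
}

fn main() {
    let mut rules = vec![Rule {
        priority: 2.0,
        pattern: vec!["a", "+", "b"],
        template: vec!["(", "add", "a", "b", ")"],
    }];
    assert_eq!(
        rewrite(vec!["a", "+", "b"], &mut rules),
        vec!["(", "add", "a", "b", ")"]
    );
}
```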
The recursive multiplication function now looks like this:

```orchid
impl @T. @:(Add T T T). Multiply T int T by iterativeMultiply via (
  \a:int.\b:T. loop \r. (\i.
    if (i = 0) then b
    else (b + (r (i - 1)))
  ) a
)
```
### Traversal using carriages

While it may not be immediately apparent, these substitution rules are actually Turing complete. They can be used quite intuitively to traverse the token tree with unique "carriage" symbols that move according to their environment and can carry structured data payloads.

Here's an example of a carriage being used to turn a square-bracketed list expression into a lambda expression that matches a conslist. Notice how the square brackets pair up, as all three variants of brackets are considered branches in the S-tree rather than individual tokens.
```orchid
-- Initial step, eliminates entry condition (square brackets) and constructs
-- carriage and other working symbols
[...$data:1] =1000.1=> (cons_start ...$data cons_carriage(none))
-- Shortcut with higher priority
[] =1000.5=> none
-- Step
, $item cons_carriage($tail) =1000.1=> cons_carriage((some (cons $item $tail)))
-- End, removes carriage and working symbols and leaves valid source code
cons_start $item cons_carriage($tail) =1000.1=> some (cons $item $tail)
-- Low priority rules should turn leftover symbols into errors.
cons_start =0=> cons_err
cons_carriage($data) =0=> cons_err
cons_err =0=> (macro_error "Malformed conslist expression")
-- macro_error will probably have its own rules for composition and
-- bubbling such that the output for an erratic expression would be a
-- single macro_error to be decoded by developer tooling
```
(an up-to-date version of this example can be found in the examples folder)

Another thing to note is that although it may look like cons_carriage is a global string, it's in fact namespaced to whatever file provides the macro. Symbols can be exported either by prefixing the pattern with `export`, or separately via the following syntax if no single rule is equipped to dictate the exported token set.

```orchid
export ::(some_name, other_name)
```
# Module system

Files are the smallest unit of namespacing, automatically grouped into folders and forming a tree, the leaves of which are the actual symbols. An exported symbol is a name referenced in an exported substitution rule or assigned to an exported function. Imported symbols are considered identical to the same symbol directly imported from the same module for the purposes of substitution. The module syntax is very similar to Rust's, and since each token gets its own export, with most rules comprising several local symbols, the most common import option is probably `::*` (import all).
# Optimization

This is very far away so I don't want to make promises, but I have some ideas.

- [ ] early execution of functions on any subset of their arguments where it could provide substantial speedup
- [ ] tracking copies of expressions and evaluating them only once
- [ ] many cases of single recursion converted to loops
  - [ ] tail recursion
  - [ ] 2 distinct loops where the tail doesn't use the arguments
  - [ ] reorder operations to favour this scenario
- [ ] reactive calculation of values that are deemed to be read more often than written
- [ ] automatic profiling based on performance metrics generated by debug builds
notes/papers/report/parts/abbreviations.md (new file, 4 lines)
@@ -0,0 +1,4 @@

Table of abbreviations:

- **CPS**: Continuation passing style, a technique of transferring control to a function in a lazy language by passing the rest of the current function in a lambda.
notes/papers/report/parts/ethics.md (new file, 5 lines)
@@ -0,0 +1,5 @@

# Statement of Ethics

No people other than the author, no living creatures, and no experiments on infrastructure were involved in the project, so the principles of **do no harm**, **confidentiality of data** and **informed consent** are not relevant.

As a language developer, my **social responsibility** is to build reliable languages. Orchid is a tool in service of whatever goal the programmer has in mind.
@@ -78,4 +78,3 @@ Originally, Orchid was meant to have a type system that used Orchid itself to bu

### Alternatives

During initial testing of the working version, I found that the most common kind of programming error in lambda calculus appears to be arity mismatch, or a syntax error that results in arity mismatch. Without any kind of type checking this is especially difficult to debug, as every function looks the same. This can be addressed with a much simpler type system similar to System-F. Any such type checker would have to be constructed so as to only verify user-provided information regarding the arity of functions, without attempting to find the arity of every expression, since System-F is strongly normalising and Orchid, like any general purpose language, supports potentially infinite loops.
@@ -6,14 +6,10 @@ In addition the handling of all syntax sugar is delegated to the compiler. This

**Syntax-level metaprogramming:** [Template Haskell][th1] is Haskell's tool for syntax-level macros. I learned about it after I built Orchid, and it addresses a lot of my problems.

[th1]: ./literature/macros.md

**Type system:** Haskell's type system is very powerful, but to be able to represent some really interesting structures it requires a long list of GHC extensions to be enabled, which in turn make typeclass implementation matching undecidable and the heuristic rather bad (understandably so, it was clearly not designed for that).

My plan for Orchid was to use Orchid itself as a type system as well; rather than aiming for a decidable type system and then extending it until it inevitably becomes turing-complete [1][2][3], my type-system would be undecidable from the start and progress would point towards improving the type checker to recognize more and more cases.

[1]: https://en.cppreference.com/w/cpp/language/template_metaprogramming
[2]: https://blog.rust-lang.org/2022/10/28/gats-stabilization.html
[3]: https://wiki.haskell.org/Type_SK

A description of the planned type system is available in [[type_system/+index|Appendix T]]
@@ -1,4 +1,4 @@

## Interner

To fix a very serious performance problem with the initial POC, all tokens and all namespaced names in Orchid are interned.
@@ -8,36 +8,36 @@ For the sake of simplicity in Rust it is usually done by replacing Strings with

Interning is of course not limited to strings, but one has to be careful in applying it to distinct concepts, as the lifetimes of every single interned thing are tied together, and sometimes the added constraints and complexity aren't worth the performance improvements. Orchid's interner is completely type-agnostic so that the possibility is there. The interning of Orchid string literals is on the roadmap, however.
### Initial implementation

Initially, the interner used Lasso, which is an established string interner with a wide user base.
#### Singleton

A string interner is inherently a memory leak, so making it static would have likely proven problematic in the future. At the same time, magic strings should be internable by any function with or without access to the interner, since embedders of Orchid should be able to reference concrete names in their Rust code conveniently. To get around these constraints, the [[oss#static_init|static_init]] crate was used to retain a global singleton instance of the interner and intern magic strings with it. After the first non-static instance of the interner is created, the functions used to interact with the singleton would panic. I also tried using the iconic lazy_static crate, but unfortunately it evaluates the expressions upon first dereference, which for functions that take an interner as parameter is always after the creation of the first non-static interner.
#### The Interner Trait

The interner supported exchanging strings or sequences of tokens for tokens. To avoid accidentally comparing the token for a string with the token for a string sequence, or attempting to resolve a token referring to a string sequence as a string, the tokens have a rank, encoded as a dependent type parameter. Strings are exchanged for tokens of rank 0, and sequences of tokens of rank N are exchanged for tokens of rank N+1.
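For illustration, rank-tagged tokens could be sketched in Rust with a phantom marker type standing in for the dependent parameter; the names here are illustrative, not Orchid's actual ones:

```rust
use std::marker::PhantomData;

struct Str;                    // rank 0: an interned string
struct Seq<T>(PhantomData<T>); // rank N+1: a sequence of rank-N tokens

/// A token is just an id, but the phantom parameter records its rank,
/// so string tokens and sequence tokens can never be mixed up.
#[derive(Clone, Copy, PartialEq, Eq)]
struct Tok<T> {
    id: u32,
    _rank: PhantomData<T>,
}

fn main() {
    let s: Tok<Str> = Tok { id: 0, _rank: PhantomData };
    let v: Tok<Seq<Str>> = Tok { id: 0, _rank: PhantomData };
    // `s == v` would not compile: the ranks differ even though the ids match.
    let _ = (s, v);
}
```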
#### Lasso shim

Because the type represented by a token is statically guaranteed, we can fearlessly store differently encoded values together without annotation. Thanks to this, strings can simply be forwarded to lasso without overhead. Token sequences are more problematic, because the data is ultimately a sequence of numbers and we can't easily assert that they will constitute a valid utf8 string. My temporary solution was to encode the binary data in base64.
### Revised implementation

The singleton ended up completely defunct because `static_init` apparently also evaluates init expressions on first dereference. Fixing this issue was a good occasion to come up with a better design for the interner.
#### monotype

The logic for interning itself is encapsulated by a `monotype` struct. This stores values of a single homogenous type, using a hashmap for value->token lookup and a vector for token->value lookup. It is based on, although considerably simpler than, Lasso.
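A minimal sketch of that shape (with hypothetical field and method names):

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Stores values of one type: a hashmap for value -> token lookup
/// and a vector for token -> value lookup.
struct Monotype<T: Clone + Eq + Hash> {
    tokens: HashMap<T, u32>,
    values: Vec<T>,
}

impl<T: Clone + Eq + Hash> Monotype<T> {
    fn new() -> Self {
        Monotype { tokens: HashMap::new(), values: Vec::new() }
    }
    /// Return the existing token for `value`, or mint a new one.
    fn intern(&mut self, value: &T) -> u32 {
        if let Some(&tok) = self.tokens.get(value) {
            return tok;
        }
        let tok = self.values.len() as u32;
        self.values.push(value.clone());
        self.tokens.insert(value.clone(), tok);
        tok
    }
    fn resolve(&self, tok: u32) -> &T {
        &self.values[tok as usize]
    }
}

fn main() {
    let mut strings = Monotype::<String>::new();
    let tok = strings.intern(&"main".to_string());
    assert_eq!(strings.resolve(tok), "main");
}
```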
#### polytype

The actual Interner stores a `HashMap<typeid, Box<dyn Any>>`, which is essentially a store of values of unique type keyed by the type. The values in this case are monotype interners.
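Continuing the sketch above (and reusing its `Monotype`), the polytype store might look like this, again with hypothetical names:

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::hash::Hash;

/// One `Monotype<T>` per value type, keyed by that type's `TypeId`.
struct Interner {
    stores: HashMap<TypeId, Box<dyn Any>>,
}

impl Interner {
    fn new() -> Self {
        Interner { stores: HashMap::new() }
    }
    /// Fetch (or lazily create) the store for values of type `T`.
    fn monotype<T: Clone + Eq + Hash + 'static>(&mut self) -> &mut Monotype<T> {
        self.stores
            .entry(TypeId::of::<T>())
            .or_insert_with(|| Box::new(Monotype::<T>::new()))
            .downcast_mut::<Monotype<T>>()
            .expect("entries are keyed by TypeId, so this downcast cannot fail")
    }
}

fn main() {
    let mut i = Interner::new();
    let tok = i.monotype::<String>().intern(&"std".to_string());
    assert_eq!(i.monotype::<String>().resolve(tok), "std");
}
```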
Unlike the naive initial implementation, this version also operates on references, so interning and externing values causes no unnecessary copying and heap allocations.
### The InternedDisplay Trait

For refined error reporting most structures derive `Debug` and also implement `Display`. In most cases where the structure at hand describes code of some kind, `Display` attempts to print a fragment of valid code. With every name in the codebase interned this is really difficult, because interner tokens can't be resolved from `Display` implementations. To solve this, a new trait was defined called `InternedDisplay`, which has the same surface as `Display` except for the fact that `fmt`'s mirror image also takes an additional reference to Interner. The syntax sugar for string formatting is in this way unfortunately lost, but the functionality and the division of responsibilities remains.
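The trait surface described might look roughly like the following sketch (reusing the `Interner` sketch above; the actual signature in the codebase may differ):

```rust
use std::fmt;

/// Like `Display`, except the formatter also receives the interner,
/// so tokens can be resolved back to names while printing.
trait InternedDisplay {
    fn fmt_i(&self, f: &mut fmt::Formatter<'_>, i: &Interner) -> fmt::Result;
}
```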
notes/papers/report/parts/interpreter.md (new file, 13 lines)
@@ -0,0 +1,13 @@

## Interpreter

The Orchid interpreter exposes one main function called `run`. This function takes an expression to reduce and the symbol table returned by the pipeline and processed by the macro repository. It's also possible to specify a reduction step limit to make sure the function returns in a timely manner.
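A rough sketch of the shape of that API, with stand-in types rather than Orchid's actual definitions:

```rust
/// Stand-in for a reducible Orchid expression.
struct Expr;
/// Stand-in for the symbol table produced by the pipeline and processed
/// by the macro repository.
struct SymbolTable;

enum RunResult {
    /// Reduction reached an inert expression.
    Done(Expr),
    /// The step limit ran out; the partially reduced expression is returned.
    OutOfGas(Expr),
}

fn run(expr: Expr, _symbols: &SymbolTable, _step_limit: Option<usize>) -> RunResult {
    // The reduction loop is elided in this sketch.
    RunResult::Done(expr)
}
```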
### Interfacing with an embedder

An embedding application essentially interacts with Orchid by way of queries: it invokes the interpreter with a prepared function call. The Orchid code then replies with a return value, which the embedder can either read directly or use as a component in subsequent questions, and so the conversation develops. All communication is initiated and regulated, and the conclusions executed, entirely by the embedder.

Although external functions are exposed to Orchid and can be called at any time (within a computation), they are expected to be pure, and any calls to them may be elided by the optimizer if it can deduce the return value from precedent or circumstances.

One common way to use a query API is to define a single query that is conceptually equivalent to "What would you like to do?" and a set of valid answers which each incorporate some way to pass data through to the next (identical) query. HTTP does this: historically, client state was preserved in cookies and pre-filled form inputs, later with client-side Javascript and LocalStorage.

Orchid offers a way to do this using the `Handler` trait and the `run_handler` function, which is the interpreter's second important export. Essentially, this trait offers a way to combine functions that match and process various types implementing `Atomic`. This allows embedders to specify an API where external functions return special, inert `Atomic` instances corresponding to environmental actions the code can take, each of which also carries the continuation of the logic. This is a variation of continuation passing style, a common way of encoding effects in pure languages. It is inspired by algebraic effects.
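One way to picture that pattern - with hypothetical types, not the actual `Handler`/`Atomic` API - is a set of inert command atoms that carry their continuations, dispatched by the embedder:

```rust
/// Stand-in for an Orchid expression.
struct Expr;

/// Hypothetical inert atoms an embedder might expose; each carries the
/// continuation to hand back to the interpreter after the effect runs.
enum Command {
    Print(String, Expr),                 // print, then continue with `Expr`
    ReadLn(Box<dyn Fn(String) -> Expr>), // continue with the line that was read
    Halt,
}

/// Execute one environmental action and return the continuation, if any.
fn handle(cmd: Command) -> Option<Expr> {
    match cmd {
        Command::Print(text, k) => {
            print!("{text}");
            Some(k) // resume the computation
        }
        Command::ReadLn(k) => {
            let mut line = String::new();
            std::io::stdin().read_line(&mut line).ok()?;
            Some(k(line))
        }
        Command::Halt => None,
    }
}
```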
@@ -2,7 +2,7 @@

Orchid is a lazy, pure functional programming language with an execution model inspired by Haskell and a powerful syntax-level preprocessor for encoding rich DSLs that adhere to the language's core guarantees.

## Immutability

The merits of pure functional code are well known, but I would like to highlight some that are particularly relevant in the case of Orchid:
@@ -10,11 +10,11 @@ The merits of pure functional code are well known, but I would like to highlight

- **Self-containment** Arguments to the current toplevel function are all completely self-contained expressions, which means that they can be serialized and sent over the network, provided that an equivalent for all atoms and externally defined functions exists on both sides; this makes Orchid a prime query language.

  > **note**
  > Although this is possible using Javascript's `Function` constructor, it is a catastrophic security vulnerability there, since code sent this way can access all host APIs. In the case of Orchid it is not only perfectly safe from an information access perspective, since all references are bound on the sender side and undergo explicit translation, but also from a computational resource perspective, since the recipient can apply step limits to the untrusted expression, interleave it with local tasks, and monitor its size and memory footprint.

- **Reentrancy** In low-reliability environments it is common to run multiple instances of an algorithm in parallel and regularly compare and correct their state using some form of consensus. In an impure language this must be done explicitly and mistakes can result in divergence. In a pure language the executor can be configured to check its state with others every so many steps.

## Laziness

Reactive programming is an increasingly popular paradigm for enabling systems to interact with changing state without recomputing subresults that have not been modified. It is getting popular despite the fact that enabling this kind of programming in classical languages - most notably Javascript, where it appears to be the most popular - involves lots of boilerplate and complicated constructs using many lambda functions. In a lazy language this is essentially the default.
notes/papers/report/parts/literature/effects.md (new file, 31 lines)
@@ -0,0 +1,31 @@

https://www.unison-lang.org/learn/fundamentals/abilities/

An excellent description of algebraic effects that led me to understand how they work and why they present an alternative to monads.

Algebraic effects essentially associate with a function a set of special types, representing families of requests, that it may return other than its own return type. Effects usually carry a thunk or function to enable resuming normal processing, and handlers usually respond to the requests represented by the effects by implementing them on top of other effects such as IO. The interesting part to me is that all of this is mainly just convention, so algebraic effects provide type system support for expressing arbitrary requests using CPS.

Although Orchid doesn't have a type system, CPS is a straightforward way to express side effects.

---

https://github.com/zesterer/tao

The first place where I encountered algebraic effects; otherwise a very interesting language that I definitely hope to adopt features from in the future.

Tao is made by the same person who created Chumsky, the parser combinator library used in Orchid. It demonstrates a lot of interesting concepts; its pattern matching is one of a kind. The language is focused mostly on static verification and efficiency, neither of which is a particularly strong point of Orchid, but some of its auxiliary features are interesting to an untyped, interpreted language too. One of these is generic effects.

---

https://wiki.haskell.org/All_About_Monads#A_Catalog_of_Standard_Monads
Originally, I intended to have dedicated objects for all action types, and transformations similar to Haskell's monad functions.

A monad is a container that can store any type and supports three key operations, illustrated in the Rust sketch after this list:

1. Constructing a new instance of the container around a value
2. Flattening an instance of the container that contains another instance of it into a single container of the inner nested value
3. Applying a transformation to the value inside the container that produces a different type
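Rust's `Option` type is a convenient real-world illustration of the three operations (a runnable example, though Rust spells the operations differently than Haskell):

```rust
fn main() {
    // 1. Construct the container around a value.
    let wrapped: Option<i32> = Some(1);
    // 2. Flatten a nested container into a single layer.
    let flat: Option<i32> = Some(Some(2)).flatten();
    // 3. Apply a transformation producing a different type.
    let mapped: Option<String> = wrapped.map(|x| x.to_string());
    assert_eq!(flat, Some(2));
    assert_eq!(mapped, Some(String::from("1")));
}
```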
The defining characteristic of monads is that whether and when the transformations are applied is flexible, since information can't easily leave the monad.

This system is extremely similar to effects, and at least in an untyped context they're essentially equally powerful. I opted for effects because their defaults seem more sensible.
notes/papers/report/parts/literature/macros.md (new file, 34 lines)
@@ -0,0 +1,34 @@

https://doc.rust-lang.org/reference/macros-by-example.html

Rust's macro system was both an invaluable tool and an example while defining Orchid's macros.

Rust supports declarative macros in what it calls "macros by example". These use a simplistic, state machine-like parser model to match tokens within the strictly bounded parameter tree. Most notably, Rust's declarative macros don't support any kind of backtracking. They are computationally equivalent to a finite state machine.
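For a concrete instance of the model described above, here is a small (real) macros-by-example definition; the recursive second arm is how repetition is handled without backtracking:

```rust
// A declarative macro matching one or more comma-separated expressions.
macro_rules! min_of {
    ($x:expr) => { $x };
    ($x:expr, $($rest:expr),+) => {
        std::cmp::min($x, min_of!($($rest),+))
    };
}

fn main() {
    assert_eq!(min_of!(3, 1, 2), 1);
}
```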
---

https://wiki.haskell.org/Template_Haskell

Template Haskell is Haskell's macro system, which I learned about a little bit too late.

Throughout this project I was under the impression that Haskell didn't support macros at all, as I didn't discover Template Haskell until very recently. It is a fairly powerful system, although like Rust's macros its range is bounded, so it can hardly be used to define entirely new syntax. There also seem to be a lot of technical limitations due to this feature not being a priority to GHC.

---

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0707r4.pdf
https://www.youtube.com/watch?v=4AfRAVcThyA

This paper and the corresponding CppCon talk motivated me to research more natural, integrated forms of metaprogramming.

The paper describes a way to define default behaviour for user-defined groups of types, extending the analogy of enums, structs and classes, using a compile-time evaluated function that processes a parameter describing the contents of a declaration. It is the first metaprogramming system I encountered that intended to write meta-programs entirely inline, using the same tools the value-level program uses.

This eventually led to the concept of macros over fully namespaced tokens.

---

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2392r0.pdf
https://www.youtube.com/watch?v=raB_289NxBk

This paper and the corresponding CppCon talk demonstrate a very interesting syntax extension to C++.

C++ is historically an object-oriented or procedural language, but in recent standards a significant movement towards declarative, functional patterns has manifested. This paper in particular proposes a very deep change to the syntax of the language: an entirely new class of statements that simultaneously bind an arbitrary number of names and return a boolean, and that may result in objects being constructed, partially moved and destroyed. The syntax extensions appear very fundamental and yet quite convenient, but what little C++ has in terms of local reasoning suffers. This was interesting and inspirational to me because it demonstrated that considerate syntax extensions can entirely redefine a language, while also reminding me of C++'s heritage.
@@ -1,8 +1,8 @@

## Macros

Left-associative unparenthesized function calls are intuitive in the typical case of just applying functions to a limited number of arguments, but they're not very flexible. Haskell solves this problem by defining a diverse array of syntax primitives for individual use cases, such as `do` blocks for monadic operations. This system is fairly rigid. In contrast, Rust enables library developers to invent their own syntax that intuitively describes the concepts the library at hand encodes. In Orchid's codebase, I defined several macros to streamline tasks like defining functions in Rust that are visible to Orchid, or translating between various intermediate representations.

### Generalized kerning

In the referenced video essay, a proof of the Turing completeness of generalized kerning is presented. The proof involves encoding a Turing machine in a string and some kerning rules. The state of the machine is next to the read-write head, and all previous states are enumerated next to the tape, because kerning rules are reversible. The end result looks something like this:

@@ -31,7 +31,7 @@ $1 $2 < equals $2 < $1 unless $1 is |

What I really appreciate in this proof is how visual it is; based on this, it's easy to imagine how one would go about encoding a pushdown automaton, lambda calculus or other interesting tree-walking procedures. This is exactly why I based my preprocessor on this system.

### Namespaced tokens

Rust macros operate on the bare tokens and are therefore prone to accidental aliasing. Every other item in Rust follows a rigorous namespacing scheme, but macros break this structure, probably because macro execution happens before namespace resolution. The language doesn't suffer too much from this problem, but the relativity of namespacing limits their potential.
@@ -1,14 +1,14 @@

## Implementation

The optimization of this macro execution algorithm is an interesting challenge with a diverse range of potential optimizations. The current solution is very far from ideal, but it scales to the small experimental workloads I've tried so far and it can accommodate future improvements without any major restructuring.

The scheduling of macros is delegated to a unit called the rule repository, while the matching of rules to a given clause sequence is delegated to a unit called the matcher. Other tasks are split out into distinct self-contained functions, but these two have well-defined interfaces and encapsulate data. Constants are processed by the repository one at a time, which means that the data processed by this subsystem typically corresponds to a single struct, function or other top-level source item.

### Keyword dependencies

The most straightforward optimization is to skip rules whose patterns contain tokens that don't appear in the code at all. This is done by the repository to skip entire rules, but not by the rules on the level of individual slices. This is a possible path of improvement for the future.
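As a sketch, that prefilter amounts to a subset check over interned tokens (hypothetical names):

```rust
use std::collections::HashSet;

/// A rule can only ever match if every token of its pattern occurs
/// somewhere in the constant being processed.
fn may_match(pattern_tokens: &HashSet<u32>, code_tokens: &HashSet<u32>) -> bool {
    pattern_tokens.is_subset(code_tokens)
}

fn main() {
    let pattern: HashSet<u32> = [1, 2].into();
    let code: HashSet<u32> = [1, 2, 3].into();
    assert!(may_match(&pattern, &code));
}
```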
### Matchers

There are various ways to implement matching. To keep the architecture flexible, the repository is generic over the matcher, bounded with a very small trait.
|||||||
@@ -1,4 +1,4 @@
|
|||||||
## Execution order
|
### Execution order
|
||||||
|
|
||||||
The macros describe several independent sequential programs that are expected to be able to interact with each other. To make debugging easier, the order of execution of internal steps within independent macros has to be relatively static.
|
The macros describe several independent sequential programs that are expected to be able to interact with each other. To make debugging easier, the order of execution of internal steps within independent macros has to be relatively static.
|
||||||
|
|
||||||
@@ -27,30 +27,30 @@ The bands are each an even 32 orders of magnitude, with space in between for fut

| 224-231      | 232-239 | 240-247 | 248-         |
| integrations |         |         | transitional |

#### Transitional states

Transitional states produced and consumed by the same macro program occupy the unbounded top region of the f64 field. Nothing in this range should be written by the user or triggered by an interaction of distinct macro programs; the purpose of this high range is to prevent devices such as carriages from interacting. Any transformation sequence in this range can assume that the tree is inert other than its own operation.

#### Integrations

Integrations expect an inert syntax tree, but at least one token in the pattern is external to the macro program that resolves the rule, so it's critical that all macro programs be in a documented state at the time of resolution.

#### Aliases

Fragments of code extracted for readability are all at exactly 0x1p800. These may be written by programmers who are not comfortable with macros or metaprogramming. They must have unique single-token patterns. Because their priority is higher than any entry point, they can safely contain parts of other macro invocations. They have a single priority number because they can't conceivably require internal ordering adjustments, and their usage is meant to be as straightforward as possible.

#### Binding builders

Syntax elements that manipulate bindings should be executed earlier. `do` blocks and (future) `match` statements are good examples of this category. Anything with a lower priority trigger can assume that all names are correctly bound.

#### Expressions

Things that essentially work like function calls, just with added structure, such as `if`/`then`/`else` or `loop`. These are usually just more intuitive custom forms that are otherwise identical to a macro.

#### Operators

Binary and unary operators that process the chunks of text on either side. Within the band, these macros are prioritized in inverse precedence order and apply to the entire range of clauses before and after themselves, to ensure that function calls have the highest perceived priority.

#### Optimizations

Macros that operate on fully resolved lambda code and look for known patterns that can be simplified. I did not manage to create a working example of this, but repeated string concatenation is a good candidate.
@@ -1,8 +1,8 @@

## The pipeline

The conversion of Orchid files into a collection of macro rules is a relatively complicated process that took several attempts to get right.

### Push vs pull logistics

The initial POC implementation of Orchid used pull logistics, aka lazy evaluation, everywhere. This meant that specially annotated units of computation would only be executed when other units referenced their result. This is a classic functional optimization, but its implementation in Rust had a couple of drawbacks. First, lazy evaluation conflicts with most other optimizations, because it's impossible to assert the impact of a function call. Also - although this is probably a problem with my implementation - because the caching wrapper stores a trait object of Fn, every call to a stage is equivalent to a virtual function call, which alone is sometimes an excessive penalty. Second, all values must live on the heap and have static lifetimes. Eventually nearly all fields referenced by the pipeline or its stages were wrapped in Rc.
@@ -10,13 +10,13 @@ Additionally, in a lot of cases lazy evaluation is undesirable. Most programmers

To address these issues, the second iteration only uses pull logistics for the preparsing and file collection phase, and the only errors guaranteed to be produced by this stage are imports from missing files and syntax errors regarding the structure of the S-expressions.

### Stages

As of writing, the pipeline consists of three main stages: source loading, tree-building and name resolution. These break down into multiple substages.

All stages support various ways to introduce blind spots and precomputed values into their processing. This is used to load the standard library, prelude, and possibly externally defined intermediate stages of injected code.
#### Source loading
This stage encapsulates pull logistics. It collects all source files that should be included in the compilation in a hashmap keyed by their project-relative path. All subsequent operations are executed on every element of this map unconditionally.
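Roughly, the collection step looks like the following sketch, where the hypothetical `imports_of` callback stands in for the preparser:

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical sketch of the source-loading stage: starting from an entry
/// file, keep pulling in every file referenced by an import until the set
/// is closed, then hand the finished map to the later, push-based stages.
fn collect_sources(
    root: &Path,
    entry: PathBuf,
    imports_of: impl Fn(&str) -> Vec<PathBuf>, // stand-in for the preparser
) -> std::io::Result<HashMap<PathBuf, String>> {
    let mut sources = HashMap::new();
    let mut pending = vec![entry];
    while let Some(rel) = pending.pop() {
        if sources.contains_key(&rel) {
            continue; // already loaded via another import
        }
        let text = std::fs::read_to_string(root.join(&rel))?;
        pending.extend(imports_of(&text));
        sources.insert(rel, text);
    }
    // Every subsequent stage runs on every entry of `sources`.
    Ok(sources)
}
```

In this sketch a missing file surfaces as the `io::Error`, matching the guarantee above that imports from missing files are among the only errors this stage produces.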
Parsing itself is outsourced to a Chumsky parser defined separately.
This information is compiled into a very barebones module representation and returned alongside the loaded source code.
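The exact shape of that representation isn't reproduced here; a hypothetical sketch of what such a per-file record might hold (field names are guesses, not the actual Orchid types):

```rust
/// Hypothetical shape of the barebones module record produced by
/// preparsing: just enough structure for operator collection and import
/// resolution, kept alongside the raw source text.
#[allow(dead_code)]
struct Preparsed {
    source: String,            // the loaded source text
    imports: Vec<Vec<String>>, // paths this file references
    exported_ops: Vec<String>, // operators the later re-parse will need
}

fn main() {
    let _record = Preparsed {
        source: "export main := 42".to_string(),
        imports: vec![vec!["std".into(), "io".into()]],
        exported_ops: vec!["main".into()],
    };
}
```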
#### Tree building
This stage aims to collect all modules in a single tree. To achieve this, it re-parses each file with the set of operators collected from the data structure built during preparsing. The glob imports in the resulting `FileEntry` lists are eliminated, and the names in the bodies of expressions and macro rules are prefixed with the module path in preparation for macro execution.
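The prefixing step itself is simple; a toy version, with the module layout invented for the example:

```rust
/// Toy version of the prefixing step: a name occurring in the body of a
/// module is qualified with that module's full path, so that macro
/// execution later operates on globally unambiguous names.
fn prefix_name(module_path: &[&str], name: &str) -> String {
    let mut segments = module_path.to_vec();
    segments.push(name);
    segments.join("::")
}

fn main() {
    // Hypothetical example: `main` defined in the module `app::cli`.
    assert_eq!(prefix_name(&["app", "cli"], "main"), "app::cli::main");
}
```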
Operator collection can be advised about the exports of injected modules using a callback, and a prelude in the form of a list of line objects - in the shape emitted by the parser - can be injected before the contents of every module to define universally accessible names. Since these lines are processed for every file, it's generally best to just insert a single glob import from a module that defines everything. The interpreter inserts `import prelude::*`.
#### Import resolution
This stage aims to produce a tree ready for consumption by a macro executor or any other subsystem. It replaces every name originating from imported namespaces in every module with the original name.
Injection is supported with a function which takes a path and, if it's valid in the injected tree, returns its original value even if that's the path itself. This is used both to skip resolving names in the injected modules - which are expected to have already been processed using this step - and of course to find the origin of imports from the injected tree.
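A sketch of the hook's shape with invented names - not necessarily the real signature:

```rust
/// Hypothetical shape of the injection hook: given a path, return its
/// origin if the path is valid in the injected tree (possibly the path
/// itself), or `None` if the injected tree doesn't know it.
fn resolve_name(
    path: &[String],
    injected: &dyn Fn(&[String]) -> Option<Vec<String>>,
) -> Vec<String> {
    if let Some(origin) = injected(path) {
        return origin; // defined in (or re-exported by) the injected tree
    }
    // ...otherwise consult the project tree's own import table (omitted).
    path.to_vec()
}

fn main() {
    // An injected std tree that answers for anything under `std::`.
    let injected = |p: &[String]| {
        (p.first().map(String::as_str) == Some("std")).then(|| p.to_vec())
    };
    let name = vec!["std".to_string(), "io".to_string(), "println".to_string()];
    assert_eq!(resolve_name(&name, &injected), name);
}
```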
### Layered parsing
The most important export of the pipeline is the `parse_layer` function, which acts as a façade over the complex system described above. The environment in which user code runs is bootstrapped using repeated invocations of this function, each configured by a number of options.
One of these options is the prelude; the interpreter sets it to `import prelude::*`. If the embedder defines its own prelude, it's a good idea to append it.
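The bootstrapping pattern looks roughly like this, with stand-in types rather than the real API:

```rust
/// Stand-ins for the real API: `Tree` is whatever a parsed layer produces,
/// and this `parse_layer` only records what it was given.
#[derive(Clone, Default)]
struct Tree(Vec<String>);

fn parse_layer(source: &str, env: &Tree, prelude: &str) -> Tree {
    // The real function runs the whole pipeline against `env`; this stub
    // just appends a description of the layer it "parsed".
    let mut defs = env.0.clone();
    defs.push(format!("{prelude}; {source}"));
    Tree(defs)
}

fn main() {
    let base = Tree::default(); // the extern-function layer (see below)
    let std_layer = parse_layer("std sources", &base, "");
    // User code is parsed against everything the previous layers defined.
    let user = parse_layer("user sources", &std_layer, "import prelude::*");
    assert_eq!(user.0.len(), 2);
}
```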
#### The first layer
The other important exports of the pipeline are `ConstTree` and `from_const_tree`. These are used to define a base layer that exposes extern functions. `ConstTree` implements `Add` so distinct libraries of extern functions can be intuitively combined.
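A toy `ConstTree` showing why the `Add` impl is convenient; the real type is not reproduced here, and the externs are modelled as plain function pointers for brevity:

```rust
use std::ops::Add;

/// Hypothetical stand-in for `ConstTree`: a flat list of named extern
/// functions. Implementing `Add` lets independent libraries of externs
/// be combined with `+`.
#[derive(Default)]
struct ConstTree(Vec<(String, fn(i64) -> i64)>);

impl Add for ConstTree {
    type Output = ConstTree;
    fn add(mut self, rhs: ConstTree) -> ConstTree {
        self.0.extend(rhs.0);
        self
    }
}

fn neg(x: i64) -> i64 { -x }
fn double(x: i64) -> i64 { x * 2 }

fn main() {
    let arith = ConstTree(vec![("neg".to_string(), neg as fn(i64) -> i64)]);
    let more = ConstTree(vec![("double".to_string(), double as fn(i64) -> i64)]);
    let base_layer = arith + more; // merged into one first-layer library
    assert_eq!(base_layer.0.len(), 2);
}
```

Using `+` keeps an embedder's setup declarative: the standard library, an IO library and any host-specific externs can be merged before being handed to the first `parse_layer` invocation.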
# References
[1] Various authors, "C++ Programming/Templates/Template Meta-Programming", https://en.wikibooks.org/wiki/C++_Programming/Templates/Template_Meta-Programming (accessed May 5, 2023)

[2] J. Huey on behalf of The Types Team, "Generic associated types to be stable in Rust 1.65", https://blog.rust-lang.org/2022/10/28/gats-stabilization.html (accessed May 5, 2023)

[3] K. Wansbrough, "Instance Declarations are Universal", https://www.lochan.org/keith/publications/undec.html (accessed May 5, 2023)

[4] M. Stay, "Allow classes to be parametric in other parametric classes", https://github.com/microsoft/TypeScript/issues/1213 (accessed May 5, 2023)
Having tested that my idea could work, at the start of the academic year I switched to the type system. When the project synopsis was written, I imagined that the type system would be an appropriately sized chunk of the work for a final year project; its title was "Orchid's Type System".
Around the end of November I had researched enough type theory to decide what kind of type system I would want. My choice was informed by a number of grievances I had with TypeScript, such as the lack of higher-kinded types - which comes up surprisingly often[4] in JavaScript - the lack of support for nominal types, and the difficulty of using dependent types. However, I appreciated its powerful type transformation techniques.
However, building a type system proved too difficult; on February 23 I decided to cut my losses and focus on building an interpreter. The proof-of-concept interpreter was finished on March 10, but the macro executor was still using the naive implementation completed over the summer, so it took around 15 seconds to load a 20-line example file, and a range of other issues cropped up as well, cumulatively impacting every corner of the codebase. A full rewrite was necessary.
## Type system
This is a description of the type system originally designed for Orchid; it never reached the MVP stage.
At the core the type system consists of three concepts:

- `impl` provides instances of typeclasses
- a universal parametric construct that serves as both a `forall` (or generic) and a `where` (or constraint). This was temporarily named `auto` but is probably more aptly described by the word `given`.
### Unification
The backbone of any type system is unification. Here it is an especially interesting problem, because the type expressions are built with code and nontermination is outstandingly common.
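Orchid's unifier is not reproduced here; the following toy normalizer only illustrates the standard fuel trick for keeping reduction-based unification from hanging, under invented types:

```rust
/// Minimal illustration (not Orchid's actual design): type expressions
/// that may reduce forever, handled with a fuel-bounded normalizer so
/// that unification itself always terminates.
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    Con(&'static str), // a normal-form type constructor
    Loop,              // stands in for a nonterminating type expression
}

/// One reduction step, or `None` if the expression is already normal.
fn step(ty: &Ty) -> Option<Ty> {
    match ty {
        Ty::Loop => Some(Ty::Loop), // reduces to itself forever
        Ty::Con(_) => None,
    }
}

/// Reduce by at most `fuel` steps; `None` means "possibly nonterminating".
fn normalize(mut ty: Ty, mut fuel: u32) -> Option<Ty> {
    while let Some(next) = step(&ty) {
        if fuel == 0 {
            return None; // give up instead of hanging the compiler
        }
        fuel -= 1;
        ty = next;
    }
    Some(ty)
}

fn main() {
    assert_eq!(normalize(Ty::Con("int"), 100), Some(Ty::Con("int")));
    assert_eq!(normalize(Ty::Loop, 100), None);
}
```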
### Given (formerly Auto)
`given` bindings have the form `@Name:type. body`. Either the `Name` or the `:type` part may be omitted, but at least one is required. The central idea is that wherever a binding is unwrapped by an operation, the language attempts to find a value for the name. Bindings are unwrapped in the following situations: