Transfer commit

2023-03-21 19:36:40 +00:00
parent 180ebb56fa
commit f3ce910f66
63 changed files with 1410 additions and 1023 deletions

@@ -0,0 +1,49 @@
# List of open-source packages I used
## [thiserror](https://github.com/dtolnay/thiserror)
_License: Apache 2.0 or MIT_
Helps derive `Error` for aggregate errors, although I eventually stopped trying to do so as it was simpler to just treat error types as bags of data about the failure.
## [chumsky](https://github.com/zesterer/chumsky)
_License: MIT_
A fantastic parser combinator library that allowed me to specify things like the nuanced conditions under which a float token can be promoted to a uint token in a declarative way. In hindsight, the passes after tokenization could have been written by hand: tokenized Orchid is not that hard to parse into an AST, and doing so would probably have made some tasks, such as allowing `.` (dot) as a token, considerably easier.
## [hashbrown](https://github.com/rust-lang/hashbrown)
_License: Apache 2.0 or MIT_
A Rust port of Google's SwissTable. Almost perfectly identical to `HashMap` in std, with a couple of additional APIs. I use it for the raw entry API, which the generic processing step cache requires to avoid unnecessary clones of potentially very large trees.
## [mappable-rc](https://github.com/JakobDegen/mappable-rc)
_License: Apache 2.0 or MIT_
A refcounting pointer which can be updated to dereference to some part of the value it holds, similarly to C++'s `shared_ptr`. Using this crate was ultimately a mistake on my part. In the early stages of development (and of my Rust journey) I wanted to store arbitrary subsections of an expression during macro execution without dealing with lifetimes. Removing all uses of this crate and instead just dealing with lifetimes is on the roadmap.
## [ordered-float](https://github.com/reem/rust-ordered-float)
_License: MIT_
A wrapper around floating point numbers that removes `NaN` from the set of possible values, promoting `<` and `>` to total orderings and `==` to an equivalence relation. Orchid does not have `NaN` because it's a silent error. All operations that would produce `NaN` either abort or indicate the failure in their return type.
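The idea can be illustrated without the crate. The following is a minimal sketch using only the standard library; `NotNan` and `checked_div` here are hypothetical names written for this example, not the crate's actual API and not Orchid's implementation.

```rust
use std::cmp::Ordering;

/// Hypothetical NaN-free float, written only to illustrate the idea.
#[derive(Debug, Clone, Copy, PartialEq)]
struct NotNan(f64);

impl NotNan {
    /// Rejects NaN at construction, so every stored value is comparable.
    fn new(value: f64) -> Option<Self> {
        if value.is_nan() { None } else { Some(NotNan(value)) }
    }
}

// Without NaN, `==` is reflexive, so `Eq` is sound.
impl Eq for NotNan {}

impl Ord for NotNan {
    fn cmp(&self, other: &Self) -> Ordering {
        // `partial_cmp` on floats only returns `None` when a NaN is involved.
        self.0.partial_cmp(&other.0).expect("NaN excluded by constructor")
    }
}

impl PartialOrd for NotNan {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Division that reports failure in its return type instead of yielding NaN.
fn checked_div(a: NotNan, b: NotNan) -> Option<NotNan> {
    NotNan::new(a.0 / b.0)
}
```

Because the wrapper implements `Ord`, values can be sorted or used as keys in ordered collections, which plain `f64` does not allow.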
## [itertools](https://github.com/rust-itertools/itertools)
_License: Apache 2.0 or MIT_
A utility crate; I use it everywhere.
## [smallvec](https://github.com/servo/rust-smallvec)
_License: Apache 2.0 or MIT_
Small vector optimization: allocates space for a statically known number of elements on the stack to save heap allocations. This is a gamble since the stack space is wasted if the data does spill to the heap, but it can improve performance massively in hot paths.
## [dyn-clone](https://github.com/dtolnay/dyn-clone)
_License: Apache 2.0 or MIT_
All expressions in Orchid are cloneable, and to allow for optimizations, Atoms have control over their own cloning logic, so this object-safe version of `Clone` is used.
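A rough sketch of what object-safe cloning looks like, using only the standard library. The trait `AtomLike`, its `clone_box` method and the `BigText` type are hypothetical names chosen for this example; they are neither the `dyn-clone` API nor Orchid's actual Atom trait.

```rust
use std::rc::Rc;

/// Hypothetical object-safe trait: `Clone` itself cannot be a supertrait of
/// a trait object, so cloning is exposed through a boxed-return method.
trait AtomLike {
    fn describe(&self) -> String;
    /// Each implementor decides how it is cloned.
    fn clone_box(&self) -> Box<dyn AtomLike>;
}

// Makes `Box<dyn AtomLike>` cloneable by delegating to the implementor.
impl Clone for Box<dyn AtomLike> {
    fn clone(&self) -> Self {
        self.clone_box()
    }
}

/// An atom holding potentially large data behind a reference count.
struct BigText(Rc<str>);

impl AtomLike for BigText {
    fn describe(&self) -> String {
        format!("text of {} bytes", self.0.len())
    }
    fn clone_box(&self) -> Box<dyn AtomLike> {
        // Custom logic: bump the refcount instead of copying the text.
        Box::new(BigText(Rc::clone(&self.0)))
    }
}
```

Cloning a `Box<dyn AtomLike>` that wraps a `BigText` only increments the reference count, which is the kind of optimization that per-atom cloning logic makes possible.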

@@ -45,10 +45,10 @@ $1 [ 0 ] a equals a < $1 ] b 0
Some global rules are also needed, likewise instantiated for all possible characters in the templated positions:
```
$1 $2 < equals $2 < $1 unless $1 is |
| $1 < equals $1 | >
> $1 $2 equals $1 > $2 unless $2 is ]
> $1 ] equals [ $1 ]
```
What I really appreciate in this proof is how visual it is; based on this, it's easy to imagine how one would go about encoding a pushdown automaton, lambda calculus or other interesting tree-walking procedures. This is exactly why I based my preprocessor on this system.
@@ -57,10 +57,41 @@ What I really appreciate in this proof is how visual it is; based on this, it's
I found two major problems with C and Rust macros which vastly limit their potential: they're relatively closed systems, and they're prone to aliasing. Every other item in Rust follows a rigorous namespacing scheme, but macros break this seal; I presume the reason is that macro execution happens before namespace resolution.
Orchid's macros - substitution rules - operate on namespaced tokens. This means that users can safely give their macros short and intuitive names, but it also means that the macros can hook into each other. Consider the following example, which is a slightly modified version of a real rule included in the prelude:
in _procedural.or_
```orchid
export do { ...$statement ; ...$rest:1 } =10_001=> (
statement (...$statement) do { ...$rest }
)
export do { ...$return } =10_000=> (...$return)
export statement (let $_name = ...$value) ...$next =10_000=> (
(\$_name. ...$next) (...$value)
)
```
in _cpsio.or_
```orchid
import procedural::statement
export statement (cps $_name = ...$operation) ...$next =10_001=> (
(...$operation) \$_name. ...$next
)
export statement (cps ...$operation) ...$next =10_000=> (
(...$operation) (...$next)
)
```
in _main.or_
```orchid
import procedural::(do, let, ;)
import cpsio::cps
export main := do{
cps data = readline;
let a = parse_float data * 2;
cps print (data ++ " doubled is " ++ stringify a)
}
```

@@ -0,0 +1,101 @@
# Parsing
Orchid expressions are similar in nature to lambda calculus or Haskell, except whitespace is mostly irrelevant.
## Names
`name` and `ns_name` tokens appear all over the place in this spec. They represent operators, function names, arguments, modules. A `name` is
1. the universally recognized operators `,`, `.`, `..` and `...` (comma and single, double and triple dot)
2. any C identifier
3. any sequence of name-safe characters starting with a character that cannot begin a C identifier. A name-safe character is any non-whitespace Unicode character other than:
   - digits,
   - the namespace separator `:`,
   - the parametric expression starters `\` and `@`,
   - the string and char delimiters `"` and `'`,
   - the various brackets `(`, `)`, `[`, `]`, `{` and `}`,
   - `,`, `.` and `$`.
This means that, in the absence of a known list of names, `!important!` is a single name but `important!` is two names, as a name that starts as a C identifier cannot contain special characters. It also means that using non-English characters in Orchid variables is a really bad idea. This is intentional: identifiers that need to be repeated verbatim should only contain characters that appear on all Latin keyboards.
There are also reserved words that cannot be used as names: `export` and `import`.
A `ns_name` is a sequence of one or more `name` tokens separated by the namespace separator `::`.
All tokens that do not contain `::` in the code may be `name` or `ns_name` depending on their context.
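The second and third kinds of `name` can be sketched as a small splitter. This is an illustration only: the function names are made up for this example, "digits" is assumed to mean ASCII digits, and the universal operators, reserved words and any known list of names are deliberately ignored.

```rust
/// True for characters allowed inside a non-identifier name.
/// Assumption: "digits" means ASCII digits.
fn is_name_safe(c: char) -> bool {
    !c.is_whitespace()
        && !c.is_ascii_digit()
        && !matches!(
            c,
            ':' | '\\' | '@' | '"' | '\'' | '(' | ')' | '[' | ']' | '{' | '}' | ',' | '.' | '$'
        )
}

fn can_start_c_ident(c: char) -> bool {
    c.is_ascii_alphabetic() || c == '_'
}

fn can_continue_c_ident(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Splits the longest leading name off `input`, returning `(name, rest)`.
/// Returns `None` if `input` does not start with a name.
fn split_name(input: &str) -> Option<(&str, &str)> {
    let first = input.chars().next()?;
    let end = if can_start_c_ident(first) {
        // A name that starts as a C identifier stays a C identifier.
        input.find(|c: char| !can_continue_c_ident(c)).unwrap_or(input.len())
    } else if is_name_safe(first) {
        // Otherwise any run of name-safe characters is one name.
        input.find(|c: char| !is_name_safe(c)).unwrap_or(input.len())
    } else {
        return None;
    };
    Some(input.split_at(end))
}
```

With these rules, `split_name("!done!")` consumes the whole input as one name, while `split_name("done!")` stops after `done` and leaves `!` for the next call.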
## Clauses
Clauses are the building blocks of Orchid's syntax. They belong to one of several categories:
- S-expressions are a parenthesized sequence of space-delimited `clause`s. All three types of brackets `()`, `[]` and `{}` are supported.
- Lambdas start with `\<name>.`, where `<name>` is a single `name` or `$_` followed by a C identifier, and continue with a sequence of `clause`s. This is a greedy pattern that ends at the end of an enclosing S-expression, or the end of input.
- Numbers can be in decimal, binary with the `0b` prefix, hexadecimal with the `0x` prefix, or octal with the `0` prefix. All bases support the decimal point, exponential notation or both. The exponent is prefixed with `p`, always written in decimal, may be negative, and it represents a power of the base rather than a power of 10. For example, `0xf0.4p-2` is `0xf04 / 16 ^ 3` or ~0.9385.
- Strings are delimited with `"`, support `\` escapes and four-digit Unicode escapes of the form `\uXXXX`. They may contain line breaks.
- Chars are a single character or escape from the above description of a string, delimited by `'`.
- Placeholders take one of five forms: `$name`, `..$name`, `...$name`, `..$name:p` or `...$name:p`. The name is always a C identifier, and `p` is an integer growth priority.
- Names are a single `ns_name`.
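The number format can be made concrete with a small evaluator. This is a sketch under assumptions rather than Orchid's actual lexer: `parse_number` is a hypothetical name, only the decimal, `0b` and `0x` forms are handled (the octal prefix is left out because its interaction with literals such as `0.5` is not spelled out here), and no attention is paid to rounding of non-binary fractions.

```rust
/// Evaluates a numeric literal with an optional point and `p` exponent.
/// Handles decimal, `0b` and `0x` only; octal is intentionally omitted.
fn parse_number(text: &str) -> Option<f64> {
    // Pick the base from the prefix.
    let (base, body) = if let Some(rest) = text.strip_prefix("0x") {
        (16u32, rest)
    } else if let Some(rest) = text.strip_prefix("0b") {
        (2, rest)
    } else {
        (10, text)
    };
    // Split off the optional exponent, written in decimal after `p`.
    let (mantissa, exponent) = match body.split_once('p') {
        Some((m, e)) => (m, e.parse::<i32>().ok()?),
        None => (body, 0),
    };
    // Split the mantissa at the optional point.
    let (int_part, frac_part) = mantissa.split_once('.').unwrap_or((mantissa, ""));
    if int_part.is_empty() && frac_part.is_empty() {
        return None;
    }
    // Read all digits as one integer in the given base...
    let mut value = 0f64;
    for c in int_part.chars().chain(frac_part.chars()) {
        value = value * base as f64 + c.to_digit(base)? as f64;
    }
    // ...then scale by the exponent minus the number of fractional digits.
    let shift = exponent - frac_part.len() as i32;
    Some(value * (base as f64).powi(shift))
}
```

For `0xf0.4p-2` the digits `f04` are read as 3844, and the shift is -2 for the exponent plus -1 for the single fractional digit, giving 3844 / 16^3.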
## Files
Files are separated into lines. A line is delimited by newlines and only contains newlines within brackets. A line may be an import, rule, exported rule, or explicit export.
### Rules
Rules have the following form
```
pattern =priority=> template
```
The pattern is able to define new operators implicitly by referencing them, so all tokens must be delimited by spaces. The template is inserted in place of the pattern without parentheses, so unless it's meant to be part of a pattern matched by another rule which expects a particular parenthesization, when more than one token is produced the output should be wrapped in parentheses.
A shorthand syntax is available for functions:
```
name := value
```
In this case `name` must be a single `name`. The value is automatically parenthesized, and the priority of these rules is always zero.
### Explicit exports and exported rules
An explicit export consists of `export :: ( <names> )` where `<names>` is a comma-separated list of `name`s.
An exported rule consists of the keyword `export` followed by a regular rule. It counts both as a rule and as an export of all the `name`s within the pattern.
### Imports
An import is a line starting with the keyword `import`, followed by a tree of imported names.
```
import_tree = name
| name :: import_tree
| name :: *
| ( import_tree [, import_tree]+ )
```
Some examples of valid imports:
```
import std::cpsio
import std::(conv::parse_float, cpsio, str::*)
import std
```
Some examples of invalid imports:
```
import std::()
import std::cpsio::(print, *)
import std::(cpsio)
```
> **info**
>
> While none of these are guaranteed to work currently, there's little reason they would have to be invalid, so future specifications may allow them.
An import can be normalized into a list of independent imports, each ending either with a `*` (a wildcard import) or with a `name`. Wildcard imports are in turn normalized to imports for all the `name`s exported from the parent module. Every Name clause in the file that starts with the `name` one of these imports ends with is prefixed with the full import path. The rest of the Name clauses are prefixed with the full path of the current module.
Reference cycles in Orchid modules are never allowed, so the dependency of a module's exports on its imports and of a wildcard import's value on the referenced module's exports does not introduce the risk of circular dependencies; it just specifies the order of processing for files.
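The first normalization step, flattening an import tree into independent imports, might look roughly like this. The `ImportTree` type and `normalize` function are hypothetical and simply mirror the grammar above; expanding wildcards is not shown because it needs the export list of the referenced module.

```rust
/// Hypothetical mirror of the `import_tree` grammar.
enum ImportTree {
    Name(&'static str),                  // name
    Path(&'static str, Box<ImportTree>), // name :: import_tree
    Wildcard(&'static str),              // name :: *
    Group(Vec<ImportTree>),              // ( import_tree, import_tree, ... )
}

/// Joins a path segment onto the prefix accumulated so far.
fn join(prefix: &str, name: &str) -> String {
    if prefix.is_empty() {
        name.to_string()
    } else {
        format!("{prefix}::{name}")
    }
}

/// Flattens `tree` into independent imports, each ending in a name or `*`.
fn normalize(tree: &ImportTree, prefix: &str, out: &mut Vec<String>) {
    match tree {
        ImportTree::Name(name) => out.push(join(prefix, name)),
        ImportTree::Wildcard(name) => out.push(format!("{}::*", join(prefix, name))),
        ImportTree::Path(name, rest) => normalize(rest, &join(prefix, name), out),
        ImportTree::Group(items) => {
            for item in items {
                normalize(item, prefix, out);
            }
        }
    }
}
```

Applied to the tree for `import std::(conv::parse_float, cpsio, str::*)`, this yields `std::conv::parse_float`, `std::cpsio` and `std::str::*`.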

@@ -0,0 +1,45 @@
# Macros
After parsing, what remains is a set of macro rules, each with a pattern, priority and template. Modules aren't tracked in this stage, their purpose was to namespace the tokens within the rules.
By employing custom import logic, it's also possible to add rules bypassing the parser. Starting with the macro phase, `clause`s may also be `atom`s or `externfn`s. The role of these is detailed in the [[03-runtime]] section.
Macros are executed in reverse priority order. Each macro is checked against each subsection of each clause sequence. When a match is found, the substitution is performed and all macros are executed again.
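The restart-on-match loop can be sketched on a heavily simplified model. Everything here is an assumption made for illustration: the `Rule` type and `run_macros` function are hypothetical, patterns are literal token sequences without placeholders, the input is a single flat sequence rather than nested clauses, and "reverse priority order" is taken to mean highest priority first.

```rust
/// Hypothetical rule shape: literal tokens only, no placeholders.
struct Rule {
    priority: u32,
    pattern: Vec<&'static str>,
    template: Vec<&'static str>,
}

/// Finds the first position where `pattern` occurs in `tokens`.
fn find(tokens: &[&str], pattern: &[&str]) -> Option<usize> {
    if pattern.is_empty() {
        return None;
    }
    tokens.windows(pattern.len()).position(|w| w == pattern)
}

/// Applies rules until none matches, starting over after every substitution.
fn run_macros(mut tokens: Vec<&'static str>, rules: &mut [Rule]) -> Vec<&'static str> {
    // Assumption: higher priority rules are tried first.
    rules.sort_by(|a, b| b.priority.cmp(&a.priority));
    'restart: loop {
        for rule in rules.iter() {
            if let Some(at) = find(&tokens, &rule.pattern) {
                // Replace the matched window with the template.
                tokens.splice(at..at + rule.pattern.len(), rule.template.iter().copied());
                continue 'restart;
            }
        }
        // No rule matched anywhere: the sequence is fully expanded.
        return tokens;
    }
}
```

As in any rewriting system of this kind, a rule set whose templates keep producing new matches never terminates; the sketch makes no attempt to detect that.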
## Placeholders
Placeholders fall into two categories:
- scalar placeholders
  - `$name` matches exactly one clause
  - `$_name` matches exactly one Name clause
- vectorial placeholders
  - `..$name` matches zero or more clauses
  - `...$name` matches one or more clauses
`$_name` is uniquely valid in the position of an argument name within a lambda.
Vectorial placeholders may also have a positive decimal integer growth priority specified after the name, separated with a `:` like so: `...$cond:2`. If it isn't specified, the growth priority defaults to 0.
The template may only include placeholders referenced in the pattern. All occurrences of a placeholder within a rule must match the same things.
## Execution
Each clause in the pattern matches clauses as follows:
- A Name matches a Name with the same full path.
- Lambda matches a lambda with matching argument name and matching body. If the argument name in the pattern is a name-placeholder (as in `\$_phname.`), the argument name in the source is treated as a module-local Name clause.
- Parenthesized expressions match each other if the contained sequences match and both use the same kind of parentheses.
- Placeholders' matched sets are as listed in [Placeholders].
If a pattern contains the same placeholder name more than once, matches where they don't match perfectly identical clauses, names or clause sequences are discarded.
### Order of preference
The growth order of vectorial placeholders is:
- outside before inside parentheses
- descending growth priority
- left-to-right by occurrence in the pattern.
If a pattern matches a sequence in more than one way, whichever match allocates more clauses to the first vectorial placeholder in growth order is preferred.
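The ordering itself is a three-part sort key. The sketch below is illustrative only: `VecPlaceholder` and `growth_order` are hypothetical names, and "outside before inside parentheses" is assumed to mean ascending nesting depth within the pattern.

```rust
use std::cmp::Reverse;

/// Hypothetical description of one vectorial placeholder in a pattern.
struct VecPlaceholder {
    name: &'static str,
    /// Number of enclosing brackets in the pattern (assumed meaning of
    /// "outside before inside").
    depth: usize,
    /// Growth priority; 0 when unspecified.
    priority: u64,
    /// Left-to-right index of the placeholder in the pattern.
    position: usize,
}

/// Returns placeholder names in growth order.
fn growth_order(mut placeholders: Vec<VecPlaceholder>) -> Vec<&'static str> {
    placeholders.sort_by_key(|p| (p.depth, Reverse(p.priority), p.position));
    placeholders.into_iter().map(|p| p.name).collect()
}
```

For a pattern such as `do { ...$statement ; ...$rest:1 }`, both placeholders sit at the same depth, so `rest` comes first in growth order because of its higher priority. It therefore receives as many clauses as possible, which leaves `statement` with only the clauses before the first `;`.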

@@ -0,0 +1,32 @@
# Runtime
Orchid is evaluated lazily. This means that everything operates on unevaluated expressions. This has the advantage that unused values never need to be computed, but it also introduces a great deal of complexity in interoperability.
## Execution mode
The executor supports step-by-step execution, multiple steps at once, and running an expression to completion. Once an Orchid program reaches a nonreducible state, it is either an external item, a literal, or a lambda function.
## External API
In order to do anything useful, Orchid provides an API for defining clauses that have additional behaviour implemented in Rust. Basic arithmetic is defined using these.
### Atomic
Atomics are opaque units of foreign data, with the following operations:
- functions for the same three execution modes the language itself supports
- downcasting to a concrete type
Atomics can be used to represent processes. Given enough processing cycles, these return a different clause.
They can also be used to wrap data addressed to other external code. This category of atomics reports nonreducible at all times, and relies on the downcasting API to interact with ExternFn-s.
It's possible to use a combination of these for conditional optimizations - for instance, to recognize chains of processes that can be more efficiently expressed as a single task.
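The two uses of atomics can be pictured with a toy trait. This is a sketch under assumptions and not Orchid's actual interface: the trait, its method names, the `Progress` enum and both implementors are invented for this example, and a produced clause is stood in for by a plain number.

```rust
use std::any::Any;

/// Outcome of giving an atomic some processing cycles (hypothetical).
#[derive(Debug, PartialEq)]
enum Progress {
    /// Still the same atomic; more cycles are needed.
    Pending,
    /// Will never reduce; interact with it through downcasting instead.
    Nonreducible,
    /// The atomic turned into a different clause (a number in this sketch).
    Done(u64),
}

trait Atomic {
    /// Step-by-step execution.
    fn run_once(&mut self) -> Progress;
    /// Multiple steps at once.
    fn run_n(&mut self, steps: usize) -> Progress {
        for _ in 0..steps {
            match self.run_once() {
                Progress::Pending => continue,
                other => return other,
            }
        }
        Progress::Pending
    }
    /// Run until the atomic stops reporting `Pending`.
    fn run_to_completion(&mut self) -> Progress {
        loop {
            match self.run_once() {
                Progress::Pending => continue,
                other => return other,
            }
        }
    }
    /// Entry point for downcasting to a concrete type.
    fn as_any(&self) -> &dyn Any;
}

/// A process: returns a different clause after enough cycles.
struct Countdown { remaining: u32, result: u64 }

impl Atomic for Countdown {
    fn run_once(&mut self) -> Progress {
        if self.remaining == 0 {
            return Progress::Done(self.result);
        }
        self.remaining -= 1;
        Progress::Pending
    }
    fn as_any(&self) -> &dyn Any { self }
}

/// Data addressed to other external code: never reducible.
struct Payload(String);

impl Atomic for Payload {
    fn run_once(&mut self) -> Progress { Progress::Nonreducible }
    fn as_any(&self) -> &dyn Any { self }
}
```

External code that receives a `Box<dyn Atomic>` can call `as_any().downcast_ref::<Payload>()` to recover the wrapped data, and gets `None` for any other concrete type.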
### ExternFn
External functions can be combined with another clause to form a new clause. Most of the time, this new clause would be an Atomic which forwards processing to the arguments until they can't be normalized any further, at which point it either returns an ExternFn to take another argument or executes the operation associated with the function and returns.
Because this combination of operations is so common, several macros are provided to streamline it.
Sometimes, e.g. when encoding effectful functions in continuation passing style, an ExternFn returns its argument without modification. It is always a logic error to run expressions outside a run call, or to expect an expression to be of any particular shape without ensuring that run returned nonreducible in the past.