diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md index 225baab5..fac7635f 100644 --- a/ARCHITECTURE.md +++ b/ARCHITECTURE.md @@ -122,3 +122,142 @@ flowchart LR compiler-frontend end ``` + +# Design Principles + +Nevalang is built on a set of principles. They were naturally derived from the development process rather than artificially created beforehand. + +> WARNING: The language is under heavy development, and these principles are not guarantees we can give you at the moment, but rather guiding stars that keep us moving in the right direction + +## Program must fail at startup or never + +The idea is that most errors must be caught by the compiler at compile time. The rest - those that are hard to catch without sacrificing the compiler's simplicity - are checked at runtime, at startup. + +If no errors were caught at compile time or at startup, then the program is correct and must run successfully. Any (non-logical) error that occurs after startup must be treated as a compiler bug. + +## Runtime must be fast, flexible and unsafe + +The runtime does no checks after startup. The program the runtime consumes must be correct, and that correctness must be ensured by the compiler. If there's a bug in the compiler and the runtime consumes an invalid program, bad things can happen: deadlocks, memory leaks, freezes and crashes. + +## Compiler directives must not be required + +The language must make it possible to implement everything without compiler directives. + +**Compiler directives are not always safe** (the analyzer won't always validate their usage - doing so would make the implementation more complicated) and thus must only be used by language/stdlib developers or _by users who know what they are doing_. + +It's still good for users to understand what compiler directives are and how syntax sugar uses them under the hood. + +## There is an interpreter (backend can be slow) + +The compiler must be fast up to the point where it generates IR. After that comes generation of the target code (e.g.
generating Go and then generating machine code with the Go compiler) - that part (the "backend") doesn't have to be fast. It's more important to keep it simple. + +The reason is that we have an interpreter that internally uses the compiler - not the whole thing, just up to the point where it generates IR (it's impossible to generate IR from an invalid program due to lack of type information). That's the part of the compiler used for development/debugging purposes, and that's where we need to be fast. + +## There is visual programming + +Once we build a good enough tool for visual programming, we will switch from the text-based approach and text will become a supporting tool. To achieve this we must always keep in mind that whatever we do with the language must be easy to visualize in a graph environment. + +# Internal Implementation Q&A + +## Why structures are not represented as Go structures? + +It would require generating Go types dynamically, which means either reflection or code generation (the latter makes interpreter mode impossible). Maps have their overhead, but they are easy to work with. + +## Why nested structures are not represented as flat maps? + +Indeed, it's possible to represent `{ foo { bar int } }` as `{ "foo/bar": 42 }`. The problem arises when we access a whole field. Let's take this example: + +``` +types { + User { + pet { + name str + } + } +} + +... + +$u.pet -> foo.bar +``` + +What will `foo.bar` actually receive? This design makes it impossible to actually send structures around and allows operating only on non-structured data. + +## Why Go? + +It's a perfect match. Go has built-in green threads, a scheduler and a garbage collector. Even more than that, it has goroutines and channels, which are 1-1 mappings to FBP's ports and connections. Last but not least, it's a pretty fast compiled language.
Having Go as a compile target allows reusing its state-of-the-art standard library and gaining performance for free just by updating the underlying compiler. + +## Why compiler operates on multi-module graph (build) and not just turns everything into one big module? + +Imagine you have `foo.bar` in your code. How does the compiler figure out what that actually is? In order to do that it needs to _resolve_ that _reference_, and this is how _reference resolution_ works: + +First, find out what `foo` is. Look at the `import` section in the current file. Let's say we see something like: + +```neva +import { + github.com/nevalang/x/foo +} +``` + +This is how we know that `foo` is actually the imported package `github.com/nevalang/x/foo`. Cool, but which version of `github.com/nevalang/x` should we use? Well, to figure that out we need to look at the current _module_'s _manifest_ file. There we can find something like: + +```yaml +deps: + - github.com/nevalang/x 0.0.1 +``` + +Cool, now we know _exactly_ what `foo` is. It's the `foo` package inside version `0.0.1` of the `github.com/nevalang/x` module. So what's the point of operating on a nested multi-module graph instead of having one giant module? + +Now let's consider another example: in addition to `github.com/nevalang/x`, your code also depends on a `submod` sub-module, and that sub-module itself depends on `github.com/nevalang/x`. + +You still have that `foo.bar` in your code and your module still depends on the `github.com/nevalang/x` module. But now you also depend on the `submod` sub-module, which also depends on `github.com/nevalang/x`. However, your module depends on version `0.0.1` of `github.com/nevalang/x`, while `submod` depends on `1.0.0`. + +Now we have a problem. When the compiler sees `foo.bar` in some file, it does an import lookup, sees `github.com/nevalang/x` and... does not know what to do.
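To make the ambiguity concrete, here is a minimal Go sketch of per-module version lookup. The structures and names are hypothetical simplifications, not the real compiler's representation; the point is only that resolving the same import from two different modules must consult two different manifests:

```go
package main

import "fmt"

// Module is one node in the multi-module build graph.
// Deps is the module's manifest: dependency module path -> version.
type Module struct {
	Deps map[string]string
}

// resolve answers: which exact version of module `path` does an
// import refer to, as seen from module `from`? It consults that
// module's own manifest - which is exactly why the compiler must
// preserve the multi-module structure instead of flattening
// everything into one big module.
func resolve(from Module, path string) (modPath, version string, err error) {
	version, ok := from.Deps[path]
	if !ok {
		return "", "", fmt.Errorf("module %q not found in manifest", path)
	}
	return path, version, nil
}

func main() {
	// The diamond from the text: your module and `submod` both
	// depend on github.com/nevalang/x, but on different versions.
	root := Module{Deps: map[string]string{
		"github.com/nevalang/x": "0.0.1",
		"github.com/you/submod": "1.0.0",
	}}
	submod := Module{Deps: map[string]string{
		"github.com/nevalang/x": "1.0.0",
	}}

	// The same import resolves to different versions depending on
	// whose manifest we consult.
	_, v1, _ := resolve(root, "github.com/nevalang/x")
	_, v2, _ := resolve(submod, "github.com/nevalang/x")
	fmt.Println(v1, v2) // prints: 0.0.1 1.0.0
}
```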
To solve this issue we need to look up the current module's manifest and check which version of `github.com/nevalang/x` _this current module_ uses. To do that, we need to preserve the multi-module structure of the program. + +One might ask: can't we simply import things like: + +```neva +import { + github.com/nevalang/x@0.0.1 +} +``` + +That actually could solve the issue. The problem is that now we have to update the source code _each time we update our dependency_. That's a bad solution: we simply made programming harder to avoid working on the compiler. We can do better. + +## Why #runtime_func_msg does not accept literals? + +Indeed, it would be handy to be able to do stuff like this: + +```neva +nodes { + #runtime_func_msg(str "hello world!") + const Const +} +``` + +This would make the desugarer much simpler (no need to create all these virtual constants), and not just for const senders but for struct selectors too. + +However, to implement this we would need to be able to parse literals inside `irgen`. Right now we already introduce a dependency for parsing entity references, but for arbitrary expressions we would need the whole parser. + +Of course, it's possible to hide the actual parser implementation behind some kind of interface defined by `irgen`, but that would make the code more complicated. Besides, the very idea of having a parser inside a code generator sounds bad. Parsing references, on the other hand, is an acceptable compromise. + +## Why Analyzer knows about stdlib? Isn't it bad design? + +At first there was an attempt to implement the analyzer in a way that it only knows about the core of the language. + +But it turns out that some components in the stdlib (especially in the `builtin` package, and especially the ones that use the `#runtime_func` and `#runtime_func_msg` directives) are actually part of the core of the language. + +E.g.
when a user uses a struct selector like `foo.bar/baz -> ...` and the desugarer replaces it with `foo.bar -> structSelectorNode("baz") -> ...` (this is pseudocode), we must ensure that the type of `bar` 1) is a `struct`, 2) has a field `baz`, and 3) that `baz` is compatible with whatever `...` is. _This is static semantic analysis_, and that is work for the analyzer. + +Actually, every time we use a compiler directive we depend on an implicit contract that cannot be expressed in terms of the language itself (unless we introduce abstractions for that, which would make the language more complicated). That's why we have to analyze such things by injecting knowledge about the stdlib. + +Designing the language in a way where the analyzer has zero knowledge about the stdlib is possible in theory, but it would make the language more complicated and would take much more time. + +## Why desugarer comes after analyzer in compiler's pipeline? + +Two reasons: + +1. The analyzer should operate on the original "sugared" program so it can find errors in the user's source code. Otherwise the errors found could relate to the desugarer's implementation (compiler internals), which is not a compilation error but debug info for compiler developers. Finally, it's much easier to make end-user errors readable and user-friendly this way. +2. A desugarer that comes before analysis must duplicate some validation, because it's unsafe to desugar some constructs before ensuring they are valid - e.g. to desugar struct selectors without knowing for sure that the outport's type is a valid structure. Also, many desugaring transformations are only possible on an analyzed program with all type expressions resolved. + +Actually, it's impossible to run the desugarer entirely before analysis. It would be possible to have two desugarers - one before and one after - but that would make the compiler much more complicated without visible benefits.
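Reason 2 can be illustrated with a small Go sketch. The names (`desugarSelector`, `structSelectorNode`, the string-based connection format) are hypothetical simplifications, not the real compiler's API; the point is that the rewrite is only safe once an analyzer-produced type table confirms the outport carries a struct with that field:

```go
package main

import (
	"fmt"
	"strings"
)

// A toy type table as the analyzer might produce it:
// outport name -> set of struct fields it carries.
var analyzed = map[string]map[string]bool{
	"foo.bar": {"baz": true},
}

// desugarSelector rewrites "src/field -> dst" into
// "src -> structSelectorNode(field) -> dst", but only for
// connections the analyzer has already validated. Running it
// before analysis would force it to duplicate this field check.
func desugarSelector(conn string) (string, error) {
	parts := strings.Split(conn, " -> ")
	if len(parts) != 2 {
		return "", fmt.Errorf("expected exactly one connection in %q", conn)
	}
	src := parts[0]
	slash := strings.LastIndex(src, "/")
	if slash == -1 {
		return conn, nil // no selector, nothing to desugar
	}
	base, field := src[:slash], src[slash+1:]
	fields, ok := analyzed[base]
	if !ok || !fields[field] {
		return "", fmt.Errorf("%s is not a struct outport with field %s", base, field)
	}
	return fmt.Sprintf("%s -> structSelectorNode(%q) -> %s", base, field, parts[1]), nil
}

func main() {
	out, _ := desugarSelector("foo.bar/baz -> receiver")
	fmt.Println(out) // prints: foo.bar -> structSelectorNode("baz") -> receiver
}
```

In the real pipeline the analyzer would also check point 3 from above - that the selected field's type is compatible with the receiver - which this sketch omits.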
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index e5333d36..5d44a145 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,8 +1,6 @@ # Requirements -Read the [documentation](./docs/). Many design decisions are described there. - -This is **must read** for everyone who want to contribute to the language. +Make sure you've read the documentation at https://nevalang.org/docs from cover to cover. ## System @@ -19,31 +17,21 @@ These are not really required but recommended in order you're using VSCode - [nevalang](https://marketplace.visualstudio.com/items?itemName=nevalang.vscode-nevalang) - [antlr4](https://marketplace.visualstudio.com/items?itemName=mike-lischke.vscode-antlr4) - [tmlanguage](https://marketplace.visualstudio.com/items?itemName=pedro-w.tmlanguage) -- [tooltitude](https://marketplace.visualstudio.com/items?itemName=tooltitudeteam.tooltitude) - [markdown-mermaid](https://marketplace.visualstudio.com/items?itemName=bierner.markdown-mermaid) -- [ts-errors](https://marketplace.visualstudio.com/items?itemName=yoavbls.pretty-ts-errors) # Development -After you've read the [documentation](./docs/) see [architecture high level overview](./ARCHITECTURE.md), see what [Makefile](./Makefile) can do. - -Remember that many go packages contain doc comments. Do not fear to read the source code, leave the comments for unintuitive parts. - -Try to follow [clean code](https://github.com/Pungyeon/clean-go-article), [clean architecture](https://blog.cleancoder.com/uncle-bob/2012/08/13/the-clean-architecture.html) and [SOLID](https://en.wikipedia.org/wiki/SOLID). - -Discuss your ideas via github discussions or issues before implementing it. Write tests, avoid using `nolint`. Leave the comments (but only when you have to), update documentation. - -## Github Issues - -Issues must only be created for known bugs and understandable architecture issues. Not ideas, suggestions or feature requests. Discussions must be used for that instead.
+See [architecture high level overview](./ARCHITECTURE.md) and what [Makefile](./Makefile) can do. ## VSCode Extension -Check out [tygo.yaml](./tygo.yaml). It depends on types defined in the `src` and `typesystem` packages and thus it's dangerous to rename those types. If you gonna do so make sure you don't brake TS types generation. Check [web/CONTRIBUTING.md](./web/CONTRIBUTING.md). +The extension depends on types defined in the `src` and `typesystem` packages, so it's dangerous to rename those. If you're going to do so, make sure you didn't break TS types generation. + +Check out [tygo.yaml](./tygo.yaml) and the CONTRIBUTING.md in the vscode-neva repo. ## ANTLR Grammar -Don't forget to open `neva.g4` file before debugging with VSCode! +Don't forget to open the `neva.g4` file before debugging with VSCode! # Naming conventions @@ -55,53 +43,18 @@ Use `_` instead of space in for test-case names because go turns spaces into und ## FBP/DataFlow -- [Elements of Dataflow and Reactive Programming Systems](https://youtu.be/iFlT93wakVo?feature=shared) -- [The origins of Flow Based Programming with J Paul Morrison](https://youtu.be/up2yhNTsaDs?feature=shared) -- [Dataflow and Reactive Programming Systems: A Practical Guide](https://www.amazon.com/Dataflow-Reactive-Programming-Systems-Practical/dp/1497422442) - [Flow-Based Programming: A New Approach to Application Development](https://jpaulmorrison.com/fbp/1stedchaps.html) -- [Samuel Smith - "Flow Based Programming"](https://youtu.be/j3cP8uwf5YM?feature=shared) +- [Dataflow and Reactive Programming Systems: A Practical Guide](https://www.amazon.com/Dataflow-Reactive-Programming-Systems-Practical/dp/1497422442) ## Golang -### Must Read - -- [How To Write Go Code](https://go.dev/doc/code) -- [Effective Go](https://go.dev/doc/effective_go) -- [Go Proverbs](https://go-proverbs.github.io/) -- [50 Shades Of Go](http://golang50shad.es/) -- [Darker Corners Of Go](https://rytisbiel.com/2021/03/06/darker-corners-of-go/) - -### Highly Recommended
+Advanced Go knowledge is required, especially an understanding of concurrency. - [Concurrency is not parallelism](https://go.dev/blog/waza-talk) - [Share Memory By Communicating](https://go.dev/blog/codelab-share) -- [Errors Are Values](https://go.dev/blog/errors-are-values) -- [Defer, Panic, and Recover](https://go.dev/blog/defer-panic-and-recover) - -### Nice To Know - -- [Strings, bytes, runes and characters in Go](https://go.dev/blog/strings) -- Concurrency - - [Go Concurrency Patterns: Timing out, moving on](https://go.dev/blog/concurrency-timeouts) - - [Go Concurrency Patterns: Context](https://go.dev/blog/context) - - [Go Concurrency Patterns: Pipelines and cancellation](https://go.dev/blog/pipelines) - -## JavaScript - -- [MDN](https://developer.mozilla.org/en-US/) - [TypeScript docs](https://www.typescriptlang.org/) - [React docs](https://react.dev/) - [You don't know JS books](https://github.com/getify/You-Dont-Know-JS) - -## VSCode Extensions API Docs - -- [Custom Editors](https://code.visualstudio.com/api/extension-guides/custom-editors) - [Webviews](https://code.visualstudio.com/api/extension-guides/webview) - [Language Servers (LSP)](https://code.visualstudio.com/api/language-extensions/language-server-extension-guide) - [Syntax Highlighter](https://code.visualstudio.com/api/language-extensions/syntax-highlight-guide) - [LSP Overview](https://microsoft.github.io/language-server-protocol/) - [Go library for LSP implementation](https://github.com/tliron/glsp) - [LSP official docs](https://microsoft.github.io/language-server-protocol/) +- [Go Concurrency Patterns: Timing out, moving on](https://go.dev/blog/concurrency-timeouts) +- [Go Concurrency Patterns: Context](https://go.dev/blog/context) +- [Go Concurrency Patterns: Pipelines and cancellation](https://go.dev/blog/pipelines) ## Subjective Recommendations @@ -114,23 +67,12 @@ Use `_` instead of space in for test-case names because go turns spaces into und - ["What Is a Strange Loop and
What is it Like To Be One?" by Douglas Hofstadter (2013)](https://youtu.be/UT5CxsyKwxg?feature=shared) - ["The Economics of Programming Languages" by Evan Czaplicki (Strange Loop 2023)](https://youtu.be/XZ3w_jec1v8?feature=shared) - [Why Isn't Functional Programming the Norm? – Richard Feldman](https://youtu.be/QyJZzq0v7Z4?feature=shared) -- https://www.youtube.com/watch?v=SxdOUGdseq4 "Simple Made Easy" - Rich Hickey (2011) ### Books And Articles -- [Grokking Algorithms: An Illustrated Guide for Programmers and Other Curious People](https://www.amazon.com/Grokking-Algorithms-illustrated-programmers-curious/dp/1617292230) - [Bob Martin's Clean Architecture](https://blog.cleancoder.com/uncle-bob/2012/08/13/the-clean-architecture.html) -- [The Go Programming Language](https://www.amazon.com/Programming-Language-Addison-Wesley-Professional-Computing/dp/0134190440) -- [Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems](https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321) - [Code: The Hidden Language of Computer Hardware and Software](https://www.amazon.com/Code-Language-Computer-Hardware-Software/dp/0735611319) -### Other - -- [Go By Example](https://gobyexample.com/) - # Community -Here you can find help - -- [Flow-Based Programming Discord](https://discord.gg/JHWRuZQJ) -- [r/ProgrammingLanguages](https://www.reddit.com/r/ProgrammingLanguages/) +Check out https://nevalang.org/community to find out where you can get help \ No newline at end of file diff --git a/docs/design_principles.md b/docs/design_principles.md deleted file mode 100644 index 6b5e5a36..00000000 --- a/docs/design_principles.md +++ /dev/null @@ -1,33 +0,0 @@ -# Design Principles - -Nevalang is built on a set of principles. They were rather naturally derived from the development process rather artificially created beforehand. 
- -> WARNING: Language is under heavy development and these principles are not guarantees we can give you at the moment, but rather guiding stars for us to keep moving in the right direction - -## Program must fail at startup or never - -The idea is that most of the errors must be caught by compiler at compile time. And the rest of them, that are hard to catch (without sacrificing compiler's simplicity) are checked in runtime at startup. - -If no errors were caught at compile time and startup - then the program is correct and must run successfully. Any (non-logical) error that occurred after startup must be threated like compiler bug. - -## Runtime must be fast, flexible and unsafe - -Runtime won't do any checks after startup. The program that runtime consumes must be correct. Program's correctness must be ensured by compiler. If there's a bug in compiler and runtime consumed invalid program - bad things can happen: deadlocks, memory leaks, freezes and crashes. - -## Compiler directives must not be required - -Language must allow to implement everything without using of compiler directives. - -**Compiler directives are not always unsafe** (analyzer won't always validate their usage - that will make implementation more complicated) and thus must be used by language/stdlib developers or at _for users that know what they are doing_. - -It's good for user to understand what compiler directives are and how syntax sugar use them under the hood though. - -## There is interpreter (backend can be slow) - -Compiler must be fast to the point where it generates IR. After that we have generating of target code (e.g. generating Go and then generating machine code with Go compiler) - that part ("backend") doesn't have to be fast. It's more important to keep it simple. - -The reason for that is that we have an interpreter that is internally uses compiler (it's impossible to generate IR from invalid program due to lack of type information), but not the whole thing. 
Just to the point where it generates IR. That's the part of the compiler that is used for development/debugging purposes. That's where we need to be fast. - -## There is visual programming - -Once we build the good enough tool for visual programming we will switch from text based approach. Text will become supporting tool. To achieve this we must always keep in mind that what we do with the language must be easy to visualize in graph environment. diff --git a/docs/internal.md b/docs/internal.md deleted file mode 100644 index 4b1692fb..00000000 --- a/docs/internal.md +++ /dev/null @@ -1,104 +0,0 @@ -# Internal Implementation - -## Why structures are not represented as Go structures? - -It would take generating Go types dynamically which is either makes use of reflection or codegeneration (which makes interpreter mode impossible). Maps have their overhead but they are easy to work with. - -## Why nested structures are not represented as flat maps? - -Indeed it's possible to represent `{ foo {bar int } }` like `{ "foo/bar": 42 }`. The problem arise when when we access the whole field. Let's take this example: - -``` -types { - User { - pet { - name str - } - } -} - -... - -$u.pet -> foo.bar -``` - -What will `foo.bar` actually receive? This design makes impossible to actually send structures around and allows to operate on non-structured data only. - -## Why Go? - -It's a perfect match. Go has builtin green threads, scheduler and garbage collector. Even more than that - it has goroutines and channels that are 1-1 mappings to FBP's ports and connections. Last but not least is that it's a pretty fast compiled language. Having Go as a compile target allows to reuse its state of the art standart library and increase performance for free by just updating the underlaying compiler. - -## Why compiler operates on multi-module graph (build) and not just turns everything into one big module? - -Imagine you have `foo.bar` in your code. 
How does compiler figures out what that actually is? In order to do that it needs to _resolve_ that _reference_. And this is how _reference resolution_ works: - -First, find out what `foo` is. Look at the `import` section in the current file. Let's say we see something like: - -```neva -import { - github.com/nevalang/x/foo -} -``` - -This is how we now that `foo` is actually `github.com/nevalang/x/foo` imported package. Cool, but when version of the `github.com/nevalang/x` we should use? Well, to figure that out we need to look out current _module_'s _manifest_ file. There we can find something like: - -```yaml -deps: - - github.com/nevalang/x 0.0.1 -``` - -Cool, now we now what _exactly_ `foo` is. It's a `foo` package inside of `0.0.1` version of the `github.com/nevalang/x` module. So what's the point of operating on a nested multi-module graph instead of having one giant module? - -Now let's consider another example. Instead of depending on `github.com/nevalang/x` your code depends on `submodule` and that sub-module itself depends on `github.com/nevalang/x` - -You still have that `foo.bar` in your code and your module still depends on `github.com/nevalang/x` module. But now you also depends on another `submod` sub-module that also depends on `github.com/nevalang/x`. But your module depends on `github.com/nevalang/x` of the `0.0.1` version and `submod` depends on `1.0.0`. - -Now we have a problem. When compiler sees `foo.bar` in some file it does import lookup and sees `github.com/nevalang/x` and... does not know what to do. To solve this issue we need to lookup current module manifest and check what version `github.com/nevalang/x` _this current module_ uses. To do that we need to preserve the multi-module structure of the program. - -One might ask can't we simply import things like: - -```neva -import { - github.com/nevalang/x@0.0.1 -} -``` - -That actually could solve the issue. 
The problem is that now we have to update the source code _each time we update our dependency_. That's a bad solution. We simply made probramming harder to avoid working on a compiler. We can do better. - -## Why #runtime_func_msg does not accept literals? - -Indeed it would be handy to be able to do stuff like this: - -```neva -nodes { - #runtime_func_msg(str "hello world!") - const Const -} -``` - -This would make desugarer much simpler (no need to create all this virtual constants), and not just for const senders but for struct selectors too. - -However, to implement this we need to be able to parse literals inside `irgen`. Right now we already introduce dependency for parsing entity references, but for arbitrary expressions we need the whole parser. - -Of course, it's possible to hide actual parser implementation behind some kind of interface defined by irgen but that would make code more complicated. Besides, the very idea of having parser inside code-generator sounds bad. Parsing references is the acceptable compromise on the other hand. - -## Why Analyzer knows about stdlib? Isn't it bad design? - -At first there was a try to implement analyzer in a way that it only knows about the core of the language. - -But turns out that some components in stdlib (especially `builtin` package, especially the ones that uses `#runtime_func` and `#runtime_func_msg` directives) are actually part of the core of the language. - -E.g. when user uses struct selectors like `foo.bar/baz -> ...` and then desugarer replaces this with `foo.bar -> structSelectorNode("baz") -> ...` (this is pseudocode) we must ensure that type of the `bar` is 1) a `struct` 2) has field `baz` and 3) `baz` is compatible with whatever `...` is. _This is static semantic analysis_ and that's is work for analyzer. 
- -Actually every time we use compiler directive we depend on implicit contract that cannot be expressed in the terms of the language itself (except we introduce abstractions for that, which will make language more complicated). That's why we have to analyze such things by injecting knowledge about stdlib. - -Designing the language in a way where analyzer has zero knowledge about stdlib is possible in theory but would make the language more complicated and would take much more time. - -## Why desugarer comes after analyzer in compiler's pipeline? - -Two reasons: - -1. Analyzer should operate on original "sugared" program so it can found errors in user's source code. Otherwise found errors can relate to desugar implementation (compiler internals) which is not the compilation error but debug info for compiler developers. Finally it's much easier to make end-user errors readable and user-friendly this way. -2. Desugarer that comes before analysis must duplicate some validation because it's unsafe to desugar some constructs before ensuring they are valid. E.g. desugar struct selectors without knowing fir sure that outport's type is a valid structure. Also many desugaring transformations are only possible on analyzed program with all type expressions resolved. - -Actually it's impossible to have desugarer before analysis. It's possible to have two desugarers - one before and one after. But that would make compiler much more complicated without visible benefits.