Proposal to add 'owned' refs to Nim #144
Comments
You could define a standard macro that does `node = nil`, to hide this unsightly pattern. |
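Such a macro is easy to write in today's Nim; a minimal sketch (the name `unlink` is illustrative, not part of the proposal):

```nim
# Hypothetical convenience template: hides the explicit `= nil`
# that disarms a potentially dangling unowned ref.
template unlink(r: untyped) =
  r = nil

# usage, assuming some `node: Node`:
#   unlink(node)   # instead of `node = nil`
```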
Not being familiar with C++, I am confused by the direction Nim is taking. I am confused about the whole new runtime thing, and I am a little worried seeing how much effort goes into inventing new semantics and patching the standard library to work in both modes, instead of making Nim ready for a 1.0 release with garbage collection, thread-local heaps and a shared unsafe heap, as was advertised before. Do not misunderstand me: I think Nim's mutability semantics are not great now, and fixing that will probably require some form of ownership concept. But it seems to me that the new runtime is an experiment whose semantics are not formally proven to work, and something that can open a myriad of new bugs. Recently, we have seen new work about
These are not small changes: they are quite fundamental distinguishing features of a language. I may even agree that these are useful directions to explore. But it leaves me worried about what will happen to Nim as I know it now: a language that uses garbage collection, freeing me from reasoning pervasively about ownership, which I can compile to C (not C++), and which has a limited but simple threading model that I can easily reason about. |
IMO, it makes sense to do these changes before 1.0 release because -
As far as I understand, destructors and owned refs are optional features that you may or may not choose to use. And they provide better safety as well as optimization opportunities wherever required. But incremental compilation should be postponed after v1. Instead (after destructors and owned refs are implemented) one release cycle should focus on bugfixes only. |
@andreaferretti I share your fears and this is indeed all stuff for version 2. Having said that, if we release v1 as it is, with breaking changes in the future for v2, then why did we even take so long for v1? v1 took so long that we might as well put in the extra effort and get the language into a shape we're confident will stand the test of time. Also, most regressions are not even due to feature changes. Most regressions are the result of bugfixes. That's terrible, but apart from testing ever more things I'm out of ideas for how to deal with this problem. Having said that, when was the last time we made Nim worse? I don't see it; nil for strings and seqs is gone (yay), |
Completely agree, if we improve the DLL/static lib generation stability that also offers a way out of the increasing compile-times. |
Uh, sorry for the misunderstanding, I never claimed that! In fact, I have seen many improvements :-) It's just that new features are piling up, and I am not sure that the language that will be standardized as v1 will resemble much the original vision of Nim (let's say Nim as we know it now) |
As far as I understood, the GC is still there in V1, how would the transition work? AFAIK destructors are replacing the old However owned refs are quite different indeed. The macro to hide |
Of course the GC is there in v1, to make a transition possible. But a transition to what? To a language without GC? This is unfortunately still not clear to me. |
Yes, but it's easy to misinterpret when you put it this way. Memory management is still mostly declarative and automatic. And it's not like the existing GC frees you of memory management problems: you can easily have a cache that keeps growing in size, becoming a logical memory leak. (I have seen this happen in production a lot of times.) In addition to that, the GC doesn't close your sockets etc. reliably; the new runtime would. |
The proper term is
Though putting file.close() in a finalizer kind of works at the moment. |
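For comparison, a finalizer-based cleanup in today's Nim looks roughly like this (a sketch; `MyFile` and `openMyFile` are illustrative names). The catch is that the GC runs the finalizer at an unspecified time, if at all:

```nim
type
  MyFile = ref object
    f: File

proc finalize(m: MyFile) =
  # Invoked by the GC whenever it gets around to collecting `m`;
  # this is why relying on it for `close` only "kind of works".
  if m.f != nil:
    close(m.f)

proc openMyFile(path: string): MyFile =
  new(result, finalize)   # `new` with a finalizer proc
  result.f = open(path)
```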
If I understand correctly, the reference counting would default to being disabled (in release builds?). If yes, do I understand correctly, that it is essentially a similar level of danger as disabled bounds checking? (Which is also disabled by default in release builds, right?) If yes, could there be some "middle" level of compilation, with optimizations as in release build, but with bounds checks + owned refcounting enabled? Say, "memory-safe release", or "bounds-checked release", or something? (Name totally open to being bikeshedded.) Or at least some flag for the release build, that would enable both checks at once? I mean, if I had bounds checks enabled in release builds as of now (is it even possible?), I wouldn't know I need to add another flag to "be safe" without reading this RFC. Generally, I would very much like an idea of a well advertised "safe" mode of compilation, for people who want some benefits of a "release" build, but "safety" is paramount to them, who are willing to trade some performance if this means keeping as much safety as possible (in hope of avoiding heartbleed-style bugs, etc.). |
@akavel Yes, completely agree and I consider it part of this proposal. You can use Also you can disable the checking only for the performance critical parts with Furthermore https://www.microsoft.com/en-us/research/wp-content/uploads/2016/07/Undangle.pdf
|
@akavel
Also see config/nim.cfg in the Nim install dir, and search for |
@Araq Thanks! I'm only not yet clear what exactly you are referring to by "it"; in particular, as part of the proposal, are you intending to add something like the |
I wanted to add |
Out of curiosity, how hard would it be to write a prototype of this runtime? I'm a little cautious about this feature for a lot of the same reasons as @andreaferretti mentioned, but I can see the potential benefits as well. Playing with a prototype would be really helpful in understanding how Nim would change. As a small aside, I like this proposed syntax for unowned refs better: https://forum.nim-lang.org/t/4743#29636 |
I posted this on IRC, but I thought it would be good to post it here as well: How is this different from the Let's say I have a piece of important, complex, multithreaded server software that I'm using in a commercial product. It's important that it not crash unexpectedly or corrupt memory, otherwise I'll have angry customers (and an angry manager) to deal with.
Both these outcomes are rather unappealing - I'm facing a higher risk* of programming oversights that lead to random program aborts and memory corruption. This change would appear to actually put Nim in a situation that's worse than C++ with regards to memory management - at least C++ has the Look at all the commonly used programming languages - Python, Java, Ruby, Javascript, C#, C++, C, PHP, Go. Out of all of those, only 2 have memory management schemes that involve manual or semi-manual memory management mechanisms of the sort being proposed. Even Rust has something that, while not exactly as automatic as a garbage collector, is at least as surefire as one. I know there are alternatives out there. What about doing pure reference counting, and trying to detect possible reference cycles at compile time using the type graph? or special-casing reference cycles so that users have to mark a break point? At the end of the day, I think the decision on whether to implement this proposal comes down to who and what Nim is targeting. If Nim is supposed to be used as a general programming language, in the same areas as Go, C#, Java, and Python, then it needs to have reliability, in addition to performance. However, if Nim is supposed to be used primarily in embedded or HPC systems, where assembly, C, and C++ are the only suitable candidates, then perhaps performance needs to be the only concern. Personally, not having to worry about memory corruption or resource leaks** is one of the features that drew me to Nim. Having a language that had the speed of C, with the reliability and ease-of-use of Python, was what I was looking for. * Relative to Nim's current semantics |
No, they are not. The difference is that the scheme detects dangling pointers when they exist, not when they are deref'ed. That's something that valgrind, purify etc cannot detect because it would violate C's semantics. Whether that is in practice a big difference or not remains to be seen.
We don't have many of these highly reliable multithreaded servers in Nim. In practice the existing GC plus the various complex interactions with the OS's APIs mean that we have more unreliability than with a simpler more deterministic solution. Proof: Look at Nim's issue tracker.
No, the shared_ptr type is worse than a safer unique_ptr as you need to watch out you don't create cycles.
You can have plenty of resource leaks, the GC only collects memory. |
The point is that you get a chance to detect and correct the weird threading race condition in that mode. I do support adding a shared_ptr equivalent but I am sure it is not needed right now. |
@Varriount If you do have a use case where you have multiple references and there is no way you can say which one is the owner, i.e. all references have a 100% dynamic lifetime (an improbable but possible use case), then |
@cooldome How does this pave the way for anything safer than what Nim currently has? As the language currently stands, use-after-free and memory corruption bugs are practically non-existent - one has to be using This proposal would change that. Any program using references could exhibit those bugs. Rebuttals to this fact seem to be the following:
With regards to the first point, unfortunately we do not live in a perfect world. Code gets written all the time that isn't properly tested, whether out of laziness, or because an individual simply doesn't have time to. In many situations, it is also incredibly difficult to write comprehensive tests (such as when code relies heavily on external data, such as a REST API); there is only so much mocking and separation one can do. With regards to the second point, without drastically changing the language, no amount of static analysis could ever hope to find all the situations in which use-after-free situations could arise. To do so, one would need to solve the halting problem. Even finding some of those situations will be hard, especially when one considers how multithreading can affect when parts of a program's logic (and therefore memory allocation, deallocation, and accesses) may run. I would much prefer just biting the bullet and using atomic reference counting and cycle detection. If a program is too slow, I throw more computing power behind it. If I can't do that, then I can resort to using raw pointers and the risks they bring (and let's not kid ourselves here, that's what this proposal is all about, turning the majority of references into a kind of raw pointer type). Technical implementation aside, what doesn't seem to be considered here is how this behavior will be perceived by those evaluating Nim. Most commonly used programming languages don't have the possibility of use-after-free or memory corruption bugs. The worst thing most languages have that's even remotely similar is null pointer/reference errors. I'm not saying that Nim should just become a clone of another language, but there is a limit to what people are willing to consider. |
That's wrong. Almost always, whenever somebody brings up the halting problem, it's wrong. What generally happens is that the analysis is pessimistic, but safe. For example:

```nim
var s = "string"
if haltingProblem:
  s = 34
```

The Nim compiler does not allow this.
Atomic reference counting with cycle detection is one of the slowest GC algorithms you could come up with! And if you don't mind its overhead, nobody is stopping you from leaving on
Pure FUD.
How is that different from today where thousands of programmers already picked Go and not Nim? And since when is performance not important to have and a feature of its own? Most existing users of Nim picked it - among other things - for its performance. |
One of the best qualities of Nim is that most of the time it is a strict superset of the capabilities of any other language - that is, any code that you can imagine in Ruby, JavaScript, C++ or Malbolge can have an equivalent representation in Nim, composed of roughly the same abstractions, expressed with similar elegance. Completely eliminating the GC would lose us this quality because certain APIs rely on the existence of a GC. With that said, I don't see this proposal as a definitive plan to eliminate the GC from Nim, but rather as a way to greatly increase the number of programs that can be written without one. In particular, the standard library of Nim will be written with more care regarding resource management and the result will be that many user programs will become smaller and more efficient. Where the nature of the problem still requires more ad-hoc sharing, I think Nim can still provide a |
@Araq, Is there a branch where you're working on this? Or when would you expect a somewhat working (or at least building) prototype? That might make the discussion more concrete. |
The prototype implementation hides behind the |
So if I understand correctly what you're proposing are optional annotations that allow the GC to be turned off, is that correct? What follows is that libraries need to be explicitly written to support these annotations, right? So we will have two different ways to do things in Nim and end up in situations where libraries are written without a care in the world that they use a GC, and that will mean my GC-free app that uses |
```nim
var s = "string"
if haltingProblem:
  s = 34
```
Yes, because of type checking, which can be done at compile time. Ref count checking to catch use-after-free cannot, so this is not an analogue. As to test coverage for a ref counting debug build: it's good that we don't have to test all the control flow paths which deref the owned ref. But since the abort could be triggered by any invalidation of the memory behind the owned ref, we do have to test all flow paths which do that, is that correct? If it is, these would be cases where the owned ref
Do we have any metrics for the effort this would take? It "feels" easier than covering derefs, though. |
AFAIU, your owned-ref-aware app code could be compiled in use-GC-mode together with the non-owned-ref-aware lib code. The |
Probably but we have the technology in the compiler to iterate over all control flow paths for a proc body. What would be required is some "abstract RC effect summary" for every proc so that the analysis doesn't have to inline every proc. Solving this for the general case seems intractable indeed but the tool could tell you if the program couldn't be proven and you should keep the runtime checks. It's too early to put too much thought into it. If you care that much about correctness why do you even use |
Correct, but It's too early to say if they stay optional in the long run or not.
Correct, and I propose not to support the GC mode forever, for this very reason: to avoid a permanent split. But it's much too early for this decision. |
The `delete` is an unlink. It should be: unlink gets a |
What do you think we have been doing in this thread? ;-) |
I may be all wet on this, but shouldn't the goal be to clarify dialog with the user? I agree with @mratsim that we need language constructs to codify our Explicit language isn't just easier for us to read and reason about; it's easier for the compiler to analyze and optimize, too. The rewards are more accurate static analysis, more precise debugging information, and superior optimization. These aren't Rusty hoops we have to jump through; these are the means by which we may sidestep those hoops, and (not least) they help the compiler better serve our actual intent. If the specificity with which we can notate semantics improves, then I don't see how it will matter which backend the user selects -- the compiler will be able to do the Right Thing given the circumstances, both with libraries that assume a GC and with a scope-based MM technique and with whatever new-fangled automanual gizmo arrives in 2.0. |
@disruptek It shall be called |
I chose Nim because it is a fast and elegant language that is easier to read and understand than C/C++ (while being almost as fast), and faster and less verbose than C#/Java, without sacrificing safety. I'd be happy to know that v1.0 will be released under the initial assumptions on resource management, or else to know that the change will not happen overnight. And, if possible, to prioritize fixing the most important bugs first, before any library adaptations. |
I'm really happy to hear about this as it seems to be a major improvement to the language. I actually always assumed that Araq would find some nice ergonomic way to add more safety and efficiency. However, I may need to scroll through again but from the discussion I am not 100% certain exactly where this leaves the 1.0 release and whether it actually is going into 2.0. Can I assume that there will be a 1.0 with libraries set up for GC that is available for use while we learn the new system and it gets refined? I wonder if there is a way to distinguish between these major runtime discrepancies when it comes to packages and dependency management. |
The current plan is:
Worst case: it turns out that everything fails and |
Just as an untested idea: is it possible to use the existing |
Seems reasonable. Is there still some hope for a |
Seems too risky; firstly, it's not clear what
There are Nimble packages that provide these things and v1 needs to be about stabilizing what we have. |
I'm very interested in trying this feature if/when it is functional enough. There always seem to be some comments/fear about C# and managed languages, not having gc... etc. I'll not speculate here, but I did want to just mention my personal perspective as a potential nim user. I've got 3 years professional c# usage on 3d house design/engineering software. And 7 years as a pro c++ developer for mining optimization software. The only way I'm going to be able to touch these "less popular" languages is in my free time. Or if I can sell it to my boss :), of course. I love learning about them. Read the rust book. Read the nim book. Ported a good portion of the corange engine to rust as a hobby experiment to see what made the language tick. I would like to get into nim more. As far as I'm concerned there is little difference with coding a gc language, or a non gc from my user perspective: if the container object is thin & is on the stack, heap allocations are hidden behind some abstraction and there is some deterministic hands off destruction. All written by whoever is providing the structure. I know there are a lot of details left out there, but for "getting stuff done", it serves me. Pythonic just seems to be the syntax my mind likes best and nim has the macro system going for it, regardless of the semantics of object destruction. And it is compiled. I just wanted to chime in that this particular proposal was something that really struck me as intriguing for how I personally like to code. |
The owned refs concept seems to be similar to the Pointer Ownership Model of David Svoboda: https://resources.sei.cmu.edu/asset_files/WhitePaper/2013_019_001_55008.pdf https://resources.sei.cmu.edu/library/asset-view.cfm?assetid=55000 |
@Araq your blog post mentions backing refs with type-specific memory allocators. Right now in Would |
There is also |
That's slightly ugly, since both the default allocator and I would instead suggest for |
Bah, it's ok I guess, but we should have allocators that are page-aligned (or more than that) so that we can do bitmasking of pointers to access the underlying allocator, no need to carry the allocator around as a separate internal pointer everywhere. |
Question: Why is it necessary to have a runtime error for dangling unowned refs? Could the runtime not simply set them to nil? Apologies if this is a dumb question, but it came to mind. |
Setting weak pointers to nil automatically is much more expensive, you would need to keep a data structure around just to be able to do this. This is particularly problematic for unowned refs on the stack, it would require a register/unregister pair for every stack slot that has an unowned ref. |
@Araq Had I but paused a moment to think of that, it would have been obvious. Thank you. |
But not necessarily a separate data structure. This is implemented by, for example, SaferCPlusPlus' "registered pointers" by making each pointer object double as a node in a linked list that tracks all the references currently targeting the given object. The registering and unregistering does add some expense. SaferCPlusPlus provides both "auto-self-nulling" registered pointers and non-owning alias counting "norad" pointers like the references Nim seems to be adopting, and simple micro-benchmarks for them. The micro-benchmarks indicate that the copy and assignment operations of "auto-self-nulling" pointers are, as expected, more expensive than those of "alias counting" pointers, but not by as much as I might have thought. (Shared-owning (non-atomic) reference counting pointers appear to be in the same performance ballpark as well.) In practice, my experience is that there's rarely a noticeable performance difference in the application overall. With this proposal, it seems to me that the primary use case for these non-owning references would be dynamic data structures with cyclic references. I suspect that in most of those cases either the differences in overhead of any pointer operations would be small relative to the cost of the accompanying memory allocation operations, or, for cache-coherency reasons, the organizational structure of the nodes (if not the nodes themselves) would be stored in a contiguous vector, in which case indexes could/would take the place of the non-owning pointers. No? That said, I don't disagree with the choice of using "alias counting" as the safety mechanism. But I don't think it's an either-or situation. There are (maybe rare but) legitimate use cases for weak pointers that can handle the "untimely" destruction of their target. I think that ultimately you're going to want to provide both. |
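For the record, the intrusive-list idea described above can be sketched in Nim roughly like this (all names are illustrative; this is not part of the proposal):

```nim
type
  Target = ref object
    aliases: RegPtr      # head of the intrusive list of live aliases
    data: int

  RegPtr = ref object    # a "registered" pointer to a Target
    target: Target
    next, prev: RegPtr

proc track(t: Target): RegPtr =
  ## Register a new alias of `t` by linking it into the alias list.
  result = RegPtr(target: t, next: t.aliases)
  if t.aliases != nil: t.aliases.prev = result
  t.aliases = result

proc nullAll(t: Target) =
  ## On destruction of `t`, walk the list and null every alias:
  ## this is the "auto-self-nulling" behaviour described above.
  var p = t.aliases
  while p != nil:
    p.target = nil
    p = p.next
  t.aliases = nil
```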
Most forms of weak pointers, including "auto niling" pointers solve a different problem, a "modelling" problem. For instance in a game your entities can "die" and then all the subsystems are "notified" about this. I don't think it's a good general purpose model -- when I write a BTree implementation etc I don't want to be notified about my now-dangling pointers, I want to fix my implementation so that it doesn't dangle. Of course, with Nim's destructors and assignments you can implement all these things, you can also implement traditional reference counting, but the proposal is about what Nim's default, builtin |
Is this part of a general "Don't attempt to recover from program logic errors." principle? Is there an inconsistency with out-of-bounds array index errors throwing an exception rather than terminating the program?
Yeah, that's a good point. If there is sufficient demand for these alternatives they'll likely end up being implemented anyway. Without adding clutter to the base language. |
Yes.
No, it is not, from https://nim-lang.org/docs/manual.html#definitions "Whether a checked runtime error results in an exception or in a fatal error is implementation specific. Thus the following program is invalid; even though the code purports to catch the IndexError from an out-of-bounds array access, the compiler may instead choose to allow the program to die with a fatal error." In fact, we restructured the exception handling to make this more explicit. |
Wow, that's strong. Is there a performance reason for doing so? |
This RFC was succeeded by #177 |
Owned refs

This is a proposal to introduce a distinction between `ref` and `owned ref` in order to control aliasing and make all of Nim play nice with deterministic destruction. The proposal is essentially identical to what has been explored in the "Ownership You Can Count On" paper.

Owned pointers cannot be duplicated, they can only be moved, so they are very much like C++'s `unique_ptr`. When an owned pointer disappears, the memory it refers to is deallocated. Unowned refs are reference counted: when the owned ref disappears, it is checked that no dangling `ref` exists; the reference count must be zero. The reference counting can be enabled with a new runtime switch, `--checkRefs:on|off`.

Nim's `new` returns an owned ref; you can pass an owned ref to either an owned ref or to an unowned ref. `owned ref` models the spanning tree of your graph structures and is a useful tool that also helps Nim's readability. The creation of cycles is mostly prevented at compile-time.

Some examples:
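The code examples did not survive this copy; a minimal sketch in the spirit of the proposal (the `Node` type is illustrative, and the `owned` semantics are as described above, not implemented syntax):

```nim
type
  Node = ref object
    data: int

var x = Node(data: 3)    # an owned ref
let dangling: Node = x   # an unowned alias; bumps the check count
assert dangling.data == 3
x = Node(data: 4)        # destroys the old node, but `dangling`
                         # still refers to it --> runtime abort
                         # when --checkRefs is on
```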
We need to fix this by setting `dangling` to `nil`:
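A sketch of such a fix (assuming an illustrative `Node` ref type with a `data` field):

```nim
var x = Node(data: 3)
var dangling: Node = x
assert dangling.data == 3
dangling = nil           # disarm the unowned ref first
x = Node(data: 4)        # the old node now has no dangling refs
```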
The explicit assignment of `dangling = nil` is only required if unowned refs outlive the `owned ref` they point to. How often this comes up in practice remains to be seen.

Detecting the dangling refs at runtime is worse than detecting them at compile-time, but it also allows a different development pacing: we start with a very expressive, hopefully not overly annoying solution, and then we can check a large subset of problems statically, with a runtime fallback, much like every programming language in existence deals with array index checking.
This is what a doubly linked list looks like under this new model:
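The code block was lost here; a hedged sketch of how a doubly linked list might look under this model (forward links owned, back links unowned; the syntax is speculative and all names are illustrative):

```nim
type
  Node[T] = ref object
    prev: Node[T]        # unowned back pointer
    next: owned Node[T]  # ownership runs along the forward links
    data: T

  List[T] = object
    tail: Node[T]        # unowned
    head: owned Node[T]

proc append[T](list: var List[T]; data: T) =
  var n = Node[T](prev: list.tail, data: data)
  if list.tail != nil:
    list.tail.next = move n   # hand over ownership
    list.tail = list.tail.next
  else:
    list.head = move n
    list.tail = list.head
```

The ownership forms a spanning tree along `next`, while `prev` and `tail` are plain unowned aliases, which is exactly the split the proposal describes.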
EDIT: Removed wrong `proc delete`.

Nim has closures, which are basically `(functionPointer, environmentRef)` pairs, so `owned` also needs to apply to closures. This is how callbacks can be done: `main` is transformed into something like a proc that receives its closure environment explicitly. This seems to work out without any problem if `envParam` is an unowned ref.

Pros and Cons
This model has significant advantages:

- `owned ref` maps to C's `restrict`'ed pointers.
- The runtime costs are lower than with C++'s `shared_ptr` or Swift's reference counting.
- Porting is mostly a matter of adding the `owned` keyword to strategic places. The compiler's error messages will guide you.

And of course, disadvantages:

- Existing code needs `owned` annotations.
- `nil` as a possible value for `ref` stays with us, as it is required to disarm dangling pointers.

Immutability
This RFC is not about immutability, but once we have a clear notion of ownership in Nim, it can be added rather easily. We can add an opt-in rule like "only the owner should be allowed to mutate the object".
Possible migration period

Your code can either use a switch like `--newruntime`, and then needs `owned` annotations, or else you keep using Nim like before. The standard library needs to be patched to work in both modes. `owned` is ignored if `--newruntime` is not active. We can also offer an `--owned` switch that enables the owned checks but uses the old runtime.