Unify Nim's GC/memory management options #177

Closed
Araq opened this issue Nov 6, 2019 · 38 comments

@Araq
Member

Araq commented Nov 6, 2019

This is an alternative to #144. #144 is not dead but scheduled for Nim version 2 which is a couple of years away. The reason for this is that owned is quite a breaking change and we're better served with stability now that version 1 is out and with us for the years to come.

GC: arc

The plan is to leverage the technology developed for destructors to give us a new GC mode, --gc:arc that maps ref to atomatic refcounting. RC operations are optimized away when you use the sink and lent annotations as they are outlined in https://nim-lang.org/docs/destructors.html. How cycles are dealt with is an open question, the first implementation will ignore the problem, but there are a couple of solutions:

  1. Detect potential or definite cycles at compile-time.
  2. Detect cycles at run-time in some debug mode.
  3. Introduce a collectCycles(someRef) API for manually freeing cyclic data structures in a subgraph of the heap.
  4. Use the .cursor annotation on object fields that are "parent" / unowning pointers. More on this later.
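
As a rough illustration of the sink and lent annotations mentioned above, here is a minimal sketch. The names Node, Stack, push and top are made up for this example and do not come from the proposal; it assumes a compiler recent enough to support --gc:arc.

```nim
type
  Node = ref object
    name: string
  Stack = object
    items: seq[Node]

proc push(s: var Stack; n: sink Node) =
  # `sink` tells the compiler that ownership of `n` moves into the
  # container, so it may move the ref instead of copying it.
  s.items.add n

proc top(s: Stack): lent Node =
  # `lent` hands out a borrowed view of the stored ref instead of
  # returning a fresh copy.
  s.items[s.items.high]

var s = Stack()
push(s, Node(name: "a"))
push(s, Node(name: "b"))
echo top(s).name
```

Whether a particular RC operation is really elided depends on the compiler version and on how the caller uses the argument afterwards, so treat this as a hint to the optimizer rather than a guarantee.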

The GC modes

The plan is to deprecate and remove the GC options until we are left with only --gc:arc and --gc:boehm. Both options can handle shared heaps and work well with multi-threading. --gc:arc is optimized for latency and works for all heap sizes, small or really big; --gc:boehm is better for throughput on medium-sized heaps (medium here means about 500MB). --gc:arc also works out of the box on weird targets such as WebAssembly and has better interop with C/C++/Python/etc.

The .cursor annotation

Besides cycles, this proposal has a shortcoming in that so-called "cursor" variables imply RC ops. These are very hard to optimize away in a multi-threaded setting and can add considerable runtime overhead:

var it = listRoot # it is a 'cursor' variable
while it != nil:
  use(it)
  it = it.next

The solution is to annotate it with .cursor to tell the compiler it doesn't need to emit RC ops. In fact, .cursor more generally prevents object construction/destruction pairs and so can also be useful in other contexts. The alternative solution would be to use raw pointers (ptr) instead which is more cumbersome and also more dangerous for Nim's evolution: Later on the compiler can try to prove .cursor annotations to be safe, but for ptr the compiler has to remain silent about possible problems.
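
A minimal sketch of the annotated loop, assuming a simple singly linked list. The List type and the sum proc are invented for this example.

```nim
type
  List = ref object
    value: int
    next: List

proc sum(root: List): int =
  # `it` only walks the list and never owns a node, so the compiler
  # is told not to emit RC ops for it.
  var it {.cursor.} = root
  while it != nil:
    result += it.value
    it = it.next

let listRoot = List(value: 1, next: List(value: 2, next: List(value: 3)))
echo sum(listRoot)
```

This is only safe because listRoot keeps every node alive for the duration of the loop; the compiler does not check that.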

It is natural to extend the .cursor pragma to object fields in order to break up cycles declaratively:

type
  Node = ref object
    left: Node # owning ref
    right {.cursor.}: Node # non-owning ref

But please notice that this is not C++'s weak_ptr: it means the right field is not involved in the reference counting; it is a raw pointer without runtime checks.
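
The same idea applied to a tree with a parent back-pointer, as a sketch. The Tree type and the addChild helper are hypothetical names for this example.

```nim
type
  Tree = ref object
    name: string
    parent {.cursor.}: Tree   # non-owning back-pointer, not refcounted
    kids: seq[Tree]           # owning refs

proc addChild(parent: Tree; name: string): Tree =
  result = Tree(name: name, parent: parent)
  parent.kids.add result

let root = Tree(name: "root")
let leaf = root.addChild("leaf")
echo leaf.parent.name
```

Because parent is not counted, holding on to leaf does not keep root alive. If root were freed first, leaf.parent would dangle, which is exactly the missing runtime check described above.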

@mratsim
Collaborator

mratsim commented Nov 6, 2019

That sounds good, I've noticed a lot of overhead with a tree data structure with a backpointer to the parent.
However, if the {.cursor.} is a raw pointer, does that mean that a ref cannot be moved in memory by any of the GCs now? This would require removing mark-and-sweep from the get-go.

@Araq
Member Author

Araq commented Nov 6, 2019

That's a good point, since we don't lose the type information (it's still a ref after all) a moving GC is very possible. In other words, a moving GC would ignore the .cursor annotation.

@Araq Araq added this to the Nim2020 milestone Nov 8, 2019
@gemath

gemath commented Dec 25, 2019

The plan is to leverage the technology developed for destructors to give us a new GC mode, --gc:arc that maps ref to atomatic refcounting.

That's the kind of typo that makes people think the wrong thing. :o)

@Araq
Member Author

Araq commented Sep 21, 2020

Update: With 1.4 we'll ship --gc:orc to the masses. --gc:orc is arc plus a cycle collector. We hope this new GC mode can be made the default in upcoming releases but either way the following memory management options should disappear:

--gc:none: To be replaced by --gc:arc.
--gc:refc: To be replaced by --gc:orc. Remark: The refc's realtime API additions are still missing.
--gc:regions: To be replaced by --gc:arc. Remark: Might need a swappable allocator runtime component.
--gc:boehm: To be replaced by --gc:orc. Remark: Remains to be useful for debugging and is very little work to maintain as it's an external component.

(As a first step, they simply stop being documented.)

We would be left with:

--gc:arc: To be replaced with --gc:orc once link-time dead code removal for the orc type traversals has been introduced.
--gc:orc: The upcoming new standard GC.
--gc:markAndSweep: Nothing else works as well for the Nim compiler itself which is about throughput, not latency. Eventually bootstrapping will work with --gc:orc and then markAndSweep can be phased out too.
--gc:go: For best interop with Golang code. Remark: Is very little work to maintain as it's an external component.

@vitreo12

As someone whose project entirely depends on custom memory management ( Omni ) I would strongly prefer the --gc:none option to remain in the language.

What's the reasoning for taking it out completely? I believe it's still a nice feature to have for people like me that want to use Nim as sort of a C replacement, without having to worry about memory being freed under the hood, however smart this freeing is. This, I believe, is especially important in the case of building shared libraries where there is a clear distinction between functions that allocate memory and functions that consume the pointers to that memory. Moreover, in the case of hard real time processing (mine is audio), there can't be any risks of a specific function trying to allocate / deallocate memory during the audio loop, which is something that, in the case of my project Omni, I take care of doing. This, however, is only possible with memory pointers whose lifetime is manually handled, especially if this happens across different functions of a built shared library.

Considering this feature is already in the language, and it doesn't involve more work to keep it alive, I really hope that there is a way it will be kept as it is.

@Araq
Member Author

Araq commented Oct 15, 2020

But you're free to use Nim's ptr for these cases. (Or containers built on top of ptr.) The different GC options create different language dialects effectively, libraries work much better with a single GC switch.

@vitreo12

vitreo12 commented Oct 15, 2020

I do already use ptr in conjunction with memory allocators that exist outside of the shared libraries that I am building.

I just tried to compile some of my code with --gc:arc, and I see quite some things added to the generated C code, which I want to avoid for maximum performance, especially considering that the code I have now already works and it's tested with --gc:none.

I just have problems understanding the choice of taking it out of the language (considering it is already there), as I believe it can be useful, even just for people using Nim as C replacement (thus, all the ptr stuff, without anything more).

Moreover, I believe that gc:none doesn't collide or overlap with any option that is already there, as it might be the case for other GC options: it's just a very bare bone option that sort of allows you to write low level C code with Nim syntax (with all the conveniences of the compile time code generation to create DSLs, like in my case), which is something that is quite unique and definitely loved by people (I am sure I do).

@Araq
Member Author

Araq commented Oct 15, 2020

Well you're right that supporting --gc:none isn't all that taxing but I would like to explore what --gc:arc adds that you think you can avoid. I mean, never freeing memory isn't exactly a production ready solution... :-)

@HugoP707

HugoP707 commented Oct 15, 2020

Although arc is probably superior to both gc:none and gc:regions, I still think that being able to turn off the GC at will is a nice feature to have.

The good thing about regions compared to gc:none is that you can always free the memory you use, so if only one of them is kept (if any), I'd say regions is the one more worth keeping.

@Araq
Member Author

Araq commented Oct 15, 2020

Comparing --gc:none to --gc:arc is much like a switch "let's turn C's free() into a nop", a switch I've never seen demanded for a C environment.

--gc:regions is more useful but the idea behind ORC is not performance (though it's usually really good, depending on the program), the idea is unification. ORC works much better with custom memory management than Nim v1.0's mechanisms did.

For example, I want to be able to use your Omni project with Nim's async framework and with mratsim's superior multi-threading solution, all in one program. With --gc:X that is pretty much impossible until we agree on a standard X. Yes, this standard is suboptimal for plenty of use cases; that's the price to pay for interoperability. But if X is ORC it's the most flexible solution that we found, and you can tweak it for Omni's domain too. For example, a custom seq-like container that doesn't realloc under the hood is a couple dozen lines of custom code.
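
To give a rough idea of what such a container could look like, here is a sketch of a fixed-capacity buffer built on ptr and a destructor hook. FixedBuf, initFixedBuf and tryAdd are hypothetical names, the element type is assumed to be a plain value type without its own destructor, and the hook uses the Nim 1.4-era signature (Nim 2.x prefers `=destroy` without `var`). It is meant to be compiled with --gc:arc or --gc:orc.

```nim
type
  FixedBuf[T] = object
    data: ptr UncheckedArray[T]   # one allocation, never resized
    count, cap: int

proc `=destroy`[T](b: var FixedBuf[T]) =
  # Runs once when the owning variable goes out of scope.
  if b.data != nil:
    dealloc(b.data)

# Copying is forbidden at compile time, so the block has a single owner.
proc `=copy`[T](dst: var FixedBuf[T]; src: FixedBuf[T]) {.error.}

proc initFixedBuf[T](cap: int): FixedBuf[T] =
  # The only allocation happens here, outside any time-critical code.
  FixedBuf[T](data: cast[ptr UncheckedArray[T]](alloc0(cap * sizeof(T))),
              count: 0, cap: cap)

proc tryAdd[T](b: var FixedBuf[T]; x: T): bool =
  # Refuses to grow: returns false when full instead of reallocating.
  if b.count >= b.cap:
    return false
  b.data[b.count] = x
  inc b.count
  true

var buf = initFixedBuf[float32](2)
doAssert buf.tryAdd(0.5'f32)
doAssert buf.tryAdd(1.5'f32)
doAssert not buf.tryAdd(2.5'f32)   # full, and no realloc happened
```

The design choice here is to report a full buffer to the caller rather than grow, so the time-critical path never touches the allocator.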

@vitreo12

vitreo12 commented Oct 15, 2020

Thing is, you can't really use Omni in anything other than Omni (it ships with its own omni compiler, which compiles .omni code to shared libraries, using nim, of course). Omni is made to generate shared libraries that process audio that you would use in other environments, so you can't link it with other nim modules at all.

Having said that, what gc:arc seems to be adding to the generated C code is a lot of useless if statements. In my project I don't use the nim stdlib at all, and if a user tries to, they will already be shown an error. So, all of these statements are quite useless, as none of the constructs that gc:arc would handle are ever even used. In my case, memory is freed, just not automatically as gc:arc would do, but manually at a specific point in time that supports hard real time scheduling.

I am not saying that gc:none should be used in place of gc:arc, I am just saying that there are use cases where gc:none is more ideal than gc:arc (as I said, even to just use nim syntax instead of C's for low-level ptr programs), and considering that maintaining gc:none doesn't require much work (if any at all), I don't see reasons to deprecate it.

@HugoP707

HugoP707 commented Oct 15, 2020

I think Araq's point was not specifically about Omni, but about any kind of library; Omni is not a good example here.

It's a fair point, but I am still not convinced about the idea of just getting rid of them completely.
Nevertheless it's your language; if you think regions and/or "none" should be deprecated, I am not the one to say otherwise.

About using projects with different GCs, I have always felt like it could be a nice feature: being able to have a computationally intensive part of your library use one GC and another part use a different one. Wouldn't this be possible? Can't the GCs be treated as libraries? (This is a bit off-topic already.)

@Araq
Member Author

Araq commented Oct 15, 2020

@vitreo12 I don't understand how this works with today's --gc:none. Do you use it with -d:useMalloc? Do you use it for the warning messages the Nim compiler produces when you use a feature that uses the GC?

but manually at a specific point in time that supports hard real time scheduling.

Ok, I get that but how do you do this with --gc:none? I'm not aware of an API that lets you do that.

@Clyybber

Clyybber commented Oct 15, 2020

@RecruitMain707 gc:arc gives you the hooks to enable creating your own memory management strategy.
@vitreo12 Maybe we can figure out the cause of these useless if statements, the code generated by gc:arc shouldn't have these when you are not using ref/seq/string (which you shouldn't if you are using gc:none).

@vitreo12

vitreo12 commented Oct 15, 2020

@Araq I don't use -d:useMalloc. Omni works like so:

  1. The user has some .omni file, written in Omni code. This syntax is defined as a DSL on top of nim's macro system.
  2. The code gets interpreted by the nim VM and generates nim code (which doesn't use stdlib). This gets compiled to a shared library with different nim flags, including gc:none.
  3. The shared library also provides an omni.h header file, with all the entry points to the shared library. Internally, the shared library allows the user to define custom pointers to their own malloc / realloc / free implementations. If none is provided, Omni will use standard malloc / realloc / free.

All memory in Omni is allocated / freed with a special function, omni_alloc / omni_free, so it doesn't use nim's alloc / dealloc. This is made possible by the fact that Omni provides its custom way of allocating memory, via the struct construct.

Also, regarding the hard real time problem, Omni follows the standard object allocation (non real time) -> object consumption (hard real time) -> object destruction (non real time) way of dealing with audio code. These functions can then be called and used in other audio programming environments, like SuperCollider (omnicollider) or Max/MSP (omnimax).
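
The allocate / consume / destroy split described above can be sketched with plain ptr code that also compiles under --gc:none. This is not Omni's actual API: Phasor, phasorInit, phasorPerform and phasorFree are invented for the example, and Nim's own create / dealloc stand in for omni_alloc / omni_free.

```nim
type
  Phasor = object
    phase, step: float32

proc phasorInit(freq, sampleRate: float32): ptr Phasor =
  ## Non-realtime: the only place where memory is allocated.
  result = create(Phasor)          # zero-initialised block
  result.step = freq / sampleRate

proc phasorPerform(p: ptr Phasor; buf: var openArray[float32]) =
  ## Realtime: fills the caller's buffer, no allocation or freeing.
  for i in 0 ..< buf.len:
    buf[i] = p.phase
    p.phase += p.step
    if p.phase >= 1.0'f32:
      p.phase -= 1.0'f32

proc phasorFree(p: ptr Phasor) =
  ## Non-realtime: manual release at a point the host chooses.
  dealloc(p)

var buf: array[4, float32]         # lives on the stack
let p = phasorInit(12000.0, 48000.0)
phasorPerform(p, buf)
doAssert buf == [0'f32, 0.25, 0.5, 0.75]
phasorFree(p)
```

Since no ref, seq or string appears anywhere, nothing here depends on which memory management mode is selected.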

Usually, you'd want to compile your omni code with one of these wrappers, as they already take care of calling the right functions at the right time for that specific audio programming environment.

@Yardanico

Yardanico commented Oct 15, 2020

@vitreo12 do these "if statements" look like " if (NIM_UNLIKELY(*nimErr_)) goto BeforeRet_;"? If so, they're not useless - ARC/ORC use a new exception implementation based on goto, see https://nim-lang.org/araq/gotobased_exceptions.html. Goto exceptions replace the older setjmp-based exception raising/tracking mechanism.

@vitreo12

@Yardanico Those are the ones. Is there a way to remove them? gc:none hasn't got them.

@Yardanico

@vitreo12 gc:none doesn't have them because --exceptions:goto is only enabled with ARC/ORC by default. Why would you want to remove them?

@alaviss

alaviss commented Oct 15, 2020

Isn't a bunch of ifs + goto much more lightweight than dealing with setjmp?

@vitreo12

vitreo12 commented Oct 15, 2020

@vitreo12 gc:none doesn't have them because --exceptions:goto is only enabled with ARC/ORC by default. Why would you want to remove them?

Cause I know I won't be raising exceptions in the program. I compile disabling all the runtime checks off.

Or maybe I am just misunderstanding their usage, by the way. I am a musician first and a programmer second, so excuse me for any misunderstandings! :)

@alaviss

alaviss commented Oct 15, 2020

Cause I know I won't be raising exceptions in the program

Other parts of the stdlib can (and runtime checks do that as well, though --panics:on can optimize most of those out). I think the compiler knows enough to not generate try-finally pairs when not necessary. Also, Nim currently has no magical "disable exceptions" switch (quirky exceptions will be that switch in the future, if @Araq ever decides to pursue it).

I compile disabling all the runtime checks off.

Note that some of them are always enabled (scoped) because the logic in stdlib needs them.

@vitreo12

I see. I managed to get (almost) the same C generated code of --gc:none with --gc:arc --exceptions:setjmp --panics:on
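
For reference, the same three switches could be kept in a project-level NimScript file instead of being typed on the command line. This is only a sketch of one possible setup, assuming a config.nims next to the main module; newer compilers spell the first option --mm:arc.

```nim
# config.nims (NimScript), placed next to the project's main module.
# Mirrors the command line: --gc:arc --exceptions:setjmp --panics:on
switch("gc", "arc")
switch("exceptions", "setjmp")
switch("panics", "on")
```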

@juancarlospaco
Contributor

--gc:none needs very little maintenance work, its implementation uses very little code, documenting it takes no effort because it is all manual, and there are no direct critical bugs open on the issue tracker. I think it should remain.

Some people use --gc:boehm because it has a shared heap, but ARC/ORC also have a shared heap, so it can be removed.

@Araq
Member Author

Araq commented Oct 21, 2020

Update: With 1.4 we're shipping --gc:orc to the masses. It answers the question of how cycles can be handled, supports Nim's async (and Status's async design) and runs Nim's async under the Valgrind sanitizer without crashes or leaks. We expect version 1.4.x --gc:orc to be production ready for Status's needs.

The plans for the deprecation of the other GCs were met with some resistance so for now they won't be deprecated -- but they won't see much further development from the Nim core team either.

--gc:orc with its cycle collector that is based on Nim's "=hook" mechanism does meet the "unification of memory management" criterion we've been looking for. In my talks I explain how it also allows for arena-based allocation and show exciting benchmark results.

--gc:orc is optimized for latency and works for all heap sizes, small or really big and it works much better on embedded devices than the other GCs. The .cursor and acyclic annotations can be used to optimize --gc:orc further, much like it was outlined in the original plan.
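
As a small sketch of the acyclic annotation: marking a ref type as acyclic is a promise to the compiler that values of this type never form cycles at run time, so the cycle collector does not need to consider them. The Item type and the total proc are made up for this example.

```nim
type
  Item {.acyclic.} = ref object
    value: int
    next: Item        # always points "forward", never back to an earlier Item

proc total(head: Item): int =
  var it {.cursor.} = head   # pure traversal, no RC ops needed
  while it != nil:
    result += it.value
    it = it.next

let head = Item(value: 1, next: Item(value: 2, next: Item(value: 3)))
echo total(head)
```

The promise is not verified: if a cycle is built out of Item values anyway, that memory is leaked.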

@Araq Araq closed this as completed Oct 21, 2020
@HugoP707

I'm fine with them not being developed any further by the Nim team, it's totally fair.

@juancarlospaco
Contributor

I think that before deciding to deprecate, first ORC needs to be made the default memory management for at least 1 final release.

@capocasa

capocasa commented Jan 18, 2021

Well you're right that supporting --gc:none isn't all that taxing but I would like to explore what --gc:arc adds that you think you can avoid. I mean, never freeing memory isn't exactly a production ready solution... :-)

Doing a bit of hardish-realtime low-latency audio development of my own. This is 2ms to get 40Mbyte/s of audio data from the microphone to the CPU to the speaker, with zero tolerance for buffer over/underruns. This only works with a patched kernel, tuned interrupts and pre-allocated memory- and no memory allocations. So you can have your own pre-allocated mempool, and you can have the stack, that's it. GC_none is invaluable in this setting, because you don't have to hope you are not accidentally creating GC overhead that will cause the dreaded audio dropouts- you know, because GC_none warns you.

It is already very hard to debug buffer over/underruns in realtime audio code; if you add the GC layer as a potential source of issues, that disqualifies the language from use in this setting, period. This is, I guess, the unique use of GC_none: if you don't need the GC, you can be absolutely sure it's not there, and you can rule it out as the source of your Heisenbug of the day. This is a really, really big deal compared to, say, Go.

Note that these properties are shared by other applications as well, such as robotics control, and industrial automation (not theoretical examples, those areas use the same kernel patches as linux audio). If you keep GC_none, you could potentially code rocket control firmware in Nim.

Nim is the only modern language in existence that you can do this kind of realtime audio development in (except maybe if you count Rust). Even without the GC, Nim syntax is king, and you get templates and macros. Don't kill this.

@HugoP707

@capocasa what prevents people from using arc/orc to code rocket control firmware?

@capocasa

capocasa commented Jan 18, 2021

@capocasa what prevents people from using arc/orc to code rocket control firmware?

Assuming realtimish performance is required for rocket control firmware. If this assumption is wrong then arc/orc would do fine. Realtimish performance precludes memory allocations, so not using a GC is the only option. And in this case, as mentioned above, not having the GC in the code at all, along with warnings if GC memory is used, are really important for debugging.

@Clyybber

@RecruitMain707 Nothing, but you don't have to do garbage collection if it's gonna blow up anyway :)

@capocasa You make a good point about the warnings. I think making --warning[GcMem] work with --gc:arc should make it possible for you to use --gc:arc confidently, right? (of course you could build your own gc/memory management scheme and it wouldn't warn you about that, but the same is true for --gc:none)

@capocasa

@Clyybber Thanks! It would certainly help to have the warnings separately. But it would still be much preferable to keep gc:none, because otherwise I still have to trust that ARC behaves as I expect it to. This is normally OK, but when I am on the brink of insanity due to debugging a Heisenbug, this is bad.

Imagine there is a very large rock suspended on a rope over your head where you sit. It is mounted on a top-rated cable and the release mechanism is connected to a sensor that is guaranteed not to go off when aimed at your particular hair color. In theory, you could now code with confidence. But you would still have to burn brain cycles wondering if there might not be some sort of malfunction. You would probably vastly prefer- and code better- if you could simply remove the entire contraption.

@Araq
Member Author

Araq commented Jan 18, 2021

Realtimish performance precludes memory allocations, so not using a GC is the only option.

Hardly, but since --gc:none will stay, futile to argue about it.

@capocasa

capocasa commented Jan 18, 2021

Hardly

Depends on your definition of realtimish- certainly as I used it.

but since --gc:none will stay, futile to argue about it.

Yay!

@Araq
Member Author

Araq commented Jan 18, 2021

This only works with a patched kernel, tuned interrupts and pre-allocated memory- and no memory allocations.

If you could tell Nim's allocator to work from this pre-allocated memory, would that help? Nim's allocator uses constant-time (O(1)) algorithms.

@capocasa

If you could tell Nim's allocator to work from this pre-allocated memory, would that help? Nim's allocator uses constant-time (O(1)) algorithms.

Yeah, that would most likely work, and be pretty amazing- I'd consider it quite the advance for audio programming to say the least! Using seqs for sound processing- wow. Much better than futzing around with length/UncheckedArray.

Traditionally, you have a real time thread, where the mentioned restrictions apply, but you also have a control thread where you do soft realtime (think games) or no realtime. Or, you have regular setup code, then real time code, then non-realtime cleanup code. So it would be optimal to be able to set the GC behavior on arbitrary lines, or possibly blocks.

@disruptek

Realize that arc is deterministic; if you have a heisenbug there, I think we would all like to see it.

@Clyybber

warning[HeapAlloc] might make more sense in the context of arc/orc, since technically strings and seqs that aren't cyclic are not really using a GC. (The same goes for acyclic refs, depending on how GC-y you consider simple refcounting to be.)

@capocasa

@disruptek I'm not worried about a Heisenbug in ARC, I'm worried about a Heisenbug in my code that might or might not have to do with it subtly interacting with ARC in a way I didn't anticipate, compounding debugging difficulty. In other words, I don't mind paying the complexity cost for a heavy feature like a GC, as long as I'm actually using it.
