Eric's Research Logbook: Dan Grossman was Right

For those who don't follow Onward!, Dan Grossman published an essay talking about the relationship between garbage collection and transactional memory.

To give you a background, I've been working on some compiler-based transactional memory stuff in the context of Tony Hosking's RuggedJ, and on my own vision of the high-concurrency, featherweight threading runtime environment, which I'm now trying to bring to life using the components I've already built as a runtime support system, and LLVM both as an abstract instruction set and the arbitrary compiler support I need to make the whole thing fly.

The grand vision is that the compiler will graft in most of the code needed to do garbage collection and transactional memory, applying the fabled "sufficiently aggressive optimizations", which I sincerely hope will get the real costs of transactional memory down low enough to where it consistently beats locking synchronization.

Part of this, of course, involves creating a set of intrinsics, which your compiler frontend must generate. LLVM's garbage collection intrinsics are... almost... good enough. But they are just lacking enough when it comes to the kind of highly-concurrent, compiler-mangled stuff I want to do. To give you an example, LLVM gives you:

llvm.gc_read
llvm.gc_write
llvm.gc_root

Where gc_root marks a local variable (stack-slot) for visit by the collector. This is fine for your quaint, cute, lock-based, stop-the-world type collectors. But that's exactly the sort of non-concurrent barbarism we're trying to move beyond! Add in the desire to throw in STM at a future date, and it's better just to scrap theirs and build my own. Besides I really don't have much of a desire to do anything with LLVM other than pillage it for compiler machine parts anyway...

Specifically, I need a couple of things. First, I need to separate logging from the actual read/writes. I also need safepoints, at which actual collection can take place. The idea is you want only one logging action for each location that is accessed between the access and the next safe point. Enter difficult compiler analysis #1 (Eliot and I found out just how difficult...)

Second, I don't have stacks. The basic assumption is you have a frame pointer which points to an object which gets traced just like everything else. The basic assumption is that everything gets GC-allocated. Of course, the flatten optimization that you find in basically every functional language compiler on the planet will squash the all the frames for a non-recursive call tree into one (and my calling conventions allow this). Or you can figure out when a C-style stack actually is better and use it instead. But the point is that the gc_root is totally unnecessary.

In a distant third, allocation can't be done with a function. There's more state variables that get carried around, used for allocation, and passed into safepoints to be modified. This is cruft, but it's important cruft. All the same, I can probably have LLVM stick it all in for you, assuming you actually have a gc_alloc intrinsic. And of course, gc_alloc needs to know the type of the thing it's allocating since I do everyhing with type signatures.

So I have something along the lines of this instruction set:

gc_alloc
gc_log (r/w)
safepoint

And the normal LLVM load/store stuff.

Now this is where it gets interesting. At some point, I'll want to add TM intrinsics. The basics of what you want are this:

tx_begin (open/closed)
tx_validate
tx_commit
tx_abort
tx_retry
tx_log

And again, the usual LLVM load/store stuff. Notice the log. You now have two of them: one for the collector and one for the transactional memory. It gets deeper...

In the library-based world, you allocate objects, build up this list of logged locations, and then process it when you abort (or commit) a transaction. In GC world, you do the same. All of this is opaque to the compiler. But if you have a compiler that you can make do as you wish, you can do things differently. The compiler can stash the log itself in an SSA value, carrying it along as it goes. How is this special, you ask? Once again, apply the illustrious flatten. Whenever possible, you'll squish as much of the log out into SSA values, which will in turn be stashed in registers where possible and frame slots when not. Now, spit out the code to grind through the log directly into the program itself, and apply all your other optimizations. Result: much better than calling out to a library (I hope).

What's the point of all this digression? You'll have different behaviors depending on whether or not you're logging a transactional-only, gc-only, or gc and transactional thing. You also really don't want to be having a dual regime when it comes to logging, since you'll be duplicating a whole lot of stuff, and gobbling up alot of registers in the process.

Also, the TM really needs to know about and affect the placement of safepoints. Safepoints represent a potentially huge gap in time, so the transaction probably wants to validate immediately before and after. You'll also want at least some kind of notification to the contention manager, and in the future, we can imagine some kind of crosstalk between the TM system and the scheduler (I'm running a transaction that's logging A, B, and C, so it'd be really nice if you don't schedule anyone who walks all over them...)

Last and not least, safepoints can make the impossible possible. In a naive world, you can't do undo-logging with lazy conflict detection, because you might see inconsistent states. Likewise, some old lock-free TM implementations had the same problem. Let people see inconsistent states, and you get the potential for wacky control flows that shouldn't happen, loops that should terminate not terminating, and I/O side-effects that shouldn't happen happening (I/O is to transactional memory what nuclear waste containment is to nuclear power). In an I/O-less world, where you don't have some of the less-cooperative control-flow constructs (like continuations or exceptions), you can deal with the problem by using a timer interrupt to break and check consistency every so often. Safepoints are the lovely alternative to this at a fraction of the cost. Toss your safepoints around well enough, and you can actually do things like lazy detection with undo logging, or revive those old lock-free STMs...

Lastly, there's potential for some really nasty or beneficial interactions between GC and STM, depending on how naïve or smart you are. For instance, if you allocate inside a transaction, you don't want to discard the object and allocate another one just because you got aborted. Likewise, GC logging inside a transaction is likely to lend itself to certain subtleties. Start throwing in stuff like boosting, and... I'm getting ahead of myself. I'm sure there's probably several papers worth of stuff you could do here.

The last thing I'll mention is that alot of transactions can probably be compiled into simpler implementations. For instance, single-word transactions don't need a full TM suite; you can just do them with CAS or LL/SC instructions. Likewise, read-only transactions can be transformed into an atomic snapshot. Negotiating the interactions between these and GC logging will require more care in the compiler.

So in summary, Dan Grossman, you were right.

Eric's Research Logbook

Thursday, June 11, 2009

Dan Grossman was Right

No comments:

Post a Comment

About Me

Followers

Blog Archive