One of my research activities deals with compiling, analyzing, and optimizing transactional programs. I'm doing work for Tony Hosking's RuggedJ stuff, but also gathering intelligence and experience for doing my own LLVM work at a later time, which I expect will have much more potential, since I won't be tied to JVM, Java, and their limitations.
The primitives one might see in a TM system, based on the latest research might look something like this:
open (r/w)
log (r/w)
validate
begin (open/closed)
commit
abort
retry
A beneficial, but difficult optimization is to eliminate redundant open and logging operations. This is also quite beneficial for concurrent garbage collectors, as I discuss in a previous entry.
An important point here is that open operations are idempotent, and cannot be undone. This means that once a given object is open, it stays that way until the end of the transaction, and additional opens have no effect. Also, there is no way to "unopen" an object, aside from commiting the transaction.
Simple data-flow analysis is a start, but there's another subtlety. For instance, in the following example:
%x = load %ptr
%y = %x
open read,%x
open read,%y
The second open read is unnecessary since the values %x and %y both point to the same thing. This notion of "pointing to the same thing" has a name: regions. A region is some spot in memory where some number of values are located. It originates in the context of region types, which are able to reason about memory allocation and deallocation. It also works for what we're doing: object-granular conflict detection (though in the LLVM stuff, I'd like to try a more intelligent strategy of grouping related shared fields together).
We can track regions and note which ones have been opened for reading and writing (and which fields have been logged, or any other effect we might wish). When we do a phi instruction, we create a new region, whose state is the lower bound of all the states of the arguments (unless all the arguments have the same region, in which case we just keep it).
It's more complicated, though. If we create the merge regions based on the SSA values being merged, we might be too conservative. Consider this:
%w = load %p1
%x = load %p2
br ..., label %L1, label %L2
L1:
%y1 = %w
%z1 = %w
open read %w
br label %L3
L2:
%y2 = %x
%y1 = %x
open read %x
L3:
%y3 = phi [%y1, L1], [%y2, L2]
%z3 = phi [%z1, L1], [%z2, L2]
open write %y3
We end up falsely treating %y3 and %z3 as having a separate region, even though they don't. The answer might seem to create the new regions based on the regions being merged, but this doesn't quite work. Consider:
%w = load %p1 ; w -> r1
%x = load %p2 ; x -> r2
br ..., label %L1, label %L2
L1:
%y1 = %w ; y1 -> a
%z1 = %x ; z1 -> b
open read %y1
br label %L3
L2:
%y2 = %x ; y2 -> b
%z2 = %w ; z2 -> a
open read %y2
L3:
%y3 = phi [%y1, L1], [%y2, L2] ; y3 -> (a | b)
%z3 = phi [%z1, L1], [%z2, L2] ; z3 -> (a | b)
open read %y3
open read %z3
We would treat %y3 and %z3 as being the same region. However, notice that we can get rid of the first open read, but not the second. This is because the effects visited upon a and b differ based on the path by which they arrived. Therefore, we must record the paths. This gets nasty the moment we have loops, of course, but recall that we are dealing with monotonic, idempotent effects. Running through a basic block 5 times is the same as running through it once. Therefore, we can record merged regions as a "primitive region", plus a set of control-flow edges. The reason it's control-flow edges will become apparent in the next example.
There's still one problem. Consider this loop:
L1:
%z = %y
open read %z
%y = %x
open read %y
%x = (new object)
br ..., label L1, ...
Are %x, %y, and %z all pointing to the same region? By our previous method, they are. However, this would lead us to eliminate the open read %y, when in fact, we want to eliminate open read %z. The problem is that we didn't account for the fact that each look iteration, a "fresh" region gets assigned to %x.
The answer here is to associate with each region an "age" number, representing the loop iteration during which we saw it. We disregard this age when checking to see if anything changed, for the purposes of telling if we've reached a fixed point, but we count it when merging.
In summary, merged regions are a set of primitive regions, their ages, and the control flow edges they've visited. When we merge two regions, we line up all the equal primitive region-age pairs, and union their paths). We get the state for regions created this way by taking the lower bound of all the mergees. We can supplement this with alias analysis. We can treat loads from pointers we determine "must alias" as yielding the same region. Once we have regions assigned in this way, we can eliminate extraneous operations, or perhaps do more aggressive optimizations like PRE.
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment