<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-5817094669338788959</id><updated>2011-09-27T01:09:43.741-04:00</updated><category term='transactional memory'/><category term='algebra'/><category term='theory'/><category term='I/O'/><category term='operating systems'/><category term='verification'/><category term='logic'/><category term='security'/><category term='garbage collection'/><category term='compilers'/><category term='programming'/><category term='type systems'/><category term='manycore'/><category term='linearizability'/><category term='runtime systems'/><category term='multicore'/><category term='chaos'/><category term='semantics'/><category term='integer programming'/><category term='concurrency'/><category term='microkernels'/><category term='threading'/><category term='program languages'/><title type='text'>Eric's Research Logbook</title><subtitle type='html'>The research logbook of a Ph.D student in computer science, made available online.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>39</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-9017763979937638364</id><published>2011-09-27T00:20:00.004-04:00</published><updated>2011-09-27T01:09:43.750-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='I/O'/><category scheme='http://www.blogger.com/atom/ns#' term='runtime systems'/><category scheme='http://www.blogger.com/atom/ns#' term='threading'/><category scheme='http://www.blogger.com/atom/ns#' term='concurrency'/><title type='text'>The Politics of Task and Thread Scheduling and Edge-Triggered Events</title><content type='html'>I wrote earlier on the subject of tasks vs threads.  The basic idea was of threads as a unit of &lt;span style="font-style: italic;"&gt;execution&lt;/span&gt;, and tasks as a unit of &lt;span style="font-style: italic;"&gt;executability&lt;/span&gt;.  Put another way, threads represent separate time streams, where tasks represent separable control flows.  The implication is that priorities and the epoch property matter only in the scheduling of threads; &lt;span style="font-style: italic;"&gt;tasks&lt;/span&gt; by contrast can be scheduled using the classic, dumb FIFO (or LIFO) methods.&lt;br /&gt;&lt;br /&gt;The truth is more complex.  I'd like to do event-handling in an edge-triggered, continuation-handled manner.  That is, you set a continuation to be executed in response to some particular kind of event (for instance, I/O completion, or availability of data on a socket).  There are many reasons for this, which I'm not going to discuss in full detail here.  The major reasons are 1) it's simple, natural, and close to what hardware actually gives you (even if OS APIs make it quite hard), 2) it avoids locking which would be necessary in a polling (or should I say level-triggered) style, yet it does not inhibit one's ability to implement such a style, 3) it provides just as fine an interface for long-running compute-bound tasks (particularly FFI calls) and 4) &lt;span style="font-style: italic;"&gt;&lt;/span&gt;it provides a means to identify control flows originating from system events, which is important, as I'll discuss momentarily.&lt;br /&gt;&lt;br /&gt;The lock-free scheduler system I designed has a very carefully-designed API designed to facilitate a kind of voluntary context switching used in conjunction with mailbox communication.  Changes in thread state aren't acknowledged until a safepoint, which enables a pattern of the form "set myself to sleep status", "change some part of state", "safepoint".  For example, in monitor-style programming, I'd mark myself as sleeping, append myself to the condition variable's listeners, unlock the mutex, and then execute a safepoint.  If another thread managed to catch me before the safepoint and wake me back up, then I never go to sleep (obviously, in the real world, I want to spin-wait for the expected time cost of executing a safepoint, sleeping, and waking up again).&lt;br /&gt;&lt;br /&gt;This all ties nicely into edge-triggered style events.  Tasks are really just continuations, which plays nicely into the idea of registering continuations to be run in response to events.  However, you don't want to hijack the I/O thread (or interrupt handler) to do lots of computation.  You want the I/O system constantly handling various events.  One could just take the continuation and shove it into the set of runnable tasks.  However, tasks have no notion of time or priority, and we'd like to keep it that way, or as close to it as possible (since such notions come with a cost).&lt;br /&gt;&lt;br /&gt;Ideally, tasks responding to I/O should modify data structures, possibly schedule some I/O, and then check out.  The problem is, this could involve potentially time-consuming data structure operations, or operations on lock-free structures which strictly speaking aren't guaranteed to ever finish.  Moreover, we'd like to harness parallelism for the completion of I/O tasks too.&lt;br /&gt;&lt;br /&gt;The answer may lie in the safepoint mechanism.  We can execute a set number of safepoints (or perhaps, for a set amount of time), then kick the task off the I/O timestream and onto the normal execution system.  Ideally, this should allow short-running responses to do their jobs, but won't risk hijacking the I/O timestream for long-running responses.  This requires us to split the task execution system into a time-sensitive and time-insensitive one, but that may not be a bad thing.&lt;br /&gt;&lt;br /&gt;I will write in more detail as I design more of the system.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-9017763979937638364?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/9017763979937638364/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2011/09/politics-of-task-and-thread-scheduling.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/9017763979937638364'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/9017763979937638364'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2011/09/politics-of-task-and-thread-scheduling.html' title='The Politics of Task and Thread Scheduling and Edge-Triggered Events'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-8325650369553740436</id><published>2011-07-28T22:48:00.007-04:00</published><updated>2011-08-10T08:37:37.905-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='semantics'/><category scheme='http://www.blogger.com/atom/ns#' term='program languages'/><title type='text'>Some More PL Ideas</title><content type='html'>These are some ideas I've had for a while, that I want to get down.&lt;br /&gt;&lt;br /&gt;1) Move away from the linear string metaphor of program representation.  Toward what exactly?  One idea I've had is to something more resembling circuits and components.  The idea is you'd wire together what basically amounts to a big dataflow graph, consisting of various primitive elements.  The "pins" would carry dependent type information.  Specifying the behavior of the components themselves is a bit more of a task.&lt;br /&gt;&lt;br /&gt;You might be able to adopt concepts from signal analysis in EE here.  Time-invariant signals are represented by pure functions, time-variant ones by Arrows. Monads (aka effects) and comonads (aka streams) are what muck everything up.  Note: I need to write that article about how to explain Big Scary PL terms like monads and comonads in simpler terms.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;[update]: You could probably build a really nifty programming environment with a multitouch interface.  Also, I noticed alot of cryptographic algorithms are represented in this way.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;2) Languages whose execution model is something other than first-order term rewriting or von Neumann-style execution.  For example, theorem proving, term rewriting, algebraic normalization, convergence in a topology, manipulation of categories, and the like.  This has all been done (I think), but for the most part, not in any nice way.  I know there's a language like Pure, which does general term-rewriting.&lt;br /&gt;&lt;br /&gt;Observation: a working example of this would, of course, be relational databases.  Also datalog.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;3) Languages with different notions of the data on which they operate, other than discrete values.  For example, a language that can operate on probability distributions.  Perhaps try compiling down to sets of equations, or representations like baysean networks and use the algebraic reduction-as-execution metaphor from above to attempt to derive the resulting distribution.  Possible huge application: finance.  Use this to compute expected outcomes of a particular strategy, or position.  (I know Goldman Sachs does/did something like this; they told me about it back in 2008, and it's probably why they didn't need a bailout)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-8325650369553740436?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/8325650369553740436/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2011/07/some-more-pl-ideas.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/8325650369553740436'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/8325650369553740436'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2011/07/some-more-pl-ideas.html' title='Some More PL Ideas'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-6984640578563807076</id><published>2011-07-27T14:38:00.002-04:00</published><updated>2011-07-27T20:25:09.475-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='multicore'/><category scheme='http://www.blogger.com/atom/ns#' term='integer programming'/><category scheme='http://www.blogger.com/atom/ns#' term='manycore'/><category scheme='http://www.blogger.com/atom/ns#' term='concurrency'/><title type='text'>Scheduling for Heterogenous Multi/Many Core Architectures</title><content type='html'>I've been working on an approach to dealing with heterogenous multi/manycore architectures.  These are architectures in which 1) communication costs between execution units (cores) is not uniform, 2) processing capabilities of cores is not uniform, and 3) communication costs with different sections of memory is not uniform.&lt;br /&gt;&lt;br /&gt;Scheduling tasks to run on these is quite difficult, especially in any kind of dynamic fashion.  Examples of these architectures include the IBM Cell and Tilera Tile64 architectures.  Cell has 8 SPU's with their own private memories and a fast interconnect, and Tile64 has an 8x8 grid of cores with busses connecting adjacent cores.  An example of a non-uniform memory architecture would be IBM's P7 architecture, which shares L3 cache amongst eight cores in a non-uniform fashion.  Cell and Tilera both have had a hard time when scheduled dynamically.  The Cell requires expert programming, and the Tilera suffers from highly non-uniform communication costs between cores.  Even P7 has many heirarchies to it: hyperthreads, cores, dies, packages, boards, blades, and boxes.&lt;br /&gt;&lt;br /&gt;In the future, I don't see these architectures getting any simpler; rather, I see them becoming more and more complicated, and specialized, like GPU's did.  Hence, my proposed solution is to take inspiration from the way we handle GPU's (and FPGA's for that matter).  Rather than try to dynamically assign tasks to cores (which inherently requires centralization- an anathema to concurrency), I propose "compiling" programs to run on a heterogenous machine by splitting up tasks in a way that minimizes communication costs and maximizes computational power.&lt;br /&gt;&lt;br /&gt;This is most assuredly NP-hard even for simple cases; however, I suspect that modern IP solvers can handle the task.  IBM's CPLEX solver routinely handles problems on the order of millions of variables, and lp_solve has similar capabilities.  I've devised a method for formulating the problem as an IP problem.  We'll see how practical the approach is.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-6984640578563807076?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/6984640578563807076/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2011/07/scheduling-for-heterogenous-multimany.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/6984640578563807076'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/6984640578563807076'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2011/07/scheduling-for-heterogenous-multimany.html' title='Scheduling for Heterogenous Multi/Many Core Architectures'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-4425943990710758271</id><published>2011-07-26T23:28:00.005-04:00</published><updated>2011-07-28T23:11:47.017-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='verification'/><category scheme='http://www.blogger.com/atom/ns#' term='semantics'/><category scheme='http://www.blogger.com/atom/ns#' term='program languages'/><category scheme='http://www.blogger.com/atom/ns#' term='type systems'/><title type='text'>A Universal Core Language</title><content type='html'>One of my long-term research goals (and likely, a significant portion of my dissertation) is the design of a very powerful, very general core language and type system.  To be clear, a core language is a reduced set of syntactic forms generally not meant for human consumption.  The purpose of the core language is to define the semantics of the language, and to be able to prove theorems about it.  A compiler will translate a program into the core language, which it will then type check and subsequently translate into IR.&lt;br /&gt;&lt;br /&gt;One idea I do want to explore is widening the gap between concrete syntax and core.  The overall idea is that concrete syntax should be designed to be "nice" for people to use, perhaps even designed by legitimate designers.  However, this is a topic for another post.&lt;br /&gt;&lt;br /&gt;The core I envision could operate as a substrate for many programming languages, and would bring a very powerful dependent type system to bear for the purposes of verification.  My philosophy with regard to dependent types is that they should be a tool, but not a hindrance.  A dependently-typed language should separate the decidable and undecidable portions of its typechecking, allowing proof obligations to be ignored or deferred.  However, a language which generates these kinds of obligations makes verification, testing, and many other tasks significantly easier.&lt;br /&gt;&lt;br /&gt;I want to try to combine several things that don't tend to play nicely together: 1) impredicative polymorphism, 2) subtyping, 3) full higher-order dependent types, and 4) types describing concurrent behaviors.  The final goal is made possible by my own work, and integrates well with the third.  The first two lead to undecidability when combined, which is usually bad, but I don't particularly care, since the pathway to the last two begins with the sign "Abandon all decidability, ye who enter here".&lt;br /&gt;&lt;br /&gt;The real difficulty lies in designing a logic which combines the first two and the last two, and is sound.  Decidability can be sacrificed; soundness cannot.  And one of the first things you learn as a logician or an algebraist is that there are fundamental limitations upon what kinds of reasoning systems one can design.&lt;br /&gt;&lt;br /&gt;But assuming I can do this, the resulting core language would be quite powerful.  For one, it would be possible to verify concurrent programs with arbitrary primitives.  It would be possible to embed a language with a weaker type system (say, predicative or ad-hoc polymorphism) by synthesizing the necessary subtyping proofs by using the (decidable) type checking algorithms for those type systems.  A similar approach could be used with substructural types, region types, and other interesting decidable type systems.  In other words, it could act as a substrate for other languages- in essence, a universal core language.&lt;br /&gt;&lt;br /&gt;If nothing else, this would provide a convenient mechanism for getting test examples.  If I can embed languages like ML, Haskell (or Java, Scala, X10, or whatever...) in this core, then I can simply translate existing programs.  Of course, none of them will have any proofs of correctness for the obligations they generate, but this could be a positive, as it gives me an opportunity to show the utility of the type system I want to design.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-4425943990710758271?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/4425943990710758271/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2011/07/universal-core-language.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/4425943990710758271'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/4425943990710758271'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2011/07/universal-core-language.html' title='A Universal Core Language'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-6419458752453180429</id><published>2011-07-26T09:04:00.003-04:00</published><updated>2011-07-26T09:15:36.545-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='theory'/><category scheme='http://www.blogger.com/atom/ns#' term='algebra'/><category scheme='http://www.blogger.com/atom/ns#' term='concurrency'/><title type='text'>Further Progress on the Algebra of Concurrency</title><content type='html'>I've further developed the algebra of concurrency, to the point where it now very much resembles vector space algebra.  The primary difference is that I have meets and joins instead of sums.  Differences remain, and the axioms for all these constructs are simply those from set algebra.&lt;br /&gt;&lt;br /&gt;It comes as little surprise, given the vector space algebra similarities, but I think the semantic definitions of the resulting algebra are better given using metric spaces.  It is known that a metric distance metric exists for any context-free grammar; a memory state can be described in a manner similar to a vector, which means that we can derive a distance metric between states as well (it should be obvious from this that the construct is closed under product spaces).  I should be able to use known results from metric spaces to derive that outer product spaces are likewise metric, and to justify all the rules.  Recursively-defined states describe a monotonically decreasing sequence of open sets (a Cauchy sequence), which is known to converge to a unique fixed-point.&lt;br /&gt;&lt;br /&gt;I'll be working to completely formalize this in the near future.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-6419458752453180429?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/6419458752453180429/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2011/07/further-progress-on-algebra-of.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/6419458752453180429'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/6419458752453180429'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2011/07/further-progress-on-algebra-of.html' title='Further Progress on the Algebra of Concurrency'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-4274303781051921583</id><published>2011-07-25T23:25:00.003-04:00</published><updated>2011-07-25T23:36:29.678-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='theory'/><category scheme='http://www.blogger.com/atom/ns#' term='programming'/><title type='text'>A Project</title><content type='html'>I've been self-studying on various topics, including topology, category theory, domain, theory, abstract algebra, and others for a while.  It occurred to me I could perhaps facilitate more rapid learning of these subjects if I undertook implementing them as programming tasks.  Experience has shown me that I &lt;span style="font-style: italic;"&gt;can&lt;/span&gt; learn from reading, but I learn much more readily from building things.&lt;br /&gt;&lt;br /&gt;This would also lay groundwork for future investigations.  I suspect that many of the results from these branches of mathematics could be put to use to automatically synthesize large amounts of code (ie "I noticed that your module satisfies such-and-such properties from topology; here, have a distance metric and a normalization function").&lt;br /&gt;&lt;br /&gt;We'll see how it goes...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-4274303781051921583?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/4274303781051921583/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2011/07/project.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/4274303781051921583'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/4274303781051921583'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2011/07/project.html' title='A Project'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-571580445189459470</id><published>2011-07-21T11:26:00.003-04:00</published><updated>2011-07-21T11:48:31.510-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='operating systems'/><category scheme='http://www.blogger.com/atom/ns#' term='security'/><category scheme='http://www.blogger.com/atom/ns#' term='microkernels'/><title type='text'>Operating Systems from First Principles: Capabilities as a Replacement for System Calls</title><content type='html'>I'm going to stop indexing this series of articles, since I anticipate there will be quite a few of them.  The subject for this article is something that's been on my mind for a while.&lt;br /&gt;&lt;br /&gt;During the height of the interest in microkernels, Jochen Liedtke published a paper in which he discussed the reasons behind the performance costs of the microkernel model.  It was believed at the time that the overhead of interprocess communication and the resultant context switching was to blame.  However, his work suggested that the actual overhead of a context switch in and of itself was rather low; the true overhead came from checking permissions, rather than actually performing IPC or switching contexts.&lt;br /&gt;&lt;br /&gt;In all the talks on OS-level security I've seen, the object capability model seems to be the all-round best approach.  It is the only approach which is capable of guaranteeing that a program is only allowed to do what the user wants it to do.  Of course, a poorly-designed capability system is intolerable.  Recall the security manager on windows vista, or the third-party security programs on windows XP ("you are about to open a file. allow/deny")?  A well-designed system with capabilities will harvest information on what is allowed from user interactions, and not have to ask twice.  For example, if I select a particular file, then the system won't ask me again if I can open the file.&lt;br /&gt;&lt;br /&gt;In traditional operating systems, there's a static set of system calls, which any program can use.  The kernel checks to make sure that these calls are being used appropriately.  Replace system calls with capabilities, and the possibilities become interesting.  For one, you get a dynamically reconfigurable user/kernel interface, but that's the least of it.  You also get built-in support for capabilities, and the ability to permanently constrain a processes' capabilities.  But the big win, at least to me, is that you can present capabilities that are actually interprocess calls.  A server process can grant such a capability to its clients, and unlike in a more traditional operating system, the kernel does not need to perform the expensive security checks to make sure the underlying IPC is valid.  Possessing the capability means a process has permission.  Of course, some amount of checking does need to go on, but this can be done by the server process without the need to duplicate the effort in the kernel.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-571580445189459470?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/571580445189459470/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2011/07/operating-systems-from-first-principles.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/571580445189459470'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/571580445189459470'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2011/07/operating-systems-from-first-principles.html' title='Operating Systems from First Principles: Capabilities as a Replacement for System Calls'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-2109107503162501991</id><published>2011-05-24T15:37:00.004-04:00</published><updated>2011-05-25T13:02:15.841-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='verification'/><category scheme='http://www.blogger.com/atom/ns#' term='type systems'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><category scheme='http://www.blogger.com/atom/ns#' term='concurrency'/><title type='text'>Logic of Concurrency</title><content type='html'>I wrote before on an algebraic theory of concurrency.  The algebraic  construction I gave is good for describing the semantics of a  programming language.  Terms in the language are mapped onto terms in  the algebra, and execution is described by the equational theory of the  algebra.  However, the algebra is ill-suited for constructing a type  system; describing terms in algebraic terms is tedious, and it's  difficult to integrate the algebra with the logic of a type system.&lt;br /&gt;&lt;br /&gt;To build a type system which can reason about concurrency, we need to  design a logic that corresponds with the algebra.  More concretely, we  need to describe a term language and a mapping from logical terms onto  algebraic ones.  We also need an equational theory of logical terms  (described in logic), which corresponds with the equational theory for  algebraic terms.&lt;br /&gt;&lt;br /&gt;To be clear, we'll call logical descriptions of process behaviors &lt;span style="font-style: italic;"&gt;patterns&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;The biggest challenge in doing this is that we are dealing with a  predicate logic over terms of a language (and very likely, a  higher-order predicate logic).  In the algebra, we had a strict notion  of "state spaces", which in a memory system implies a specific address  space.  In the logical theory, we'll likely be using a higher-order  separation logic, which is a great deal more flexible in how it can  reason about states. For one, separation logic can express propositions  such as "no even address points to ...", "x = 5 implies y points to  ..."recursively-defined states, and propositions completely independent  of any state.  Additionally, the logic needs to be able to introduce  variables in order to deal with value bindings.&lt;br /&gt;&lt;br /&gt;I'll deal with the issue of value bindings first.  Consider the  following code fragment: "λp q. x = [p]; if x = 1 then [q] = 2 else ()"  (that is, a function that takes a two pointers, and stores 2 to the  second if the first points to 1).  The full description of this term's  behavior would be  (| p ↦ 1 ⊗ q ↦ v &amp;gt;&amp;lt; p ↦ 1 ⊗ q ↦ v | ⋅ | p ↦ v1 ⊗  q ↦ v2 &amp;gt;&amp;lt; p ↦ v1 ⊗ q ↦ 2 |) ⊔ | p ↦ v1 ⊗ q ↦ v2 ∧ v1 ≠ 1 &amp;gt;&amp;lt;  p ↦ v ⊗ q ↦ v2 |.  Of course, this contains alot of redundant  information.  We'd like to be able to write something more like this: (|  p ↦ 1 &amp;gt; ⋅ &amp;lt; q ↦ 2 |) ⊔ | ¬ (p ↦ v1) &amp;gt; (we'll stick to  these kinds of minimalistic terms from here on, since this is indeed how  the logic works in the end).  This says we can observe that p points to  1, then cause q to point to 2, or we can simply observe that p does not  point to 1 (note how the observation and effect are &lt;span style="font-style: italic;"&gt;not&lt;/span&gt;  done atomically).  The problem is we can't construct this by looking at  each part of the term individually.  We'd get the following  (equivalent) pattern by doing it that way: | p ↦ x &amp;gt; ⋅ (| x = 1  &amp;gt;&amp;lt; q ↦ 2 | ⊔ | x ≠ 1 &amp;gt;).  The problem with this term is that in a logical system, we cannot implicitly introduce the variable x.  The introduction of this issue is alot longer than its  solution; in a higher-order logic system, we simply introduce a new  constructor which ↑ (lift), which constructs a pattern from a function  from something to a pattern.  The elimination rule for lift simply  replaces it with a universal quantifier, and passes the quantified  variable into the function.  Readers familiar with logic may recognize  this as how other kinds of quantifiers are synthesized in higher-order  logic using universal quantifiers.&lt;br /&gt;&lt;br /&gt;The first issue requires a bit more work to resolve.  In the algebra, we define the closure operator over a particular state space.  Closing over a state space treats all processes defined over that state space as isolated (meaning, they can be reduced according to the equational theory of actions).  However, we have a much fuzzier notion of state spaces in the logical theory.  We can tie up this loose end by defining state spaces in terms of predicates about states.  A closure applies to an individual outer product if the state predicate implies the observation predicate (meaning, the P in | P &amp;gt;&amp;lt; Q |) for all states.  For sequences A1 ⋅ A2, we can close over P if we can close A1 over P, and we can find some P' such that | P &amp;gt; A1 = A1 | P' &amp;gt;, and we can close A2 over P'.  The rules for the other constructors are purely compositional.&lt;br /&gt;&lt;br /&gt;Finally, in order to facilitate simpler patterns, the rules for the formation and evaluation of actions are a bit different.  For starters, we allow single observations (| P &amp;gt;) and effects (&amp;lt; Q |), and products of the form | P1 &amp;gt;| P2 &amp;gt; and &amp;lt; Q1 |&amp;lt; Q2 |.  The rules for inner products are also more complex, as they are now tasked with precondition/postcondition reasoning.  Lastly, an observation | P &amp;gt; is formed from a predicate P about a state (in higher-order logic, a function from a state to a proposition), and an effect &amp;lt; Q | is formed from a binary predicate Q, which expresses the difference between an initial and final state.&lt;br /&gt;&lt;br /&gt;With these constructions, it is actually rather easy to construct a language whose types can reason about concurrency.  We can use the same techniques Morrisett et al used to construct a language with Hoare-Logic types, only we use patterns instead of Hoare triples.&lt;br /&gt;&lt;br /&gt;The remaining challenges include integrating impredicative polymorphism, recursive state predicates, and subtyping.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-2109107503162501991?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/2109107503162501991/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2011/05/logic-of-concurrency.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/2109107503162501991'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/2109107503162501991'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2011/05/logic-of-concurrency.html' title='Logic of Concurrency'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-2759438359174295148</id><published>2011-05-20T11:06:00.006-04:00</published><updated>2011-07-21T11:49:10.458-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='theory'/><category scheme='http://www.blogger.com/atom/ns#' term='algebra'/><category scheme='http://www.blogger.com/atom/ns#' term='concurrency'/><title type='text'>Algebraic Theory of Concurrency</title><content type='html'>The last thing I posted about a year ago was the beginning of work I've subsequently pursued to a concrete result.  That result is an algebraic theory of concurrency, which I can use to construct a logic, and from that, a type system, both of which can reason about concurrency.  There's a very strong connection to quantum theory in all this, which I'll discuss later on.&lt;br /&gt;&lt;br /&gt;For now, I'll give an overview of the algebra.  What I'm describing represents some of what is, and some of the directions I'd like to take things, so it's not necessarily mathematically rigorous.  This is meant to introduce the basics of the algebra; the complete equational theory is in the tech report I wrote.&lt;br /&gt;&lt;br /&gt;First, we need to describe states.  I do this in a manner that is compatible with linear algebra.  A &lt;span style="font-style: italic;"&gt;state space&lt;/span&gt; is a collection of distinct elements called "states", which are closed under scalar multiplication by identity and zero.  We write states using notation similar to Dirac's notation: | s &amp;gt; and &amp;lt; s |.  The &lt;span style="font-style: italic;"&gt;inner product&lt;/span&gt; &amp;lt; s | s' &amp;gt; is a mapping from two states to identity or zero (which I write using id and ⊥ instead of 1 and 0) such that &amp;lt; s | s' &amp;gt; = id iff s = s' and ⊥ otherwise.  We can describe any memory state over a fixed address space by taking the elements of the state space to be maps from addresses to values.  Also, note that any inner-product vector space can also give rise to a state space.&lt;br /&gt;&lt;br /&gt;For any two state spaces S and T, there exists a &lt;span style="font-style: italic;"&gt;product&lt;/span&gt; space S ⊗ T.  We write elements of the product space using the &lt;span style="font-style: italic;"&gt;tensor product&lt;/span&gt; s ⊗ t.  This allows us to reason about disjoint portions of a state.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Actions&lt;span style="font-style: italic;"&gt;&lt;span style="font-style: italic;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; represent the single, atomic actions that make up concurrent programs (such as atomic reads, writes, and more complex primitives).  The algebra of actions gives us a way to sequentially compose actions into larger actions.  We will use this to reason about how processes behave when running in isolation.  An action is a set of strict, (Scott-)continuous functions from a given state space onto itself.  I'll discuss the constructors for actions briefly.&lt;br /&gt;&lt;br /&gt;The most basic action is the &lt;span style="font-style: italic;"&gt;outer product&lt;/span&gt;, | s &amp;gt;&amp;lt; s' |, which represents a mapping some state t to the state &amp;lt; t | s &amp;gt; s'.  Note that if t = s, then the result will be s', otherwise, it will be ⊥.  As with quantum mechanics, we can thus reason about the composition of outer products using inner products in a straightforward manner.  For example | s &amp;gt;&amp;lt; s' | composed with | t &amp;gt;&amp;lt; t ' | is | s &amp;gt;&amp;lt; s' | t &amp;gt;&amp;lt; t '|, which we can reduce by evaluating &amp;lt; s' | t &amp;gt;.&lt;br /&gt;&lt;br /&gt;There exists an identity action &lt;span style="font-family:arial;"&gt;id&lt;/span&gt; which maps each state onto itself.&lt;span style="font-family:arial;"&gt;&lt;/span&gt;  We will be able to give a precise definition of this later on.  The empty function space is represented by ⊥ (this is distinct from the scalar ⊥, but scalar multiplication of any action by ⊥ yields the ⊥ action).  The total function space is represented by ⊤.&lt;br /&gt;&lt;br /&gt;As with states, actions are closed under tensor products.  Tensor products distribute with respect to outer products as one would expect, ie. | s ⊗ t &amp;gt;&amp;lt; s' ⊗ t' | = | s &amp;gt;&amp;lt; s' |  ⊗  | t &amp;gt;&amp;lt; t' |.  Tensor products also have the identity property with &lt;span style="font-family:arial;"&gt;id&lt;/span&gt;, ie. A ⊗ &lt;span style="font-family:arial;"&gt;id&lt;/span&gt; = A.&lt;br /&gt;&lt;br /&gt;We can represent branching and "intelligent" nondeterminism using the pointwise join of the function spaces represented by two actions.  We write this as A ⊔ B.  Not surprisingly, A ⊔ ⊥ = A and A ⊔ ⊤ = ⊤.  The pointwise meet of two function spaces models the behavior one sees in real parallelism (or what theorists call "alternation), where if any possibility can fail, then the system can fail.  We write this as A ⊓ B.  Not surprisingly, A ⊓ ⊥ = ⊥, and A ⊓ ⊤ = A.  Both meet and join distribute over eachother, and both distribute with tensor products.  The rules for distribution with outer products are more intricate.  In essence, we have that (A ⊔ B)(A' ⊓ B') = (A ⊓ B)(A' ⊔ B') = (AA' ⊔ AB' ) ⊓ (BA' ⊔ BB').&lt;br /&gt;&lt;br /&gt;Lastly, it is often useful to talk about the differences of actions.  We can model these as set differences, and the algebraic rules are exactly those from the algebra of sets (with meet and join substituted for intersection and union).&lt;br /&gt;&lt;br /&gt;Finally, we model actual concurrent programs with &lt;span style="font-style: italic;"&gt;processes&lt;/span&gt;.  The main purpose of processes is to reason about how actions interleave, and to permit some notion of reasoning about isolation of portions of state.  For starters, every action is also a process.  There are also constructors for processes that correspond to the constructors for actions.  There are meet and join constructors, tensor products, and a sequence constructor ⋅ (which corresponds to composition).  These are &lt;span style="font-style: italic;"&gt;not&lt;/span&gt; the same as the operators on actions, however; we cannot reduce processes by the action rules unless we know the state on which it operates to be isolated.&lt;br /&gt;&lt;br /&gt;We represent this notion of isolation with &lt;span style="font-style: italic;"&gt;closure&lt;/span&gt;.  The closure operator is written as Cl[P] (normally I write a superscript denoting the state space being closed over).  Closure maps processes to actions by substituting corresponding constructors.  We can think of the process versions of the constructors as "frozen".&lt;br /&gt;&lt;br /&gt;We can model recursion with recursively-defined processes.  More importantly, we can model real concurrency with a derived operator ⋈, which is defined as (P1 ⋅ P2) ⋈ P = P ⋈ (P1 ⋅ P2) = ((P1 ⋈ P) ⋅ P2) ⊓ (P1 ⋅ (P ⋈ P2)) and A ⋈ A' = (A ⋅ A') ⊓ (A' ⋅ A).  In other words, ⋈ represents all possible interleavings.&lt;br /&gt;&lt;br /&gt;This finally gets us an acceptable notion of bisimilarity.  Formally, we write it as follows P1 ≈ P2 iff ∀P. Cl[P1 ⋈ P] = Cl[P2 ⋈ P].  In other words, P1 is bisimilar to P2 if it behaves the same when running beside all other possible processes.&lt;br /&gt;&lt;br /&gt;It is possible to formulate a similar reasoning system in a higher-order predicate logic (such as ECC), which can then be incorporated into a type system to allow reasoning about concurrency.  I will discuss this later.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-2759438359174295148?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/2759438359174295148/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2011/05/algebraic-theory-of-concurrency.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/2759438359174295148'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/2759438359174295148'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2011/05/algebraic-theory-of-concurrency.html' title='Algebraic Theory of Concurrency'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-3461583978611125913</id><published>2011-05-20T10:59:00.002-04:00</published><updated>2011-05-20T11:02:32.011-04:00</updated><title type='text'>Restarting</title><content type='html'>I've not posted in this logbook in quite some time.  I've been extremely busy, first as a research intern with IBM Research last summer, and then working on a project which serves as a Ph.D qualifier.  However, after lunch yesterday here at IBM (I'm an intern again this summer), I've decided I'm going to start writing again.&lt;br /&gt;&lt;br /&gt;It'll probably take me a while to write out all the various ideas I've had over the past year, but hopefully it should be interesting.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-3461583978611125913?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/3461583978611125913/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2011/05/restarting.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/3461583978611125913'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/3461583978611125913'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2011/05/restarting.html' title='Restarting'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-1627567969288126617</id><published>2010-04-19T19:58:00.003-04:00</published><updated>2011-05-20T11:06:05.413-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='verification'/><category scheme='http://www.blogger.com/atom/ns#' term='type systems'/><category scheme='http://www.blogger.com/atom/ns#' term='concurrency'/><title type='text'>Beginnings of a Theory of Concurrent Processes</title><content type='html'>First post in a very long time.  I've been working on things that will be turned into intellectual property one day, so I can't post in the clear about them.  However, this pertains to my first potential PhD thesis topic...&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I'm pulling ideas together for a general theory of concurrent processes.  First, background.  One of my older interests, π-calculus and its predecessor, CPS both describe concurrent execution using synchronous channels, which are a very peculiar construction.  They are peculiar in that they effectively step two threads (processes) at once.  I was actually able to use this property to set up two separate proofs (which I have yet to successfully publish).  The primary shortcoming of π-calculus is that it cannot describe the more common model of shared-heap interactions.  This is where I'm trying to go, of course.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The theory I'm developing thus far is actually &lt;i&gt;very&lt;/i&gt; general.  I'm hoping that by taking this approach, I can avoid getting entangled in the nastiness of memory semantics.  Also, quantum mechanics has been an &lt;i&gt;excellent&lt;/i&gt; source of ideas for my efforts thus far.  Entanglement, operators, tensor products, the Quantum Cloning Theorem, and others all seem to have analogs in concurrent execution.  This of course makes sense; quantum mechanics describes general physical systems; my theory will describe general computational systems.  The remainder here will be very much scrawled notes, interspersed with more coherent thoughts, as are my thoughts at this point.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In general, states are sets of propositions.  I'm unsure about the exact details.  There's several ways I can go here.  One approach is to model them as a kind of "value space": that is, a mapping from coordinates (which, in computing will be one-dimensional and discrete (and total-ordered, with multiply and add and the ring laws, blah blah) but nothing prohibits more exotic constructions) to values of some kind, and with more general properties providing more abstract reasoning ability.  This is nicely close to computation.  Another approach is simply "sets of truthful propositions", or "truth assignments of propositions".  This is nice because it can reason about &lt;i&gt;anything&lt;/i&gt;, not just computation.  But it begs for a minimal basis set (and computational states are notoriously high-dimensional).  It also raises the question of relevance.  Logic-as-space is to my knowledge unexplored, relevance logic is quite well explored (it's basic substructural logic, probably the least-used variation).  Somewhere in the middle might be the best approach.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I've made more progress with regard to semantics.  I'd like to get away with not tying my theory to any one representation of processes, but I've managed to redesign programming language semantics in a way that enables this, and gives rise to a very elegant construction of synchronization as well.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;You can also apply quantum mechanics' tensor products and entanglement to the descriptions of state.  Separation logic is &lt;i&gt;invaluable&lt;/i&gt; when it comes to describing propositional states.  It permits isolation.  The tensor product ⊗ in quantum mechanics is completely analogous.  The Quantum Cloning Theory also comes to bear.  In quantum mechanics, you cannot "split" a quantum state without having it be entangled with its original.  A similar theorem applies to states.  You can easily split a state with a separating conjunction, so as to reason about how two processes act upon it, assuming you can cleanly separate it.  However, if you can't then you get two states which affect eachother (quantum entanglement).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I call it &lt;i&gt;differential semantics&lt;/i&gt;.  Most semantic descriptions of stateful languages simply include the state in their operational semantics.  The differential semantic approach is to describe the possible observations and effects that a given process can cause on state.  This makes building a type system for describing concurrent effects easier, and it also &lt;i&gt;removes the state from all semantic descriptions&lt;/i&gt;.  The semantics are blind to any real state; they reason only about what might possibly happen.  Extend this to synchronization, and you get another elegant description of something that's usually very messy: possible future semantics.  We can describe synchronization in terms of possible outcomes, rather than getting into the ugly details of how it might be implemented (and losing generality).  The huge advantage of this approach is that it can turn synchronization problems into classical computing problems, and then you can reason about their difficulty.  I used this to show that combining synchronous channels and transactions yields a synchronization problem equivalent to solving SAT problems.  A &lt;i&gt;very&lt;/i&gt; useful result would be finding the point at which a system requires global synchronization (which is bad)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I've described a language using differential semantics, but I've not yet extended the approach to general possible-future semantics.  This is largely because my type system can't reason about the behavior of multiple threads.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;To pull this off, I really need four things: a way to reason from sequences of observation/effect pairs to possible futures, a way to treat groups of threads as a single thread, a way to describe the interactions of possible futures without getting bogged down in a mess, and design a type theory that can reason about the interactions amongst multiple threads.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;If I can pull all this off, and wrap it up in a nice package, with all the theoretical heavy-lifting done, then I may very well have a thesis topic...&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-1627567969288126617?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/1627567969288126617/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2010/04/beginnings-of-theory-of-concurrent.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/1627567969288126617'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/1627567969288126617'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2010/04/beginnings-of-theory-of-concurrent.html' title='Beginnings of a Theory of Concurrent Processes'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-5180696667993867952</id><published>2009-11-16T13:21:00.004-05:00</published><updated>2009-11-16T13:50:09.193-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='runtime systems'/><category scheme='http://www.blogger.com/atom/ns#' term='garbage collection'/><category scheme='http://www.blogger.com/atom/ns#' term='transactional memory'/><title type='text'>LLVM Versions of the Intrinsics</title><content type='html'>The process of implementing the GC/TM instruction set I described earlier has brought up some small revelations and oversights, which I'll list here.&lt;br /&gt;&lt;br /&gt;First and foremost, the type taken and yielded by the &lt;span style="font-family: courier new;"&gt;safepoint&lt;/span&gt; instruction needs to be something called a "context", which should contain everything I described &lt;span style="font-family: courier new;"&gt;mbox&lt;/span&gt; as containing, except that it also needs a mailbox pointer itself.  I'm torn as to exactly how to do this.  The assumption I've been making is that &lt;span style="font-family: courier new;"&gt;safepoint&lt;/span&gt; would be translated into code that (if the &lt;span style="font-family: courier new;"&gt;safepoint&lt;/span&gt; actually ends up going into the scheduler or GC) stores all the information in a context back into the mailbox itself (hence the need for the mailbox pointer), calls into the runtime system, then extracts that same information out of the mailbox once the thread resumes.  This would seem to produce alot of extraneous state in the program.  However, LLVM's optimizations should eliminate any extraneous loads and stores if some component of the context isn't used between &lt;span style="font-family: courier new;"&gt;safepoint&lt;/span&gt;s.  Additionally, the instructions which store to or load from the mailbox can be positioned to minimize register pressure.&lt;br /&gt;&lt;br /&gt;The &lt;span style="font-family: courier new;"&gt;spawn&lt;/span&gt; and &lt;span style="font-family: courier new;"&gt;gcalloc&lt;/span&gt; instructions need to take and produce a full context, not just a &lt;span style="font-family: courier new;"&gt;gcallocator&lt;/span&gt;, as allocation can potentially have the same effect as a &lt;span style="font-family: courier new;"&gt;safepoint&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;The LLVM &lt;span style="font-family: courier new;"&gt;getelementptr&lt;/span&gt;, &lt;span style="font-family: courier new;"&gt;load&lt;/span&gt;, and &lt;span style="font-family: courier new;"&gt;store&lt;/span&gt; instructions ought to be modified, rather than creating new &lt;span style="font-family: courier new;"&gt;read&lt;/span&gt; or &lt;span style="font-family: courier new;"&gt;write&lt;/span&gt; instructions.  Of particular interest is &lt;span style="font-family: courier new;"&gt;load&lt;/span&gt;/&lt;span style="font-family: courier new;"&gt;store&lt;/span&gt;s operating on structures.  This will require extraction/injection of &lt;span style="font-style: italic;"&gt;all&lt;/span&gt; the double-pointer fields in the structure, which will require special backend code-generation.&lt;br /&gt;&lt;br /&gt;The &lt;span style="font-family: courier new;"&gt;touch&lt;/span&gt; instruction also needs more arguments.  In the case of GC objects, I need a pointer to the object header, as well as a pointer to the thing that was touched.  In the transactional case, I probably want to allow for "data groups", which would generalize the cache-line/object conflict detection presently used.&lt;br /&gt;&lt;br /&gt;And lastly, I'm unsure about what should happen with transactions when you hit a &lt;span style="font-family: courier new;"&gt;safepoint&lt;/span&gt;.  At the least, validation of the entire transaction before and after actually suspending would be a good idea.  There should also be some kind of deprioritization that can take place.  I'll be looking into this more as I get further into implementation.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-5180696667993867952?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/5180696667993867952/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2009/11/llvm-versions-of-intrinsics.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/5180696667993867952'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/5180696667993867952'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2009/11/llvm-versions-of-intrinsics.html' title='LLVM Versions of the Intrinsics'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-8742438725062487863</id><published>2009-10-11T14:34:00.003-04:00</published><updated>2009-10-11T16:21:51.759-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='operating systems'/><category scheme='http://www.blogger.com/atom/ns#' term='threading'/><category scheme='http://www.blogger.com/atom/ns#' term='concurrency'/><title type='text'>Operating Systems from First Principles, Part 2: Mob-Job Scheduling</title><content type='html'>As previously discussed, there are two different ways of looking at concurrency.  The more common pthread model presents an interface in which parallelism is explicit.  The model found in parallel functional languages, Cilk, or similar environments is one in which parallelism is an optimization.  From a different angle, the pthread approach is an interface for a scheduler, where the Cilk-style approach is an interface for a dispatcher.  And of course, schedulers produce concurrency, while dispatchers consume them.&lt;br /&gt;&lt;br /&gt; Put the two together, and you have a nice little hylomorphism.  This, of course, is an opportunity for optimization.  Put another way, it's a potential inefficiency.  Since we will have to put the kernel/user boundary somewhere in here, the goal is to make this as efficient as possible.&lt;br /&gt;&lt;br /&gt; Fundamentally, this is an extension of the uniprocessor scheduling problem.  In one way (the current dominant) of looking at it, we apply uniprocessor scheduling techniques, except we have multiple processors to schedule.  This works nicely when we have a bunch of threads running in isolation; it introduces no more complexity than is needed.  However, this isolation isn't necessarily representative of reality.  An alternate (also flawed) view of things would be to turn over all the execution resources of a machine to one task at a time.  This is ill-advised, but it has its virtues.  Tasks which consist of many threads are likely to run better: reactive mutexes and other synchronization constructs will work better, interthread communication will work better, and cache- and memory-affinity will help, and load-balancing is much more predictable, among other things.  This approach works quite well, in fact, as long as each task can effectively use all the CPUs it is given.&lt;br /&gt;&lt;br /&gt;The truth is we want a combination of the two: we want to assign groups of CPUs, call them "mobs" to groups of threads, call them "jobs".  Theoretically, this is multimachine scheduling, which is a hard enough problem even when we know the future (which we don't), so we'll rely on simplifying heuristics.  It sounds complicated, but in reality the problem can be solved using a modification of a standard epoch scheduler.  For any given scheduling cycle, we can assign the highest priority job it's requested CPUs, then repeat the process with the leftovers (if there are any).  The only question is how to differentiate between jobs requesting large numbers of CPUs and those requesting smaller numbers.  I would argue that the scheduler should favor the smaller requests for several reasons: 1) smaller requests would tend to correspond to event handlers or other highly interactive jobs, which are likely to process a small amount of work, then yield their CPUs, 2) in an activation-type interface, if a job has a large number of idle activations, its CPU demand will likely increase in the near future, so it should be allowed to run, and 3) such an approach would preferentially run multiple, simultaneous jobs as opposed to allowing one to take over all the processing resources of the system (though, due to the epoch property, it will be allowed to run eventually).&lt;br /&gt;&lt;br /&gt;The approach I envision would be to compare jobs requesting various CPU counts, scaling their priorities by the number of CPUs requested (priority / log CPUs would probably be a good formula).  Schedule the highest ranking job, then repeat, first trying the jobs that don't request more than the remaining CPUs, then trying those that do if there are no such jobs remaining.  Of course, designing a lock-free, concurrent algorithm to do all this is tricky to say the least, especially if it's supposed to deal with processor affinity.&lt;br /&gt;&lt;br /&gt;As for the actual interface, activations are the &lt;span style="font-style: italic;"&gt;only&lt;/span&gt; sane way to go about it, for a number of reasons.  In essence, activations are the only way that you won't lose information when crossing the kernel/user boundary, and they are the only way that the kernel's own scheduling facilities can be properly constructed in user space.  I'll go over the particulars later.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-8742438725062487863?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/8742438725062487863/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2009/10/operating-systems-from-first-principles_11.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/8742438725062487863'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/8742438725062487863'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2009/10/operating-systems-from-first-principles_11.html' title='Operating Systems from First Principles, Part 2: Mob-Job Scheduling'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-3896470175574798762</id><published>2009-10-04T00:04:00.004-04:00</published><updated>2009-10-04T15:27:28.880-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='operating systems'/><category scheme='http://www.blogger.com/atom/ns#' term='runtime systems'/><category scheme='http://www.blogger.com/atom/ns#' term='threading'/><category scheme='http://www.blogger.com/atom/ns#' term='concurrency'/><title type='text'>Operating Systems from First Principles, Part 1: Rethinking Threads</title><content type='html'>Arguably the most basic functionality of any modern OS is multitasking, and a decent threading model is fast becoming a required feature of any modern language runtime; therefore, it makes sense that I start what will in time become a lengthy elaboration with this topic.&lt;br /&gt;&lt;br /&gt;An often-missed subtlety is that there are two different goal sets that can arise regarding multiple processors and threads, and though the realization of one &lt;span style="font-style: italic;"&gt;can&lt;/span&gt; support the other, this may not be the &lt;span style="font-style: italic;"&gt;best&lt;/span&gt; way to go about doing things.&lt;br /&gt;&lt;br /&gt;The first goal is the most common: having the appearance of an infinite number of processors regardless of the underlying hardware.  The primary concern here is interactivity.  Raw number-crunching can be postponed provided that threads which acknowledge events are able to do so.  This is what we all learn in undergrad, and is the goal of a pthread-style API.  This is the &lt;span style="font-style: italic;"&gt;production&lt;/span&gt; of concurrent execution, sometimes from a single processor, sometimes from several.&lt;br /&gt;&lt;br /&gt;The second goal is increased performance, or the &lt;span style="font-style: italic;"&gt;consumption&lt;/span&gt; of concurrent execution.  Because the pthread model is so common, we might be tempted roll this line of thinking into it, but there are a few key differences.  Parallel evaluation of terms differs from pthread-style concurrency in that it is purely an optimization; it does not affect the outcome of the terms in anyway, and .  Furthermore, the interactivity and timing issues are irrelevant.  The goal here is to commit more processors to a task, hopefully achieving better performance.  Cilk is probably the best example of this particular way of doing things.&lt;br /&gt;&lt;br /&gt;Building a scheduler system that caters to both is, to my knowledge, a largely unexplored problem.  Such a system needs to support threads, which represent the expectation of interactivity.  However, &lt;span style="font-style: italic;"&gt;multiple CPUs might be committed to a single thread&lt;/span&gt;.  The scheduling problem then becomes one of balancing the desire for interactivity with the desire to commit large numbers of CPUs to a given thread, so that it completes faster.  A useful heuristic here might be that highly interactive threads desire relatively few CPU's, as they would function primarily as dispatchers to more compute-bound threads.&lt;br /&gt;&lt;br /&gt;Crossing the user/kernel boundary gets even more complicated, which I'll discuss in later entries.  Suffice to say, the activation model is vastly superior to anything else, and the problem gets further complicated by the desire to commit &lt;span style="font-style: italic;"&gt;exactly&lt;/span&gt; the number of desired CPUs to a given task.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-3896470175574798762?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/3896470175574798762/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2009/10/operating-systems-from-first-principles.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/3896470175574798762'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/3896470175574798762'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2009/10/operating-systems-from-first-principles.html' title='Operating Systems from First Principles, Part 1: Rethinking Threads'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-4991487749590996095</id><published>2009-10-03T22:57:00.003-04:00</published><updated>2009-10-04T00:04:09.280-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='verification'/><category scheme='http://www.blogger.com/atom/ns#' term='theory'/><category scheme='http://www.blogger.com/atom/ns#' term='program languages'/><category scheme='http://www.blogger.com/atom/ns#' term='garbage collection'/><category scheme='http://www.blogger.com/atom/ns#' term='type systems'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><category scheme='http://www.blogger.com/atom/ns#' term='concurrency'/><title type='text'>Specifying Concurrent and Synchronous Operations</title><content type='html'>I've spoken before on how to specify concurrent operations in a lock-free setting.  The trick there is observations and effects, and safety is guaranteed when all possible ending states of all operations are valid starting states for each operation.&lt;br /&gt;&lt;br /&gt;However, we need more than just this.  Such a type system would be unable to certify the atomic snapshot protocol.  To reason about this, we need two additional reasoning capabilities.  First, we need to be able to reason about when an observation is made, or when an effect was caused.  Secondly, we need to be able to infer this with arbitrary predicates over terms.  For instance, in the atomic snapshot protocol, we need to be able to connect a memory tag with the time at which an operation occurred.  In other words, if tag a is observed in one observation of a given location, and later, a tag b is observed such that a &lt; b, then something effected that location in between the two observations.  More importantly, if they are the same, then nothing affected the location.&lt;br /&gt;&lt;br /&gt;Lastly, we need some means of reasoning about actual &lt;span style="font-style: italic;"&gt;synchronous&lt;/span&gt; operations and their behavior.  For instance, the synchronization barrier operation or lock/unlock operations imply certain timing behaviors between various threads.  Barriers allow a thread to continue at some point after some other number of threads have executed a barrier wait.&lt;br /&gt;&lt;br /&gt;Temporal Logic may be the key here, or at least a starting point.  Temporal logic provides six different predicates on propositions.  They are as follows:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Fp (future): At some point in the future, p holds&lt;/li&gt;&lt;li&gt;Pp (past): At some point in the past, p held&lt;/li&gt;&lt;li&gt;Hp (henceforth): At all points in the future, p holds&lt;/li&gt;&lt;li&gt;Gp (hitherto): At all points in the past, p held&lt;/li&gt;&lt;li&gt;☐p (always) = Gp ∧ p ∧ Hp: At all points in time, p holds&lt;br /&gt;&lt;/li&gt;&lt;li&gt;◇p (sometimes) = Pp ∨ p ∨ Fp: At some point in time, p holds&lt;/li&gt;&lt;/ul&gt;The best thing about temporal logic is that it is decidable (at least in some instances).&lt;br /&gt;&lt;br /&gt;However, this logic alone isn't enough.  Specifically, we need quantification over time quanta probably based on these predicates.  Linearizability of terms in such a type system would then come down to identifying a specific time quantum which acts as the linearization point.  For your basic lock-free algorithms, this is simply the time quantum of some operation in the term itself.  For more complicated things, such as a lock-free garbage collector, then the "start collection" function specifies some point in future time (F) at which the heap is exchanged for an isomorphic superset of the observable locations at that point.&lt;br /&gt;&lt;br /&gt;In any case, this is likely the way forward on this front.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-4991487749590996095?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/4991487749590996095/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2009/10/specifying-concurrent-and-synchronous.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/4991487749590996095'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/4991487749590996095'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2009/10/specifying-concurrent-and-synchronous.html' title='Specifying Concurrent and Synchronous Operations'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-5085173822755146391</id><published>2009-09-16T13:11:00.003-04:00</published><updated>2009-09-16T14:38:01.898-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='program languages'/><category scheme='http://www.blogger.com/atom/ns#' term='type systems'/><title type='text'>Central Themes from the Ynot Papers</title><content type='html'>The Ynot project is the fancy name for Hoare Type Theory, which is a body of work developed by Greg Morrisett at Harvard.  Hoare Type Theory is a dependent type theory for stateful programs, which is based on Hoare Logic.  The Hoare type {&lt;span style="font-style: italic;"&gt;P&lt;/span&gt;} &lt;span style="font-style: italic;"&gt;T&lt;/span&gt; {&lt;span style="font-style: italic;"&gt;Q&lt;/span&gt;} denotes a computation that expects a world described by P, leaves the world in a state described by Q, and yields T.&lt;br /&gt;&lt;br /&gt;This work is intimately related to my own work on linearizability types, and my work will probably share a great deal with Hoare Type Theory.  The work's major ideas include the following:&lt;br /&gt;&lt;br /&gt;1) Type-checking is split into two parts.  The first is an ML-style, decidable, terminating type checking phase.  This phase also generates a multitude of proof obligations.  The second phase discharges these obligations in one of many possible ways, such as a) ignoring them, b) relying on proof annotations in the code itself, c) checking by hand, d) using an interactive prover, or e) using an automated prover.  This sets up a nice, pay-as-you-go system that allows for rapid development as well as more formal processes.&lt;br /&gt;&lt;br /&gt;2) The Hoare types incorporate separation logic, which allows for minimal specifications.  P and Q in {&lt;span style="font-style: italic;"&gt;P&lt;/span&gt;} &lt;span style="font-style: italic;"&gt;T&lt;/span&gt; {&lt;span style="font-style: italic;"&gt;Q&lt;/span&gt;} only describe those portions of various heaps which the term in question actually uses.  This allows for much smaller, and reusable specifications.&lt;br /&gt;&lt;br /&gt;3) The separation logic can also express ownership of various locations.  This can be used to describe a garbage collector or memory allocator, for instance.&lt;br /&gt;&lt;br /&gt;4) Abstract predicates can describe state and effects without actually revealing details about internal representations.  For instance, a naïve typing of an increment function on a counter object might be &lt;span style="font-style: italic;"&gt;inc&lt;/span&gt; : &lt;span style="font-style: italic;"&gt;c&lt;/span&gt; : &lt;span style="font-family:arial;"&gt;counter&lt;/span&gt;,&lt;span style="font-style: italic;"&gt;n&lt;/span&gt; : &lt;span style="font-family:arial;"&gt;int&lt;/span&gt;.{ &lt;span style="font-style: italic;"&gt;c&lt;/span&gt; ↦ &lt;span style="font-style: italic;"&gt;n&lt;/span&gt; } &lt;span style="font-style: italic;"&gt;c&lt;/span&gt; → &lt;span style="font-style: italic;"&gt;n&lt;/span&gt; { &lt;span style="font-style: italic;"&gt;c&lt;/span&gt; ↦ &lt;span style="font-style: italic;"&gt;n&lt;/span&gt; + 1 }, however, this reveals too much about the structure of the object.  A better way is to include in the object's description a proposition over counters and integers, like &lt;span style="font-style: italic;"&gt;yields&lt;/span&gt; : &lt;span style="font-family:arial;"&gt;counter&lt;/span&gt; × &lt;span style="font-family:arial;"&gt;int&lt;/span&gt; → &lt;span style="font-family:arial;"&gt;Prop&lt;/span&gt; (I'll write this in infix notation).  Then, the type of the counter ends up being &lt;span style="font-style: italic;"&gt;inc&lt;/span&gt; : &lt;span style="font-style: italic;"&gt;c&lt;/span&gt; : counter,&lt;span style="font-style: italic;"&gt;n&lt;/span&gt; : int.{ &lt;span style="font-style: italic;"&gt;c yields n&lt;/span&gt; } &lt;span style="font-style: italic;"&gt;c&lt;/span&gt; → &lt;span style="font-style: italic;"&gt;n&lt;/span&gt; { &lt;span style="font-style: italic;"&gt;c yields &lt;/span&gt;(&lt;span style="font-style: italic;"&gt;n&lt;/span&gt; + 1) }.  This tells me nothing about the &lt;span style="font-style: italic;"&gt;internal&lt;/span&gt; structure of c.  C could be anything for all I care, so long as the abstract predicate &lt;span style="font-style: italic;"&gt;yields&lt;/span&gt; tells me what I'll get from each call to &lt;span style="font-style: italic;"&gt;inc&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;Hoare Type Theory itself has predicative (ML-style) polymorphism with existential types (data hiding).  They have proven that impredicative polymorphism &lt;span style="font-style: italic;"&gt;could&lt;/span&gt; feasibly exist, but it doesn't appear that they've actually added it.  The have attempted to add support for transactions, but this is copy-catting the Haskell STM with invariants monad, not what I'm trying to do.  At the present, they have no subtyping.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-5085173822755146391?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/5085173822755146391/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2009/09/central-themes-from-ynot-papers.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/5085173822755146391'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/5085173822755146391'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2009/09/central-themes-from-ynot-papers.html' title='Central Themes from the Ynot Papers'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-3524153321742317683</id><published>2009-09-16T12:41:00.002-04:00</published><updated>2009-09-16T13:09:43.747-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='theory'/><category scheme='http://www.blogger.com/atom/ns#' term='garbage collection'/><category scheme='http://www.blogger.com/atom/ns#' term='chaos'/><title type='text'>Analyzing GC Traces</title><content type='html'>For the chaos research, I'm building a simulator framework for analyzing garbage collection traces.  What I've had in mind up to this point is a simulator for a simple language, which generates traces, an analyzer which runs on traces to simulate collection events.  I recently had the idea of using an ILP solver to compute the optimal collection strategy, and compare the results against any online (read: usable) approaches we came up with.&lt;br /&gt;&lt;br /&gt;However, I realized yesterday that the branch and bound algorithm for solving ILP problems potentially yields alot of very useful information regarding garbage collection.  The algorithm provides a way to analyze the impact of choosing whether or not to collect at a given point, and the shape of the search tree should give some indication as to which points are the most important.  This in itself is potentially very useful information.  Moreover, it could become a profiling technique in its own right.  If a certain point is a useful point for garbage collection in most of the runs of a program, then it could be statically marked as a preferential point for collection.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-3524153321742317683?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/3524153321742317683/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2009/09/analyzing-gc-traces.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/3524153321742317683'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/3524153321742317683'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2009/09/analyzing-gc-traces.html' title='Analyzing GC Traces'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-6113250026625001142</id><published>2009-08-16T02:46:00.006-04:00</published><updated>2009-08-16T13:03:54.297-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='transactional memory'/><category scheme='http://www.blogger.com/atom/ns#' term='concurrency'/><title type='text'>Implementing the Transactional Intrinsics</title><content type='html'>I spoke with Chip Weems about how to implement the transactional intrinsics I described in my earlier posts. His primary objection was to the &lt;span style="font-family:courier new;"&gt;watch&lt;/span&gt; instructions I proposed, which mark a location as being watched.&lt;br /&gt;&lt;br /&gt;In general, the scheme I proposed had three core operations: &lt;span style="font-family:courier new;"&gt;watch&lt;/span&gt;, &lt;span style="font-family:courier new;"&gt;touch&lt;/span&gt;, and &lt;span style="font-family:courier new;"&gt;validate&lt;/span&gt;.  The problem with &lt;span style="font-family:courier new;"&gt;watch&lt;/span&gt; is that it suggests some sort of global monitoring which detects &lt;span style="font-family:courier new;"&gt;touch&lt;/span&gt;es to various locations. Such a thing is easy enough to implement; one can use a hash-table, a bitvector, a Bloom filter, or any number of techniques to implement it. This is no mystery.&lt;br /&gt;&lt;br /&gt;The problem is that in a highly distributed system, or even a multicore system with more than 32 cores, any centralized memory system isn't feasible. With 32 processors, you get contention problems. In a distributed system, a single shared memory is out of the question.&lt;br /&gt;&lt;br /&gt;For higher contention, it becomes necessary to get rid of the centralized coordination, or at least reduce it. We can employ version tags to do this (bounded-length infinite-reuse tags for n processors can be implemented with 2n bits), recording a location's tag with a &lt;span style="font-family:courier new;"&gt;watch&lt;/span&gt;, incrementing the tag for our own copy with a &lt;span style="font-family:courier new;"&gt;touch&lt;/span&gt;, and checking the tag against the global version with &lt;span style="font-family:courier new;"&gt;validate&lt;/span&gt;.  This version of things seems to imply a slightly different set of intrinsics.  First, &lt;span style="font-family:courier new;"&gt;watch&lt;/span&gt; and &lt;span style="font-family:courier new;"&gt;read&lt;/span&gt; must be combined, though &lt;span style="font-family:courier new;"&gt;read&lt;/span&gt; without &lt;span style="font-family:courier new;"&gt;watch&lt;/span&gt; is still permissible.  Second, &lt;span style="font-family:courier new;"&gt;rollback&lt;/span&gt; performs writes, bumping version tags. However, since it backtracks values, we need to somehow state that two version tags are equivalent. There's a fairly simple obstruction-free, undo-logged transaction implementation using this method.&lt;br /&gt;&lt;br /&gt;The apparent way of the world would suggest we'll run into another wall at the next power of 8, which is 256. At this point, we're best off using Herlihy's methodology. Since my intrinsic set was designed to allow the implementation of either undo or redo logging (and generally to maximize flexibility), we would need to slip redo logging in under the constructs I've provided for doing undo logging. This could be done by buffering all locations which are &lt;span style="font-family:courier new;"&gt;watch&lt;/span&gt;ed or &lt;span style="font-family:courier new;"&gt;touch&lt;/span&gt;ed, having &lt;span style="font-family:courier new;"&gt;log&lt;/span&gt; instructions allocate a slot in the redo log, to which subsequent &lt;span style="font-family:courier new;"&gt;write&lt;/span&gt;s store, having &lt;span style="font-family:courier new;"&gt;rollback&lt;/span&gt; simply clean up, and &lt;span style="font-family:courier new;"&gt;validate&lt;/span&gt; do an implicit &lt;span style="font-family:courier new;"&gt;commit&lt;/span&gt;.  It's less efficient, but it works.  This gets you a lock-free redo-logged implementation.&lt;br /&gt;&lt;br /&gt;One would also expect that at 1024, the protocol would need to go to wait-freedom.&lt;br /&gt;&lt;br /&gt;In summary, I theorize the following use of strategies for CPU numbers:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;1-8&lt;/span&gt;: Lock-based centralized conflict detection&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;8-32&lt;/span&gt;: Lock-free centralized conflict detection&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;32-256&lt;/span&gt;: Obstruction-free version-tagged decentralized conflict detection&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;256-1024&lt;/span&gt;: Lock-free Herlihy&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;1024+&lt;/span&gt;: Wait-free Herlihy&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-6113250026625001142?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/6113250026625001142/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2009/08/implementing-transactional-intrinsics.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/6113250026625001142'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/6113250026625001142'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2009/08/implementing-transactional-intrinsics.html' title='Implementing the Transactional Intrinsics'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-7728793038014038091</id><published>2009-07-30T15:20:00.002-04:00</published><updated>2009-07-30T15:24:13.920-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='theory'/><category scheme='http://www.blogger.com/atom/ns#' term='chaos'/><title type='text'>Self-Organizing Criticality and Program Allocation Behavior</title><content type='html'>I ran across a concept called "self-organized criticality" while investigating chaos.  The term refers to simple systems that nonetheless give rise to critical points.  The original investigation I believe was of pouring sand onto a surface, which builds up into a pile, and occasionally causes avalanches.&lt;br /&gt;&lt;br /&gt;In computing, this could be represented by balanced trees, cellular automata, and similar structures.&lt;br /&gt;&lt;br /&gt;If such programs give rise to unpredictable critical points, then this could cause chaotic allocation behavior from a simple program.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-7728793038014038091?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/7728793038014038091/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2009/07/self-organizing-criticality-and-program.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/7728793038014038091'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/7728793038014038091'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2009/07/self-organizing-criticality-and-program.html' title='Self-Organizing Criticality and Program Allocation Behavior'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-6893043640294264815</id><published>2009-07-26T12:35:00.002-04:00</published><updated>2009-07-26T13:04:43.842-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='theory'/><category scheme='http://www.blogger.com/atom/ns#' term='garbage collection'/><category scheme='http://www.blogger.com/atom/ns#' term='chaos'/><title type='text'>Current Questions on Garbage Collection and Chaos</title><content type='html'>My investigations into garbage collection and chaos have produced two main areas of interest so far.  Both are motivated by the previous question of "how can one choose the points at which to perform collections wisely?"&lt;br /&gt;&lt;br /&gt;First, garbage collection actions by definition, have no effect on execution from the standpoint of semantics.  Collection actions replace the memory graph with a subset of the same memory graph, which is isomorphic with respect to the root set.  Therefore, collections can be performed anywhere during execution.  Typically, there is a maximum size which garbage collection must keep the program under.&lt;br /&gt;&lt;br /&gt;Assuming that the cost of collection is roughly equivalent to the number of objects preserved, then we can usually have a profound impact on the total cost of collection depending on exactly where we perform collections.  There are three general schemes for doing this:&lt;br /&gt;&lt;br /&gt;1) The perfect world, where we know the execution ahead of time, and we pick collection points using arbitrary computing power.  This is obviously unrealistic, but it's the goal.  This is a basic constraint-solving problem, and it yields the optimal result.  Call this mode "hindsight".&lt;br /&gt;&lt;br /&gt;2) We know the program ahead of time, and can do arbitrary full-program analysis, compilation, and the like, but we don't know the specifics of any execution.  Call this mode "full-program".&lt;br /&gt;&lt;br /&gt;3) We don't know anything about the program.  Call this mode "online".&lt;br /&gt;&lt;br /&gt;The first major question is whether there is a gap between hindsight, full-program, and online.  I suspect there is.  Specifically, I'm almost certain that hindsight &gt; full-program, and I'm pretty sure full-program &gt; online.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The second specifically concerns the full-program scheme.  I'll call programs whose execution patterns and allocation patterns are predictable knowing all inputs "regular", and those who aren't "chaotic".  It's pretty obvious that you can have a regular program with regular allocation behavior.  An example is a program that builds up a list to a specific length, then empties it, then fills it again.  Once you know the length of the list, you can plan to do collections when you empty the list.  I'm almost certain that one can produce a chaotic program with chaotic allocation behavior by adapting a real-life chaotic system like Lorenz's water-wheel.  It's fairly easy to get from this point to chaotic programs with regular allocation behavior.  The real question is, can you have a regular program with chaotic allocation behavior.&lt;br /&gt;&lt;br /&gt;On that, I'm split.  Intuition would seem to say "no", but there is alot of work on things like chaotic iterated maps that leads me to suspect otherwise.&lt;br /&gt;&lt;br /&gt;So the real second question is whether or not a regular program can produce chaotic allocation behavior.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-6893043640294264815?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/6893043640294264815/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2009/07/current-questions-on-garbage-collection.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/6893043640294264815'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/6893043640294264815'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2009/07/current-questions-on-garbage-collection.html' title='Current Questions on Garbage Collection and Chaos'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-6924658852895254822</id><published>2009-07-21T12:59:00.002-04:00</published><updated>2009-07-21T13:49:34.943-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='theory'/><category scheme='http://www.blogger.com/atom/ns#' term='concurrency'/><title type='text'>The Differences of Asynchronous Complexity</title><content type='html'>After finishing rewriting my paper on this subject for PPoPP, several things are apparent:&lt;br /&gt;&lt;br /&gt;1) CAS, LL/SC, or bounded-size transactions are the fundamental operation, not consensus.  Consensus objects, because they are one-shot objects, end up increasing costs.&lt;br /&gt;&lt;br /&gt;2) The lack of a universal construction is problematic.  However, I doubt that such a construction can truly exist, at least for all kinds of objects.&lt;br /&gt;&lt;br /&gt;3) The hierarchy emerges from the number of simultaneous CAS operations needed to implement a given data structure.  At the moment, I've considered only constant number of operations.  However, in a distributed object framework or transactional memory system, we might see something like a binary tree has a simultaneity of O(log2(n))&lt;br /&gt;&lt;br /&gt;4) There may be a complexity gap between lock-freedom and wait-freedom.  I suspect it to be O(n).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-6924658852895254822?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/6924658852895254822/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2009/07/differences-of-asynchronous-complexity.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/6924658852895254822'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/6924658852895254822'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2009/07/differences-of-asynchronous-complexity.html' title='The Differences of Asynchronous Complexity'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-7247782142277430390</id><published>2009-07-12T13:31:00.006-04:00</published><updated>2009-07-12T13:48:03.668-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='theory'/><category scheme='http://www.blogger.com/atom/ns#' term='concurrency'/><title type='text'>Defining Lock-Freedom for Synchronous Channels</title><content type='html'>The problem on my mind now is trying to find a definition similar to lock-freedom for synchronous channels.  The real problem I want to describe is the concept of implementing synchronous channels with choice-composition without using a global lock.&lt;br /&gt;&lt;br /&gt;Herlihy's definition of lock-freedom is nice, because it eliminates the capacity for blocking synchronization &lt;span style="font-style: italic;"&gt;generally&lt;/span&gt;, by stating the property that you really want: at least one thread is guaranteed to finish.  Unfortunately, it doesn't quite work for synchronous channels, which are inherently blocking.&lt;br /&gt;&lt;br /&gt;There are three approaches I've considered:&lt;br /&gt;&lt;br /&gt;1) Try to define the exact properties that global locks have.  This is near impossible, and I can actually visualize Herlihy's original thinking through the papers about linearizability and lock-freedom (he was my advisor afterall).  Trying to state what is forbidden is near impossible.  It's better to go for what you want.&lt;br /&gt;&lt;br /&gt;2) Try to define some sort of synchronization operation which can only synchronize two threads.  I'm pretty sure this is impossible.&lt;br /&gt;&lt;br /&gt;3) Expand lock-freedom to accomodate synchronous channel operations.  This seems most promising.  My thoughts are something like "assuming there is a possible solution, one thread completes in a finite amount of time".&lt;br /&gt;&lt;br /&gt;The third will probably be the one that I end up selecting.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-7247782142277430390?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/7247782142277430390/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2009/07/defining-lock-freedom-for-synchronous.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/7247782142277430390'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/7247782142277430390'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2009/07/defining-lock-freedom-for-synchronous.html' title='Defining Lock-Freedom for Synchronous Channels'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-4160461401473029833</id><published>2009-07-01T00:29:00.003-04:00</published><updated>2009-07-01T03:29:38.489-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='garbage collection'/><category scheme='http://www.blogger.com/atom/ns#' term='chaos'/><title type='text'>Garbage Collection and Chaos</title><content type='html'>Changing topics, and going to more research in progress as opposed to finalizing what I've already done.  I'm shifting to my long-term work: chaos and computing.&lt;br /&gt;&lt;br /&gt;The original question was "is garbage collection chaotic?"  Now that I understand all this better, I can take a more direct approach.  For a system to be chaotic, three properties need to be satisfied.&lt;br /&gt;&lt;br /&gt;1) Sensitivity to initial conditions&lt;br /&gt;&lt;br /&gt;Points which are close to eachother may produce substantially different trajectories.  That is, subtle changes may drastically alter the result.  This, I think, is the most important.&lt;br /&gt;&lt;br /&gt;2) Topological mixing&lt;br /&gt;&lt;br /&gt;As near as I can tell, given any two sets of points, it is possible to start with any point in one, run the system long enough, and end up with a point in the other.&lt;br /&gt;&lt;br /&gt;3) Dense periodic orbits&lt;br /&gt;&lt;br /&gt;Repeating orbits are continuous, or unbroken.  You don't don't have "gaps" in a cycle.&lt;br /&gt;&lt;br /&gt;Now, how does this relate to garbage collection?  That depends on what you're measuring.  My first thoughts were to treat a family of isomorphic memory graphs as a single point in a space.  However, this will not be a chaotic system.  Garbage collection as a system will consist of a number of families of isomorphic graphs with vectors all pointing towards a set of strongly-connected graphs representing the nodes reachable from the root set.  Clearly this is not chaotic.&lt;br /&gt;&lt;br /&gt;The &lt;span style="font-style: italic;"&gt;costs&lt;/span&gt; of collections, however, may well be chaotic, or determined by some chaotic system.  Costs can be measured in terms of nodes traversed, nodes deleted, nodes preserved, or some combination thereof.  An individual move from one graph to another entails some cost.&lt;br /&gt;&lt;br /&gt;Now, a program describes some trajectory in this space of graph-families.  At some interval, the garbage collection function gets fired, which costs something.  Depending on how costs get computed, two neighboring points (graphs) in the trajectory described by the program might result in radically different collection costs.  So whatever function decides when to collect can potentially have a major effect on the total cost (this is something analogous to sensitivity to initial conditions).  Topological mixing and dense orbits are more troublesome.  They suggest the ability for a given decision function to produce arbitrarily bad behavior if the program runs long enough.&lt;br /&gt;&lt;br /&gt;I think the more telling question is "how much data is necessary to make optimal, or within some multiple of optimal decicions about when to collect?", or perhaps the other angle, "how close to optimal can I get with a finite data set?".  Can a collector actually make a decent decision, or will it produce arbitrarily bad behavior given enough time?  I think answering that question would get at the original intent.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-4160461401473029833?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/4160461401473029833/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2009/07/garbage-collection-and-chaos.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/4160461401473029833'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/4160461401473029833'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2009/07/garbage-collection-and-chaos.html' title='Garbage Collection and Chaos'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-3922150000908079159</id><published>2009-06-28T18:25:00.012-04:00</published><updated>2009-08-16T02:45:46.831-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='runtime systems'/><category scheme='http://www.blogger.com/atom/ns#' term='transactional memory'/><category scheme='http://www.blogger.com/atom/ns#' term='threading'/><category scheme='http://www.blogger.com/atom/ns#' term='concurrency'/><title type='text'>Complete GC and Scheduler Intrinsics</title><content type='html'>This is an attempt to add the mailbox data to the GC and transactional intrinsics.  I will need to further develop some transactional concepts before the transactional intrinsics are where they need to be.&lt;br /&gt;&lt;br /&gt;On the other hand, those are for far-future work, not present-day.  The idea is to make sure I don't have to redesign all the instructions at a later date.&lt;br /&gt;&lt;br /&gt;This is consciously trying to look more like LLVM instructions...&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Types:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;gcallocator&lt;/span&gt;:   A structure used to allocate GC objects.  These are good for one use, and the gcalloc intrinsic yields a new allocator.  The program is expected to use this, and give the latest version to safepoints.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;gcstate&lt;/span&gt;: A collection of boolean flags representing various things.  Doesn't change between safepoints.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;gc&lt;/span&gt;, &lt;span style="font-family:courier new;"&gt;undo&lt;/span&gt;, &lt;span style="font-family:courier new;"&gt;buf&lt;/span&gt;: Each of these are a kind of log.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;mbox&lt;/span&gt;:    A thread's mailbox.  At present equal to &lt;span style="font-family:courier new;"&gt;{i32 execid, gcstate state, gcallocator alloc, gc gclog }&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;thread&lt;/span&gt;:  A thread id.  This is a differentiated type because it is traced by the garbage collector.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;schedstate&lt;/span&gt;: Scheduling states.  Includes at the least &lt;span style="font-family:courier new;"&gt;run&lt;/span&gt;, &lt;span style="font-family:courier new;"&gt;suspend&lt;/span&gt;, and &lt;span style="font-family:courier new;"&gt;term&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;tstate&lt;/span&gt;: A thread's state.  At the present, equal to  &lt;span style="font-family:courier new;"&gt;{schedstate state, i32 priority }&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Calling Conventions:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;thread&lt;/span&gt;:  A calling convention for function that serve as the starting point of threads.  Enters with the architecture's thread pointer pointing to an inital frame, which is initialized to contain the arguments to the function.  The object containing this frame will likely also contain the thread structure, though it is not required.&lt;br /&gt;&lt;br /&gt;In general, the other calling conventions are geared towards using GC-allocated frames, and flattening them to the fullest extent possible.  Functions expect to enter with their stack pointer pointing to a frame of the proper size and type.  When leaving, they restore the return address and old frame pointer.  I call this the "blind callee convention" since the callee has no idea where it got its frame from.&lt;br /&gt;&lt;br /&gt;There are roughly three sub-conventions that arise from this.  The first is the most basic: allocate a frame from GC, initialize it, set up the arguments, return address, and return frame, and jump to the function.&lt;br /&gt;&lt;br /&gt;In the second, we actually use the frame of the caller.  We can do this for calls to non-recursive functions.&lt;br /&gt;&lt;br /&gt;In the third, we use a classic stack, which is a big aggregate GC object.  We can just use the blind callee convention, or we can have the callee be aware that it is on a stack (and thus, can allocate more frames using the stack).  This is useable when the frames themselves don't escape a call (curried functions, inner functions that access variables in their parents and escape the parent, and continuations break this convention).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Instructions:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;span style="font-style: italic;"&gt;result&lt;/span&gt; = spawn execid &lt;span style="font-style: italic;"&gt;&lt;span style="font-style: italic;"&gt;&lt;/span&gt;execid&lt;/span&gt;, gcallocator &lt;span style="font-style: italic;"&gt;gcallocator&lt;/span&gt;, &lt;span style="font-style: italic;"&gt;funptrval&lt;/span&gt;(&lt;span style="font-style: italic;"&gt;initial args&lt;/span&gt;)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Spawn a thread.  The starting function &lt;span style="font-style: italic;font-family:courier new;" &gt;funptrval&lt;/span&gt; must use the &lt;span style="font-family:courier new;"&gt;thread&lt;/span&gt; calling convention.  &lt;span style="font-style: italic;font-family:courier new;" &gt;result&lt;/span&gt; is of type &lt;span style="font-family:courier new;"&gt;{thread, gcallocator }&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;span style="font-style: italic;"&gt;result&lt;/span&gt; = tstat execid &lt;span style="font-style: italic;"&gt;execid&lt;/span&gt;, thread &lt;span style="font-style: italic;"&gt;threadval&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Get the state for &lt;span style="font-style: italic;font-family:courier new;" &gt;threadval&lt;/span&gt;.  The  &lt;span style="font-style: italic;font-family:courier new;" &gt;result&lt;/span&gt; is of type &lt;span style="font-family:courier new;"&gt;tstate&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;update execid &lt;span style="font-style: italic;"&gt;execid&lt;/span&gt; [, &lt;span&gt;schedstate&lt;/span&gt; (run|suspend|term)] [, i32 &lt;span style="font-style: italic;"&gt;pri&lt;/span&gt; ], thread &lt;span style="font-style: italic;"&gt;threadval&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Update state for &lt;span style="font-style: italic;font-family:courier new;" &gt;threadval&lt;/span&gt;.  There must be at least one state change given.  For threads updating themselves, this does not take effect until the next &lt;span style="font-family:courier new;"&gt;safepoint&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;span style="font-style: italic;"&gt;result&lt;/span&gt; &lt;span style="font-family:georgia;"&gt;=&lt;/span&gt; safepoint mbox &lt;span style="font-style: italic;"&gt;mbox&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Possibly context switch or run garbage collection, acknowledge any changes to scheduling state, generate a new &lt;span style="font-style: italic;font-family:courier new;" &gt;mbox&lt;/span&gt; for subsequent calls.  The &lt;span style="font-style: italic;font-family:courier new;" &gt;mbox&lt;/span&gt; given becomes defunct, and using it after this point causes undefined behavior.  A safepoint will spill and restore all registers to the current frame if a garbage collection or a context switch takes place.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;span style="font-style: italic;"&gt;result&lt;/span&gt; = gcalloc [tx, ] mbox &lt;span style="font-style: italic;"&gt;mbox&lt;/span&gt;, &lt;span style="font-style: italic;"&gt;type&lt;/span&gt; [, &lt;span style="font-style: italic;"&gt;options&lt;/span&gt; ]&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Allocate an object with type &lt;span style="font-family:courier new;"&gt;type&lt;/span&gt;, initializing only its header.  The type of result is &lt;span style="font-family:courier new;"&gt;{type*, mbox }&lt;/span&gt;.  This call &lt;span style="font-style: italic;"&gt;may&lt;/span&gt; also have the effect of executing a &lt;span style="font-family:courier new;"&gt;safepoint&lt;/span&gt;.&lt;br /&gt;&lt;span style="font-style: italic;font-family:courier new;" &gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;span style="font-style: italic;"&gt;result&lt;/span&gt; = watch [r ][w ]&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;, &lt;span style="font-style: italic;"&gt;type&lt;/span&gt;*&lt;span style="font-style: italic;"&gt; ptrval&lt;/span&gt; ... [, set &lt;span style="font-style: italic;"&gt;setval&lt;/span&gt; ]&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Watch for activity on ptrvals.  If no specific type of activity is given, rw is assumed.  Append &lt;span style="font-style: italic;font-family:courier new;" &gt;ptrval&lt;/span&gt;s to &lt;span style="font-style: italic;font-family:courier new;" &gt;setval&lt;/span&gt;, or create a new set containing &lt;span style="font-style: italic;font-family:courier new;" &gt;ptrval&lt;/span&gt;s.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;touch r [w ], &lt;span style="font-style: italic;"&gt;type&lt;/span&gt;* &lt;span style="font-style: italic;"&gt;ptrval&lt;/span&gt; ...&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;span style="font-style: italic;"&gt;result&lt;/span&gt; = touch [r ]w, &lt;span style="font-style: italic;"&gt;type&lt;/span&gt;*&lt;span style="font-style: italic;"&gt; ptrval&lt;/span&gt; ... [, &lt;/span&gt;&lt;span style="font-family:courier new;"&gt;(gc / undo) &lt;span style="font-style: italic;"&gt;logval&lt;/span&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt; ]&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Report access to &lt;span style="font-style: italic;font-family:courier new;" &gt;ptrval&lt;/span&gt;s, which may be read or write.  If write access is reported, then it may also be logged.  If the touch is logged, then there is a &lt;span style="font-style: italic;font-family:courier new;" &gt;result&lt;/span&gt;, which is a log equal to the type of log argument given.  Depending on the exact log type, this instruction must be placed differently relative to the memory operation to which it corresponds.  If &lt;span style="font-family:courier new;"&gt;undo&lt;/span&gt; logs are used, the instruction must precede any write to the locations, whereas the instruction can be anywhere with &lt;span style="font-family:courier new;"&gt;gc&lt;/span&gt; logs.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;span style="font-style: italic;"&gt;result&lt;/span&gt; = read [gc &lt;span style="font-style: italic;"&gt;gcstate&lt;/span&gt;, ] [tx ,] &lt;span style="font-style: italic;"&gt;type&lt;/span&gt;* &lt;span style="font-style: italic;"&gt;ptrval&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;span style="font-style: italic;"&gt;result&lt;/span&gt; = read [gc &lt;span style="font-style: italic;"&gt;gcstate&lt;/span&gt;, ] tx, &lt;span style="font-style: italic;"&gt;type&lt;/span&gt;* &lt;span style="font-style: italic;"&gt;ptrval&lt;/span&gt;, buf &lt;span style="font-style: italic;"&gt;logval&lt;/span&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Exactly equivalent to &lt;span style="font-family:courier new;"&gt;load&lt;/span&gt;, except it exists for extra transactional or garbage collection semantics.  At least one of &lt;span style="font-family:courier new;"&gt;gc&lt;/span&gt; or &lt;span style="font-family:courier new;"&gt;tx&lt;/span&gt; must be present.  The second variant checks a set of buffer logs first.  This does &lt;span style="font-style: italic;"&gt;not&lt;/span&gt; inform watch sets of the activity.  If this is a read of gc-traced memory, the &lt;span style="font-family:courier new;"&gt;gcstate&lt;/span&gt; from the last &lt;span style="font-family:courier new;"&gt;mbox&lt;/span&gt; must be given.&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;br /&gt;write [gc &lt;span style="font-style: italic;"&gt;gcstate&lt;/span&gt;, ] [tx ], &lt;span style="font-style: italic;"&gt;type&lt;/span&gt;* &lt;span style="font-style: italic;"&gt;ptrval&lt;/span&gt;, &lt;span style="font-style: italic;"&gt;type&lt;/span&gt; &lt;span style="font-style: italic;"&gt;val&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;span style="font-style: italic;"&gt;result&lt;/span&gt; = write [gc &lt;span style="font-style: italic;"&gt;gcstate&lt;/span&gt;, ] tx, &lt;span style="font-style: italic;"&gt;type&lt;/span&gt;* &lt;span style="font-style: italic;"&gt;ptrval&lt;/span&gt;, &lt;span style="font-style: italic;"&gt;type&lt;/span&gt; &lt;span style="font-style: italic;"&gt;val&lt;/span&gt; [, buf &lt;span style="font-style: italic;"&gt;logval&lt;/span&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt; ]&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Exactly equivalent to &lt;span style="font-family:courier new;"&gt;store&lt;/span&gt;, except for extra transactional or garbage collection semantics.  At least one of &lt;span style="font-family:courier new;"&gt;gc&lt;/span&gt; or &lt;span style="font-family:courier new;"&gt;tx&lt;/span&gt; must be present.  The second variant buffers the write in a write-buffer log.  This does not actually write through to memory.  This does &lt;span style="font-style: italic;"&gt;not&lt;/span&gt; inform watch sets of the activity.  If this is a read of gc-traced memory, the &lt;span style="font-family:courier new;"&gt;gcstate&lt;/span&gt; from the last &lt;span style="font-family:courier new;"&gt;mbox&lt;/span&gt; must be given.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;span style="font-style: italic;"&gt;result&lt;/span&gt; = validate [r ][w ], [set &lt;span style="font-style: italic;"&gt;setval&lt;/span&gt; ... ]&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Validate &lt;span style="font-style: italic;font-family:courier new;" &gt;setval&lt;/span&gt;s, or all outstanding watched values for this thread if no sets are given.  If no access mode is given, &lt;span style="font-family:courier new;"&gt;rw&lt;/span&gt; is assumed.  &lt;span style="font-family:courier new;"&gt;result&lt;/span&gt; is a &lt;span style="font-family:courier new;"&gt;i1&lt;/span&gt;, indicating success or failure.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;discard [r ][w ], set &lt;span style="font-style: italic;"&gt;setval&lt;/span&gt; [, set &lt;span style="font-style: italic;"&gt;setval&lt;/span&gt; ... ]&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Discard &lt;span style="font-style: italic;font-family:courier new;" &gt;setval&lt;/span&gt;s from the watched locations for this thread.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;commit buf &lt;span style="font-style: italic;"&gt;logval&lt;/span&gt; [, buf &lt;span style="font-style: italic;"&gt;logval&lt;/span&gt; ... ]&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Perform the actions described in &lt;span style="font-style: italic;font-family:courier new;" &gt;logval&lt;/span&gt;s.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;rollback undo &lt;span style="font-style: italic;"&gt;logval&lt;/span&gt; [, undo &lt;span style="font-style: italic;"&gt;logval&lt;/span&gt; ... ]&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Abort all actions described in &lt;span style="font-style: italic;font-family:courier new;" &gt;logval&lt;/span&gt;s.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;listen set &lt;span style="font-style: italic;"&gt;setval&lt;/span&gt; [, set &lt;span style="font-style: italic;"&gt;setval&lt;/span&gt; ... ]&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Following the completion of this instruction, the current thread's status will be set to &lt;span style="font-family:courier new;"&gt;run&lt;/span&gt; whenever anyone &lt;span style="font-style: italic;font-family:courier new;" &gt;touch&lt;/span&gt;es an of the locations contained in &lt;span style="font-style: italic;font-family:courier new;" &gt;setval&lt;/span&gt;s.&lt;br /&gt;&lt;br /&gt;There is still no conflict management.  Also, the transactional instructions completely lack the kind of meta-arguments present in the gc and scheduler intrinsics.  The next steps will be to develop conflict management and the necessary metastate.  There also need to be semantics and operations for managing logs.  Adding these will complete the intrinsic set.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-3922150000908079159?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/3922150000908079159/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2009/06/this-is-attempt-to-add-mailbox-data-to.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/3922150000908079159'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/3922150000908079159'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2009/06/this-is-attempt-to-add-mailbox-data-to.html' title='Complete GC and Scheduler Intrinsics'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-7120405408824787762</id><published>2009-06-24T19:57:00.003-04:00</published><updated>2009-06-24T20:06:51.550-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='runtime systems'/><category scheme='http://www.blogger.com/atom/ns#' term='threading'/><title type='text'>Scheduler Intrinsics</title><content type='html'>This is the basic scheduler intrinsic set I developed as part of my master's thesis (a very, very tiny part, and there was a fully formal description of the execution model to go along with it), modified for the runtime system.  The actual runtime system had a mailbox pointer, which gets passed around.  This pointer is a required first argument to anyscheduler function&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;%thr&lt;/span&gt; = &lt;span style="font-family:courier new;"&gt;spawn %mbox&lt;span style="font-family:georgia;"&gt;, &lt;/span&gt;@func&lt;/span&gt;(...)&lt;span style="font-family:courier new;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Spawn the call to &lt;span style="font-family:courier new;"&gt;func&lt;/span&gt; as a thread.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;%state&lt;/span&gt;, &lt;span style="font-family:courier new;"&gt;%pri&lt;/span&gt; = &lt;span style="font-family:courier new;"&gt;stat %thr&lt;/span&gt;, &lt;span style="font-family:courier new;"&gt;%mbox&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Get the state and priority of the given thread&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;update&lt;/span&gt; &lt;span style="font-family: courier new;"&gt;%thr&lt;/span&gt; [, run | suspend | term] [, &lt;span style="font-family:courier new;"&gt;%pri&lt;/span&gt;], &lt;span style="font-family:courier new;"&gt;%mbox&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Update a thread's state and priority.  State changes do not take effect until the next safepoint.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;%mbox = safepoint&lt;/span&gt;, &lt;span style="font-family:courier new;"&gt;%mbox'&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Possibly deschedule or context switch, and acknowledge any changes in state.  This effectively destroys &lt;span style="font-family:courier new;"&gt;%mbox'&lt;/span&gt;, replacing it with &lt;span style="font-family:courier new;"&gt;%mbox&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;The cooperative nature of this threading works much nicer with a lock-free scheduler, and lock-free systems in general.  It is possible to implement asynchronous channel communications quite simply:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;send(%achan, %ptr, %mbox):&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;%box = getelementptr, %achan, 0&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;%recv = getelementptr, %achan, 1&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;%thr = load %recv&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;store %val, %box&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;update %thr, run, %mbox&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;ret&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;recv(%achan, %ptr, %mbox):&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;%box = getelementptr, %achan, 0&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;%recv = getelementptr, %achan, 1&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;update %self, suspend, %mbox&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;store %self, %recv&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;safepoint %mbox&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;%val = load %box&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;ret %box&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Lastly, it works nicely for doing a TM system as well.  The next step is to tie all this in with the GC/TM intrinsics, and wire in all the &lt;span style="font-family:courier new;"&gt;%mbox&lt;/span&gt;es into those as well.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-7120405408824787762?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/7120405408824787762/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2009/06/scheduler-intrinsics.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/7120405408824787762'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/7120405408824787762'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2009/06/scheduler-intrinsics.html' title='Scheduler Intrinsics'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-845824408421536363</id><published>2009-06-20T12:09:00.004-04:00</published><updated>2009-06-20T15:51:02.403-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='compilers'/><category scheme='http://www.blogger.com/atom/ns#' term='runtime systems'/><category scheme='http://www.blogger.com/atom/ns#' term='garbage collection'/><category scheme='http://www.blogger.com/atom/ns#' term='transactional memory'/><title type='text'>Transactional and GC Intrinsics, 3rd Revision</title><content type='html'>Attempting to combine the intrinsicts I developed in the latest entry about transactional intrinsics (a RISC approach) with GC intrinsics.  (Note: in this text "or" means inclusive, "either ... or" means exclusive.  Damn English language and its logical inadequacies).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;watch&lt;/span&gt; (r | w), &lt;span style="font-family: courier new;"&gt;%ptr&lt;/span&gt;* [&lt;span style="font-family: courier new;"&gt;, %set&lt;/span&gt;]&lt;br /&gt;&lt;br /&gt;Watch for read or write events on %ptr, possibly adding it to %set.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;touch&lt;/span&gt; (tx | gc), (r | w), &lt;span style="font-family: courier new;"&gt;%ptr&lt;/span&gt;*  [&lt;span style="font-family: courier new;"&gt;, %gclog&lt;/span&gt;]&lt;br /&gt;&lt;br /&gt;Report read or write activity on &lt;span style="font-family: courier new;"&gt;%ptr&lt;/span&gt; to transaction or garbage collection system.  If activity is being reported to the gc system, &lt;span style="font-family: courier new;"&gt;%gclog&lt;/span&gt; is required.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;discard&lt;/span&gt; [r | w,] &lt;span style="font-family: courier new;"&gt;%set&lt;/span&gt;*&lt;br /&gt;&lt;br /&gt;Stop watching the specified type of activity on &lt;span style="font-family: courier new;"&gt;%set&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;result&lt;/span&gt; = &lt;span style="font-family: courier new;"&gt;validate&lt;/span&gt; [r | w], [&lt;span style="font-family: courier new;"&gt;%set&lt;/span&gt;*]&lt;br /&gt;&lt;br /&gt;Yield true if no conflicts of the specified type has taken place on &lt;span style="font-family: courier new;"&gt;%set&lt;/span&gt;.  If &lt;span style="font-family: courier new;"&gt;%set&lt;/span&gt; is omitted, check for any conflict.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;log&lt;/span&gt; (undo/buf), &lt;span style="font-family: courier new;"&gt;%ptr&lt;/span&gt;*, &lt;span style="font-family: courier new;"&gt;%log&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Log an action in either undo or write-buffering style, storing the information to &lt;span style="font-family: courier new;"&gt;%log&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;commit %log&lt;/span&gt;*&lt;br /&gt;&lt;br /&gt;Perform all actions described in &lt;span style="font-family: courier new;"&gt;%log&lt;/span&gt; if it is a write-buffering log, and finalize the actions in either case, discarding the log.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;rollback %log&lt;/span&gt;*&lt;br /&gt;&lt;br /&gt;Undo all actions in &lt;span style="font-family: courier new;"&gt;%log&lt;/span&gt; if it is an undo log, and abort the actions in either case, discarding the log.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;suspend %set&lt;/span&gt;*&lt;br /&gt;&lt;br /&gt;Suspend the current thread until someone accesses a location in &lt;span style="font-family: courier new;"&gt;%set&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;gc_alloc&lt;/span&gt; [tx,] type, options&lt;br /&gt;&lt;br /&gt;Allocate garbage-collected memory with type and options.  If tx is specified, the allocation will be cached and not repeated if the transaction aborts.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;safepoint&lt;/span&gt; &lt;span style="font-family: courier new;"&gt;%gclog&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Possibly context switch or do garbage collection.  &lt;span style="font-family: courier new;"&gt;%gclog&lt;/span&gt; must contain at least one entry for each location that has been written to since the last safe point.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Problems with this intrinsic set:  1) there is no way to create logs or sets and maintain them, 2) &lt;span style="font-family: courier new;"&gt;log&lt;/span&gt; would seem to be inseparable from a &lt;span style="font-family: courier new;"&gt;touch&lt;/span&gt; w, which would mean they should probably be combined, 3) there is no obvious way to indicate how conflicts are to be resolved (which thread wins, for instance).  I also need to add the gc allocator stuff, and scheduler intrinsics.&lt;br /&gt;&lt;br /&gt;The next revision will have the scheduler operations and more metastate, perhaps conflict stuff.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-845824408421536363?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/845824408421536363/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2009/06/transactional-and-gc-intrinsics-3rd.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/845824408421536363'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/845824408421536363'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2009/06/transactional-and-gc-intrinsics-3rd.html' title='Transactional and GC Intrinsics, 3rd Revision'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-6491847729183765083</id><published>2009-06-19T12:57:00.005-04:00</published><updated>2009-06-20T12:02:14.796-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='program languages'/><category scheme='http://www.blogger.com/atom/ns#' term='type systems'/><title type='text'>Languages and Derivatives of Evaluation Semantics</title><content type='html'>Brief thought.  A program language is at a minimum a syntax definition and a set of evaluation semantics of the form a → b, meaning "a gets rewritten as b" (of course, they are terms in reality).  So lambda calculus looks like:&lt;br /&gt;&lt;br /&gt;t := n&lt;br /&gt; |   λn.t&lt;br /&gt; |   t t&lt;br /&gt;&lt;br /&gt;and has three evaluation rules:&lt;br /&gt;&lt;br /&gt;t1 λn.t2 → [t1/n]t2&lt;br /&gt;t1 → t1' ⇒ t1 t2 → t1' t2&lt;br /&gt;t2 → t2' ⇒ t1 t2 → t1 t2'&lt;br /&gt;&lt;br /&gt;Only the first is of any real substance.  The last two say that an application steps to a particular point if one of its sub-terms steps to a particular point.&lt;br /&gt;&lt;br /&gt;Essentially, these evaluation rules are a vector field, or a system of equations.  They tell you, for a given point, what the next point is.  This is important, of course, because we do need to know where to go.  We can even apply a logic system that operates over the syntax of the language to restrict the possibilities to programs that never have undefined behavior (aka a type system).&lt;br /&gt;&lt;br /&gt;In my previous entries, I discussed concurrency and linearizability (and its similarity to limit cycles in chaos).  For this, it's nice to know the difference between our present state and the next one.  For this, we need differentiation.&lt;br /&gt;&lt;br /&gt;So an interesting thing might be to take a language, and develop both it's operational semantics &lt;span style="font-style: italic;"&gt;and&lt;/span&gt; its derivative semantics, and develop some theorems relating them to the equivalent concepts in calculus.  What's also nice is that the corresponding type judgments over derivative semantics give a very straightforward notion of linearization points (see below).&lt;br /&gt;&lt;br /&gt;More on this later.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-6491847729183765083?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/6491847729183765083/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2009/06/languages-and-derivatives-of-evaluation.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/6491847729183765083'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/6491847729183765083'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2009/06/languages-and-derivatives-of-evaluation.html' title='Languages and Derivatives of Evaluation Semantics'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-1013898591211846535</id><published>2009-06-16T15:45:00.004-04:00</published><updated>2009-06-16T17:15:45.863-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='linearizability'/><category scheme='http://www.blogger.com/atom/ns#' term='type systems'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><category scheme='http://www.blogger.com/atom/ns#' term='concurrency'/><title type='text'>Types, Atomic Actions and Dirac's Notation</title><content type='html'>My recent work has dealt quite alot with Hoare types and how they can be used to reason about concepts like linearizability.  This has been a bit rough, for several reasons.  Hoare types are fine for reasoning about sequential consistency in programs.  For instance, in the following simple (and infomal, and probably incorrect by "true" Hoare logic) example:&lt;br /&gt;&lt;br /&gt;x ← [l] : { v : Int = [l] } Unit { x : Int = v }&lt;br /&gt;[l] ← x + 1 : { x : Int } Unit { [l] = x + 1 }&lt;br /&gt;x : { x : Int } x { }&lt;br /&gt;&lt;br /&gt;the types can be combined by the rules of Hoare logic into the type { v : int = [l] } v { [l] = v + 1 }.&lt;br /&gt;&lt;br /&gt;But in concurrency, this isn't what we want.  We don't really care about one step guaranteeing the preconditions of another.  We care about observing properties and causing effects.  It's a subtle, but distinct difference, and it seems related to what I was talking about earlier with differentiation of propositions.  Observing a property is when some proposition in the outside world causes a change in the eventual result of the term.  Causing an effect is when some proposition is changed.  So basically, you have the properties you observe, the effects you cause, and a change of types (as in, a before type, and an after type)&lt;br /&gt;&lt;br /&gt;Of all ways to represent this, Dirac's notation comes to mind.  The representation of effects would go something like this: 〈 properties observed | effects caused 〉.  So, for instance, we might type an atomic counter as 〈 v : Int = [l] | [l] = v + 1 〉.  Now, we also need to add the type of the term before the action, versus the type after.  So, the type might look something like this: Int〈 v : Int = [l] | [l] = v + 1 〉v.  This is, of course, a simplification.  In a real dependent type system, expect something like this:&lt;br /&gt;&lt;br /&gt;Π l : Int Ref.Π x : Int. Int〈 [l] = v | [l] = v+1 〉{ y : Int. y = v }&lt;br /&gt;&lt;br /&gt;The other benefit of this is that it allows sequences of atomic actions to be differentiated from individual ones.  The counter I gave above would be typed as something like (in the simpler form):&lt;br /&gt;&lt;br /&gt;Int〈 v : Int = [l] | 〉v ; v〈 | [l] = v + 1 〉v&lt;br /&gt;&lt;br /&gt;The best part of all this is that I have a pretty good idea how to get these automatically from the sequences of Hoare-typed terms and dependently-typed heaps I've been working with in all my writings on this subject, and that method is a propositional differentiation function...&lt;br /&gt;&lt;br /&gt;I told you, it all comes together in the end.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-1013898591211846535?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/1013898591211846535/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2009/06/types-atomic-actions-and-diracs.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/1013898591211846535'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/1013898591211846535'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2009/06/types-atomic-actions-and-diracs.html' title='Types, Atomic Actions and Dirac&apos;s Notation'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-978031339595910624</id><published>2009-06-13T17:15:00.001-04:00</published><updated>2009-06-13T23:54:40.129-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='compilers'/><category scheme='http://www.blogger.com/atom/ns#' term='runtime systems'/><category scheme='http://www.blogger.com/atom/ns#' term='garbage collection'/><category scheme='http://www.blogger.com/atom/ns#' term='transactional memory'/><title type='text'>The Point of It All</title><content type='html'>Garbage Collection and Transactions, in an advanced concurrent runtime environment are as essential as things like function calls, frames, dynamic linking and loading, and everything else we take for granted now.  The compiler should take a very active role in implementing and optimizing them, not just cough out the bare minimum and punch the clock.&lt;br /&gt;&lt;br /&gt;I might add, this is very much the thinking of a functional language person: the compiler bears the responsibility of making an elegant language fast.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-978031339595910624?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/978031339595910624/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2009/06/point-of-it-all.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/978031339595910624'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/978031339595910624'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2009/06/point-of-it-all.html' title='The Point of It All'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-3548702219926566354</id><published>2009-06-13T14:56:00.001-04:00</published><updated>2009-06-13T23:54:17.430-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='compilers'/><category scheme='http://www.blogger.com/atom/ns#' term='runtime systems'/><category scheme='http://www.blogger.com/atom/ns#' term='transactional memory'/><title type='text'>A RISC Approach to Transactional Instructions</title><content type='html'>This is a different approach, taking a more low-level, RISC-based approach.  Ultimately, we'd like either instructions or intrinsics that help us implement a transactions using whatever strategy we'd like, and do so efficiently, with the ability to optimize away much of the cost.&lt;br /&gt;&lt;br /&gt;As an analogy, look at early CPU research, which was concerned with the "semantic gap".  This produced instruction sets like VAX, which were a nightmare to implement in silicon, and had the effect of forcing compilers to follow very specific idioms (and generally were very difficult to optimize in any meaningful way).  For a modern example, look at JVM.  It was built to map directly from Java expressions, and it's been a tragedy of errors ever since.  Then along came the RISC idea: present instructions that can be implemented efficiently in silicon, and semantic gaps are &lt;span style="font-style: italic;"&gt;good&lt;/span&gt;: they give you space in which to optimize.&lt;br /&gt;&lt;br /&gt;So in transactional memory, what does this equate to?  Throw out begin/end transaction, specialized styles of nesting, commit handlers, abort, and all that and start over.  What do I end up implementing when I do a pure STM system?  Logging/watching and commit/rollback, and validation.  I use some combination of these depending on exactly what strategy I'm employing (write-locking vs read-validation, lazy vs eager conflict detection, write-buffering vs undo-logging).&lt;br /&gt;&lt;br /&gt;Going down the list we need a way to say "watch for activity on this location(s)", as well as a way to say "notify people of activity on this location(s)", and a way to say "make sure no one has conflicted with my watched set".  In terms of actually doing things, we need a way to say "buffer this write" or "log the previous value", as well as a way to say "roll back/perform this action(s)"  And lastly, we need the ability to discard watched locations.&lt;br /&gt;&lt;br /&gt;I present the following:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;watch&lt;/span&gt; (r | w), &lt;span style="font-family:courier new;"&gt;%ptr&lt;/span&gt;*&lt;br /&gt;&lt;br /&gt;Add &lt;span style="font-family:courier new;"&gt;%ptr&lt;/span&gt; to the watch set, alerting on read or write.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;touch&lt;/span&gt; (r | w), &lt;span style="font-family:courier new;"&gt;%ptr&lt;/span&gt;*&lt;br /&gt;&lt;br /&gt;Notify watchers of activity on &lt;span style="font-family:courier new;"&gt;%ptr&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;discard&lt;/span&gt; (r | w), &lt;span style="font-family:courier new;"&gt;%ptr&lt;/span&gt;*&lt;br /&gt;&lt;br /&gt;Stop watching activity on &lt;span style="font-family:courier new;"&gt;%ptr&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;validate&lt;/span&gt; (r | w), [&lt;span style="font-family:courier new;"&gt;%ptr&lt;/span&gt;*]&lt;br /&gt;&lt;br /&gt;Check for conflicts on all watched locations, or just &lt;span style="font-family:courier new;"&gt;%ptr&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;log&lt;/span&gt; (undo/redo), &lt;span style="font-family:courier new;"&gt;%ptr&lt;/span&gt;, [&lt;span style="font-family:courier new;"&gt;%val&lt;/span&gt;]&lt;br /&gt;&lt;br /&gt;Log activity on &lt;span style="font-family:courier new;"&gt;%ptr&lt;/span&gt;, either saving the old value at &lt;span style="font-family:courier new;"&gt;%ptr&lt;/span&gt; or logging a write of &lt;span style="font-family:courier new;"&gt;%val&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;commit&lt;/span&gt; [&lt;span style="font-family:courier new;"&gt;%ptr&lt;/span&gt;*]&lt;br /&gt;&lt;br /&gt;Perform all buffered actions, or just those on &lt;span style="font-family:courier new;"&gt;%ptr&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;rollback&lt;/span&gt; [&lt;span style="font-family:courier new;"&gt;%ptr&lt;/span&gt;*]&lt;br /&gt;&lt;br /&gt;Rollback all buffered actions, or just those on &lt;span style="font-family:courier new;"&gt;%ptr&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;Perhaps &lt;span style="font-family:courier new;"&gt;commit&lt;/span&gt; and &lt;span style="font-family:courier new;"&gt;rollback&lt;/span&gt; should be combined into a &lt;span style="font-family:courier new;"&gt;flush&lt;/span&gt; instruction.  This isn't finished yet, except I don't quite like this since they do opposing things.  We need specific watch-sets and logs so we can do nesting.  We also need the ability to store logging information in SSA values, so the illustrious flatten can squish it out of the logs, as I describe in earlier posts.  The watch-sets would also give a straightforward implementation of retry functionality (perhaps we want a modified &lt;span style="font-family:courier new;"&gt;halt&lt;/span&gt; instruction).&lt;br /&gt;&lt;br /&gt;But this gives you the flexibility without forcing you into a particular regime.  It also (hopefully) gives you the space to optimize.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-3548702219926566354?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/3548702219926566354/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2009/06/risc-approach-to-transactional.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/3548702219926566354'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/3548702219926566354'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2009/06/risc-approach-to-transactional.html' title='A RISC Approach to Transactional Instructions'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-4454833723283172525</id><published>2009-06-13T13:59:00.001-04:00</published><updated>2009-06-13T23:53:46.689-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='garbage collection'/><title type='text'>Repacking Garbage Collection</title><content type='html'>If you have a copying collector, like I do, and you want to have the compiler generate the trace code, like I do, the next logical step is to have the ability to "repack" objects as they are being collected.&lt;br /&gt;&lt;br /&gt;Take this example: you have a hash-table implementation you want to resize periodically.  The naïve way is to redo the entire table whenever you want to resize.  The slightly smarter way is to build a second table and move entries over whenever you look them up.  This is how Haskell and SML's hash tables work.&lt;br /&gt;&lt;br /&gt;But why not mark the table, and when you allocate the new table, &lt;span style="font-style: italic;"&gt;then&lt;/span&gt; extend it.  Or take something like a tree.  You can pack immutable trees into a very dense, very cache-efficient array representation if you create them all at once (think like a binary heap, with a bitmap indicating whether a given slot contains an entry or a pointer to a classical node.  This also works for structures that need their own internal garbage collection (like graphs), and lazy structures with some kind of lookahead caching.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-4454833723283172525?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/4454833723283172525/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2009/06/repacking-garbage-collection.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/4454833723283172525'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/4454833723283172525'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2009/06/repacking-garbage-collection.html' title='Repacking Garbage Collection'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-3720576563179766262</id><published>2009-06-13T10:41:00.001-04:00</published><updated>2009-06-13T23:53:19.739-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='I/O'/><category scheme='http://www.blogger.com/atom/ns#' term='runtime systems'/><category scheme='http://www.blogger.com/atom/ns#' term='threading'/><category scheme='http://www.blogger.com/atom/ns#' term='concurrency'/><title type='text'>More on I/O</title><content type='html'>Disclaimer: I don't endorse having one's car booted and having to walk 2 hours in the dark to get home as a way to stimulate thought.&lt;br /&gt;&lt;br /&gt;I recalled something I had realized earlier about I/O in a highly-concurrent world.&lt;br /&gt;&lt;br /&gt;Stream-based I/O is very detrimental to parallelism.  While it's nice and convenient in a sequential world, in the concurrent world, it forces an awkward sort of serialization phase.  I found this out while experimenting with implementing a parallel compiler (as in, a &lt;span style="font-style: italic;"&gt;parallelized&lt;/span&gt; compiler, not a parallelizing compiler) when looking at the final stage, which outputs to a file.  If you insist on using streams, the result is a dramatic reduction in parallelism.  The same goes for input, for that matter.  If you can get away with storing data in a way that doesn't require stream-based parsing (say, in a relational, or even just indexed form), this is better.  So the question is, what kind of I/O do we want to be doing, exactly?&lt;br /&gt;&lt;br /&gt;On the output side, it's much cleaner to do a kind of indexed scheme.  For instance, even something as simple as writing out an ELF binary becomes much more parallel if my I/O calls are something like &lt;span style="font-family:courier new;"&gt;write(fd, index, size, data)&lt;/span&gt;.  This is, of course, because ELF files are a collection of arrays with some cross-referencing, essentially.  Trees can also be written out using this style I/O, and relations are already so parallel it practically hurts.  As far as input goes, the same style works just as well.&lt;br /&gt;&lt;br /&gt;Where all this breaks down is when you have a sequence of different-sized things.  Most of the time, this comes up in flat text files, XML data, and the like (which makes one wonder, might it be a good idea to implement some kind of OS-level XML framework that actually stores everything in some kind of ELF-like binary form, and pretends to be text when it needs to?).  Even worse is when you have a sequence of different-sized things, and you index into it.  This is the case with JVM class files (yet &lt;span style="font-style: italic;"&gt;another&lt;/span&gt; bad JVM design choice).&lt;br /&gt;&lt;br /&gt;The point is, we want to move away from sequences and streams and move towards trees, arrays, and relations.  This all reminds me of a talk given by Guy Steele, which basically said the same thing about data structures (lists are bad, trees are good).  Using the indexed I/O model, we can either build up buffers and flush them in very large writes (which lends itself naturally to having little threads write into a buffer and an I/O thread periodically sync it to disk), or we can &lt;span style="font-family:courier new;"&gt;mmap&lt;/span&gt; and work directly.  The &lt;span style="font-family:courier new;"&gt;mmap&lt;/span&gt; approach is potentially dangerous, though, depending on exactly how the OS handles &lt;span style="font-family:courier new;"&gt;mmap&lt;/span&gt;ed NFS files.  What we don't want is the executors getting hit with page faults and blocking while the OS fetches data remotely.  And while we're on the subject, we probably don't want to wait on disks either, in a concurrent setting.  Not when we can just grab some memory and go to work.&lt;br /&gt;&lt;br /&gt;Of course, there's more to I/O than shipping data to disk.  We have, among other things, the filesystem maintenance operations and network traffic.  Regarding the second, if we're in a highly-concurrent setting, chances are we want some kind of event-based or reactive model, rather than the single-processor "read a request, process it, write a response" approach.&lt;br /&gt;&lt;br /&gt;The "maintenance" operations are the annoying ones.  These are the operations that &lt;span style="font-style: italic;"&gt;should&lt;/span&gt; be effectively function calls, but because of NFS mounts and the shortcomings of POSIX when it comes to distributed filesystems, they can potentially block.  Having no better alternative, I'll default to the older idea: one I/O thread per executor, plus whatever buffer-maintainers and socket/event-watchers you end up creating.&lt;span style="font-family:courier new;"&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-3720576563179766262?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/3720576563179766262/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2009/06/more-on-io.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/3720576563179766262'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/3720576563179766262'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2009/06/more-on-io.html' title='More on I/O'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-4306337025957340792</id><published>2009-06-12T22:35:00.002-04:00</published><updated>2009-06-13T23:52:17.553-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='compilers'/><category scheme='http://www.blogger.com/atom/ns#' term='runtime systems'/><category scheme='http://www.blogger.com/atom/ns#' term='garbage collection'/><category scheme='http://www.blogger.com/atom/ns#' term='transactional memory'/><title type='text'>Current Thoughts on the GC/Transaction Intrinsic Set</title><content type='html'>These are my current thoughts on intrinsics for the combined GC/Transaction stuff I want to do some day:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;log&lt;/span&gt; (gc | tx), (r | w), &lt;span style="font-family:courier new;"&gt;%ptr&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Perform a logging action for %ptr.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;gc_alloc&lt;/span&gt; type, options&lt;br /&gt;&lt;br /&gt;Allocate something from GC'ed memory.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;safepoint&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Possibly deschedule.  Possibly switch to allocator.  These must be placed by the compiler so that no more than some number of instructions get executed between each one.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;tx_begin&lt;/span&gt; (open/closed)&lt;br /&gt;&lt;br /&gt;Begin a new transaction, with closed or open nesting.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;tx_validate&lt;/span&gt; (r | w) [ &lt;span style="font-family:courier new;"&gt;%ptr*&lt;/span&gt;]&lt;br /&gt;&lt;br /&gt;Validate the given pointers, or the entire read/write set.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;tx_discard %ptr&lt;/span&gt;*&lt;br /&gt;&lt;br /&gt;Discard the given pointers from the read set.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;tx_commit&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Commit a transaction.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;tx_abort&lt;/span&gt;  [num]&lt;br /&gt;&lt;br /&gt;Abort the num innermost transactions (or just the innermost).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;tx_retry&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Roll back, then suspend until someone touches the current read set, then continue.&lt;br /&gt;&lt;br /&gt;This is the intrinsics, &lt;span style="font-style: italic;"&gt;not&lt;/span&gt; what the compiler will end up using.  The stuff actually coming from the TM runtime will be much more minimal, primarily contention management, suspend, and wakeup stuff in the end.&lt;br /&gt;&lt;br /&gt;It should be noted that retries inside orelses are known to be evil.  I have a paper about this...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-4306337025957340792?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/4306337025957340792/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2009/06/current-thoughts-on-gctransaction.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/4306337025957340792'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/4306337025957340792'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2009/06/current-thoughts-on-gctransaction.html' title='Current Thoughts on the GC/Transaction Intrinsic Set'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-8539027959351590241</id><published>2009-06-12T19:32:00.001-04:00</published><updated>2009-06-13T23:51:26.618-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='compilers'/><category scheme='http://www.blogger.com/atom/ns#' term='transactional memory'/><title type='text'>Region and Effect Analysis for Optimization of Transactional Memory</title><content type='html'>One of my research activities deals with compiling, analyzing, and optimizing transactional programs.  I'm doing work for Tony Hosking's RuggedJ stuff, but also gathering intelligence and experience for doing my own LLVM work at a later time, which I expect will have much more potential, since I won't be tied to JVM, Java, and their limitations.&lt;br /&gt;&lt;br /&gt;The primitives one might see in a TM system, based on the latest research might look something like this:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;open&lt;/span&gt; (r/w)&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;log&lt;/span&gt; (r/w)&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;validate&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;begin&lt;/span&gt; (open/closed)&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;commit&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;abort&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;retry&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;A beneficial, but difficult optimization is to eliminate redundant open and logging operations.  This is also quite beneficial for concurrent garbage collectors, as I discuss in a previous entry.&lt;br /&gt;&lt;br /&gt;An important point here is that open operations are idempotent, and cannot be undone.  This means that once a given object is open, it stays that way until the end of the transaction, and additional opens have no effect.  Also, there is no way to "unopen" an object, aside from commiting the transaction.&lt;br /&gt;&lt;br /&gt;Simple data-flow analysis is a start, but there's another subtlety.  For instance, in the following example:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;span style="font-size:100%;"&gt;%x = load %ptr&lt;br /&gt;%y = %x&lt;br /&gt;open read,%x&lt;br /&gt;open read,%y&lt;/span&gt;&lt;span style="font-family:times new roman;"&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;The second &lt;span style="font-family:courier new;"&gt;open read&lt;/span&gt; is unnecessary since the values &lt;span style="font-family:courier new;"&gt;%x&lt;/span&gt; and &lt;span style="font-family:courier new;"&gt;%y&lt;/span&gt; both point to the same thing.  This notion of "pointing to the same thing" has a name: regions.  A region is some spot in memory where some number of values are located.  It originates in the context of region types, which are able to reason about memory allocation and deallocation.  It also works for what we're doing: object-granular conflict detection (though in the LLVM stuff, I'd like to try a more intelligent strategy of grouping related shared fields together).&lt;br /&gt;&lt;br /&gt;We can track regions and note which ones have been opened for reading and writing (and which fields have been logged, or any other effect we might wish).  When we do a phi instruction, we create a new region, whose state is the lower bound of all the states of the arguments (unless all the arguments have the same region, in which case we just keep it).&lt;br /&gt;&lt;br /&gt;It's more complicated, though.  If we create the merge regions based on the SSA values being merged, we might be too conservative.  Consider this:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;%w = load %p1&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;%x = load %p2&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;br  ..., label %L1, label %L2&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;L1:&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;%y1 = %w&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;%z1 = %w&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;open read %w&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;br label %L3&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;L2:&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;%y2 = %x&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;%y1 = %x&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;open read %x&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;L3:&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;%y3 = phi [%y1, L1], [%y2, L2]&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;%z3 = phi [%z1, L1], [%z2, L2]&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;open write %y3&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;We end up falsely treating &lt;span style="font-family:courier new;"&gt;%y3&lt;/span&gt; and &lt;span style="font-family:courier new;"&gt;%z3&lt;/span&gt; as having a separate region, even though they don't.  The answer might seem to create the new regions based on the &lt;span style="font-style: italic;"&gt;regions&lt;/span&gt; being merged, but this doesn't quite work.  Consider:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;%w = load %p1   ; w -&gt; r1&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;%x = load %p2   ; x -&gt; r2&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;br  ..., label %L1, label %L2&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;L1:&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;%y1 = %w        ; y1 -&gt; a&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;%z1 = %x        ; z1 -&gt; b&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;open read %y1&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;br label %L3&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;L2:&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;%y2 = %x        ; y2 -&gt; b&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;%z2 = %w        ; z2 -&gt; a&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;open read %y2&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;L3:&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;%y3 = phi [%y1, L1], [%y2, L2] ; y3 -&gt; (a | b)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;%z3 = phi [%z1, L1], [%z2, L2] ; z3 -&gt; (a | b)&lt;br /&gt;open read %y3&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;open read %z3&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;We would treat &lt;span style="font-family:courier new;"&gt;%y3&lt;/span&gt; and &lt;span style="font-family:courier new;"&gt;%z3&lt;/span&gt; as being the same region.  However, notice that we can get rid of the first &lt;span style="font-family:courier new;"&gt;open read&lt;/span&gt;, but not the second.  This is because the effects visited upon &lt;span style="font-family:courier new;"&gt;a&lt;/span&gt; and &lt;span style="font-family:courier new;"&gt;b&lt;/span&gt; differ based on the path by which they arrived.  Therefore, we must record the paths.  This gets nasty the moment we have loops, of course, but recall that we are dealing with monotonic, idempotent effects.  Running through a basic block 5 times is the same as running through it once.  Therefore, we can record merged regions as a "primitive region", plus a set of control-flow edges.  The reason it's control-flow edges will become apparent in the next example.&lt;br /&gt;&lt;br /&gt;There's still one problem.  Consider this loop:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;L1:&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;%z = %y&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;open read %z&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;%y = %x&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;open read %y&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;%x = (new object)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;br ..., label L1, ...&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Are &lt;span style="font-family:courier new;"&gt;%x&lt;/span&gt;, &lt;span style="font-family:courier new;"&gt;%y&lt;/span&gt;, and &lt;span style="font-family:courier new;"&gt;%z&lt;/span&gt; all pointing to the same region?  By our previous method, they are.  However, this would lead us to eliminate the &lt;span style="font-family:courier new;"&gt;open read %y&lt;/span&gt;, when in fact, we want to eliminate &lt;span style="font-family:courier new;"&gt;open read %z&lt;/span&gt;.  The problem is that we didn't account for the fact that each look iteration, a "fresh" region gets assigned to &lt;span style="font-family:courier new;"&gt;%x&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;The answer here is to associate with each region an "age" number, representing the loop iteration during which we saw it.  We disregard this age when checking to see if anything changed, for the purposes of telling if we've reached a fixed point, but we count it when merging.&lt;br /&gt;&lt;br /&gt;In summary, merged regions are a set of primitive regions, their ages, and the control flow edges they've visited.  When we merge two regions, we line up all the equal primitive region-age pairs, and union their paths).  We get the state for regions created this way by taking the lower bound of all the mergees.  We can supplement this with alias analysis.  We can treat loads from pointers we determine "must alias" as yielding the same region.  Once we have regions assigned in this way, we can eliminate extraneous operations, or perhaps do more aggressive optimizations like PRE.&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;span style="font-family:times new roman;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-8539027959351590241?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/8539027959351590241/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2009/06/region-and-effect-analysis-for.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/8539027959351590241'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/8539027959351590241'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2009/06/region-and-effect-analysis-for.html' title='Region and Effect Analysis for Optimization of Transactional Memory'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-303943525871934077</id><published>2009-06-12T17:41:00.001-04:00</published><updated>2009-06-13T23:50:40.961-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='verification'/><category scheme='http://www.blogger.com/atom/ns#' term='linearizability'/><category scheme='http://www.blogger.com/atom/ns#' term='type systems'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><title type='text'>Verifying Linearizability with Types</title><content type='html'>A major theme of my research for the past year has been the desire to build type systems which are able to verify that certain term (or more accurately, certain objects) are linearizable.  Linearizability means that something (an execution, a term, operations on an object) all take effect in a single atomic operation.&lt;br /&gt;&lt;br /&gt;Of course, to build a type system, we'll need to be more specific.  A term's behavior is described by a Hoare type of the form {P} T {Q}, meaning that the term assumes P, guarantees Q, and reduces to a value in T.  As I discuss in a previous entry, the term will observe some properties and change others.  If all this happens in the same step, then we have linearizability.&lt;br /&gt;&lt;br /&gt;Of course, life is always more complicated.  In optimistic concurrency control mechanisms, we usually have a read operation which gives us a value, then a validate operation, which ensures that the location hasn't changed.  In LL/SC, the write and validate operations are a combined atomic operation.  "Observation" happens at any point between the read and the validate, though we usually figure out our exact control flow at the validate.&lt;br /&gt;&lt;br /&gt;So essentially, what we have are read, write, and validate operations.  We can do any number of reads, so long as we validate them and throw away the values if validation fails.  All validations and writes must be done in a single atomic step.  But it's still more complicated than that, actually.  We're allowed to do writes that don't produce observable changes (such as rebalancing a tree), and we're allowed to skip validation on reads under less obvious circumstances.&lt;br /&gt;&lt;br /&gt;Colin Gordon and I worked out an approach for dealing with simple languages with LL/SC-style atomic memory instructions using non-dependent (yes, you heard me correctly) types, and a ridiculous amount of caffeine.  Of course, we had to combine the control-flow effects of the SC instructions with the write effects.  The type system itself works something like this:&lt;br /&gt;&lt;br /&gt;Pointers to shared data have a special "guarded" type, which is linear (meaning use-exactly-once), but special.  Instead, the entry in the type environment can be "split" into several other entries, which we call "tokens": read, write, and value, each of which are also linear.  The tokens are consumed by operations which read from, write to, and use the value of the corresponding tokens.  The type system also uses a primitive form of Hoare-types to track whether it has seen a linearization point yet.&lt;br /&gt;&lt;br /&gt;I'm glossing over quite alot here, but it's enough to get the general idea.  Now, how can this be generalized?  The tracking of whether or not we've seen a linearization point yet is also critical, since it determines our ability to make observable changes to state.  If you go from a pre-point state to a post-point state, you need to validate all reads you've done and perform any writes such that you perform one of the possible behaviors of the term.&lt;br /&gt;&lt;br /&gt;All writes that &lt;span style="font-style: italic;"&gt;aren't&lt;/span&gt; part of a linearization point need a proposition showing that they don't cause any observable effects (meaning that the before and after states are equivalent).  The ability to discard reads that have not been validated yet depends on a slightly more intricate property.  Unvalidated reads can be discarded if the result of the term is not affected by changes to the location from which the read took place.  Proving this is another matter.&lt;br /&gt;&lt;br /&gt;In Colin's and my work, we consumed the value token in a SC or check (basically a straight validation), and if the operation failed, you got kicked to a "fail" control flow path.  In a more general solution, you end up with two situations.  The first is failed validation, in which you need to prove that you don't use the value, or any value derived from it.  I've still not worked out the details of this yet.&lt;br /&gt;&lt;br /&gt;The second is by proving that concurrent modifications don't affect the outcome.  For instance, in a linked list, you would have a proposition that hd(cons(a, b)) = b, so if you're traversing down a list, removing deleted values from the list won't affect the outcome.&lt;br /&gt;&lt;br /&gt;This is clearly not yet worked-out, and it won't be until I build a language and start imlementing some more complex linearizable systems.  But this is a lab-book, afterall.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-303943525871934077?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/303943525871934077/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2009/06/verifying-linearizability-with-types.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/303943525871934077'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/303943525871934077'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2009/06/verifying-linearizability-with-types.html' title='Verifying Linearizability with Types'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-7982457533601108090</id><published>2009-06-11T22:49:00.001-04:00</published><updated>2009-06-13T23:49:56.302-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='I/O'/><category scheme='http://www.blogger.com/atom/ns#' term='runtime systems'/><category scheme='http://www.blogger.com/atom/ns#' term='threading'/><title type='text'>The Nasty I/O Problem in Featherweight Threading</title><content type='html'>Since I devoted so much of my time to devising a featherweight threading system (aka, two-thirds of my master's thesis), it's clearly something I care alot about to say the least.  This is, of course, an understatement.  The truth is I think it's the only way forward in a concurrent world.&lt;br /&gt;&lt;br /&gt;It begins with a criticism I hear all the time: "Functional languages have this implicit parallelism, but no one's ever realized it."  The reason, is twofold.  One is pure politics which I won't discuss here.  The other is that functional languages expose an amazing level of &lt;span style="font-style: italic;"&gt;fine-grained&lt;/span&gt; parallelism.  But what we're given by pthreads and similar platforms isn't fine-grained at all.  Just how fine-grained?  Every single expression can compute its sub-expressions in parallel.  Of course, that doesn't mean we can exploit it.&lt;br /&gt;&lt;br /&gt;A fairly basic common-sense argument holds that the number of instructions it takes to launch a thread is rougly the same as the minimum number of instructions that we want that thread to execute before terminating.  Of course, we can analyze further, but the basic notion that the cheaper it is to launch threads, the more parallelism we can afford to exploit.  To parallelize a language like Haskell or Standard ML, we'd like to ideally be able to spawn threads even for medium-sized functions, consisting of 50 or so real instructions.  (Anything smaller will likely end up losing more than it gains due to caching, speculation, branch predictors, and other such effects anyway).  And herein lies the problem.&lt;br /&gt;&lt;br /&gt;Your standard 1:1 pthreads library needs to allocate a stack, make a system call to allocate a kernel thread, plus &lt;span style="font-style: italic;"&gt;its&lt;/span&gt; stack, lock/unlock mutexes, and initialize a whole lot of system-specific cruft.  The result is a cost on the order of tens of thousands of instructions.  It also costs several thousand instructions to switch between threads.&lt;br /&gt;&lt;br /&gt;The solutions I proposed weren't necessarily new; the exact way in which I &lt;span style="font-style: italic;"&gt;combined&lt;/span&gt; them was.  First, begin with a copying garbage collector: a staple in most functional languages anyway.  Copying collectors have the nice property of better memory locality, as well as extremely cheap allocation (just advancing a pointer).  With this, you can replace conventional stacks with gc-allocated frames a la Andrew Appel's CPS/closure conversion, and employing all the usual tricks to cut down allocation (flatten to squash non-recursive call trees, liveness analysis to reuse the memory of object that have died, etc).  This gets you both threads and continuations at very little cost.  You then employ the reviled M:N threading, only in a more clever way.  You can probably tell from the title that I/O is, as yet, unsatisfactorally resolved, so I'll dance around it and address the synchronization problems.  Locking mutexs potentially blocks system threads (I call them executors), which for M:N threading is &lt;span style="font-style: italic;"&gt;bad&lt;/span&gt;.  The solution is quite simple, actually: don't fight the battle at all.  Build the scheduler from lock-free data structures.  Implementing synchronization for the user-level threads without relying on OS synchronization is a bit trickier.  My solution was to rely on voluntary context-switching, which is enabled by the presence of the safepoints necessary for garbage-collection.  The result is a runtime system whose only use of locks is to implement a barrier to turn the garbage collector on and off.&lt;br /&gt;&lt;br /&gt;This is all fine in a perfect world where we just run CPU-bound jobs and go home happy.  That being said, I/O mucks it all up.&lt;br /&gt;&lt;br /&gt;The problem with I/O is that it is all done using a hopelessly outdated API.  The POSIX APIs, were built for a much simpler world: one processor, usually no preemption in the kernel, running on one machine, with no concept of a distributed system.  The resulting situation resembles very much the European liberal democracies implemented on top of an old, defunct absolute monarchy, which is kept in place by a quaint sort of nationalism, and the fact that they are too convenient both to the rich and powerful gentry as well as the young sycophants looking to inflate their reputation by paying court to the obselete.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Any&lt;/span&gt; I/O call is potentially blocking, even &lt;span style="font-family:courier new;"&gt;close&lt;/span&gt;, for the simple reason that it may be on an NFS mount.  Anyone with cursory experience in non-blocking or asynchronous I/O knows &lt;span style="font-style: italic;"&gt;that&lt;/span&gt; cake is a lie.  The only thing we &lt;span style="font-style: italic;"&gt;can&lt;/span&gt; trust is blocking I/O, and everything potentially blocks.  Of course, we have no clean, let alone accurate way of knowing what will...  Barring all kinds of platform-specific warlock arts, this means every single I/O call has to take place off of the main executors.&lt;br /&gt;&lt;br /&gt;This gives us two options.  The first is to start a thread specifically to handle every I/O call.  The second, and far more sane, is to give the job to a set of I/O threads.  One can imagine some sort of lock-free, load-balancing multi-queue in the vein of Herlihy's old work on counting networks, or perhaps just a randomized bucket approach based on the birthday paradox.  Ideally, this should cut down the level of cross-talk between I/O threads to acceptable levels.  And hopefully I/O jobs will take long enough that the level of real contention will be minimal.&lt;br /&gt;&lt;br /&gt;The question then becomes how many I/O threads to have.  There is no clear answer.  For executors, this is fairly obvious: the same as the number of CPUs, perhaps plus some constant.  With I/O threads, it could literally be anything.  On a system with one disk, and all I/O headed there, it's clearly one.  Truly non-blocking I/O, of course, would benefit from many.&lt;br /&gt;&lt;br /&gt;My answer at the moment is simple: one per executor.  There's no particular inspiration or genius going into this, aside from the simple observation that you'll do no worse than you would if you had nonblocking I/O that actually worked, or with a system such as FreeBSD's KSE system.   Anything more clever than this gets platform-specific, arcane, and opaque very fast.&lt;br /&gt;&lt;br /&gt;Regarding I/O, executors, and garbage collectors, GHC deals with this in an elegant fashion, which I adopted for my collector.  One option when allocating is to "pin" the block.  This can be done either by a scheme I describe in one of my papers, or simply by farming out the request to a lock-free mark/sweep collector.  The result is an immobile block of garbage-collected memory, which can be used directly in I/O.  You can pop this into a lock-free I/O queueing mechanism and employ the same wait mechanisms I use to implement user-level mutexes to wait on the job to finish.&lt;br /&gt;&lt;br /&gt;One of the best arguments I can come up with for all this is that it maps very cleanly onto bare metal.  The executor model, of course, is the way it actually works with multiple CPUs, the synchronization implementation ends up being very lightweight and elegant, and the I/O problems go away (though the queueing becomes far more complex).&lt;br /&gt;&lt;br /&gt;Moreover, do we really expect POSIX to stick around forever as-is?  FreeBSD has already made one attempt to switch to activations.  Mac OS is on a microkernel, and I know of at least two projects to implement Linux on a microkernel.  On the other hand, OS research these days has a terrible inertia, coupled with a tendency to get badly sidetracked.  Which is, of course, why I got out of it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-7982457533601108090?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/7982457533601108090/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2009/06/nasty-io-problem-in-featherweight.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/7982457533601108090'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/7982457533601108090'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2009/06/nasty-io-problem-in-featherweight.html' title='The Nasty I/O Problem in Featherweight Threading'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-5889780585223590867</id><published>2009-06-11T21:30:00.002-04:00</published><updated>2009-07-30T15:16:41.810-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='runtime systems'/><category scheme='http://www.blogger.com/atom/ns#' term='garbage collection'/><category scheme='http://www.blogger.com/atom/ns#' term='transactional memory'/><title type='text'>Dan Grossman was Right</title><content type='html'>For those who don't follow &lt;span style="font-style: italic;"&gt;Onward!&lt;/span&gt;, Dan Grossman published an essay talking about the relationship between garbage collection and transactional memory.&lt;br /&gt;&lt;br /&gt;To give you a background, I've been working on some compiler-based transactional memory stuff in the context of Tony Hosking's RuggedJ, and on my own vision of the high-concurrency, featherweight threading runtime environment, which I'm now trying to bring to life using the components I've already built as a runtime support system, and LLVM both as an abstract instruction set and the arbitrary compiler support I need to make the whole thing fly.&lt;br /&gt;&lt;br /&gt;The grand vision is that the compiler will graft in most of the code needed to do garbage collection and transactional memory, applying the fabled "sufficiently aggressive optimizations", which I sincerely hope will get the real costs of transactional memory down low enough to where it consistently beats locking synchronization.&lt;br /&gt;&lt;br /&gt;Part of this, of course, involves creating a set of intrinsics, which your compiler frontend must generate.  LLVM's garbage collection intrinsics are... almost... good enough.  But they are just lacking enough when it comes to the kind of highly-concurrent, compiler-mangled stuff I want to do.  To give you an example, LLVM gives you:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;llvm.gc_read&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;llvm.gc_write&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;llvm.gc_root&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Where &lt;span style="font-family:courier new;"&gt;gc_root&lt;/span&gt; marks a local variable (stack-slot) for visit by the collector.  This is fine for your quaint, cute, lock-based, stop-the-world type collectors.  But that's exactly the sort of non-concurrent barbarism we're trying to move beyond!  Add in the desire to throw in STM at a future date, and it's better just to scrap theirs and build my own.  Besides I really don't have much of a desire to do anything with LLVM other than pillage it for compiler machine parts anyway...&lt;br /&gt;&lt;br /&gt;Specifically, I need a couple of things.  First, I need to separate logging from the actual read/writes.  I also need safepoints, at which actual collection can take place.  The idea is you want only one logging action for each location that is accessed between the access and the next safe point.  Enter difficult compiler analysis #1 (Eliot and I found out just how difficult...)&lt;br /&gt;&lt;br /&gt;Second, I don't have stacks.  The basic assumption is you have a frame pointer which points to an object which gets traced just like everything else.  The basic assumption is that everything gets GC-allocated.  Of course, the flatten optimization that you find in basically every functional language compiler on the planet will squash the all the frames for a non-recursive call tree into one (and my calling conventions allow this).  Or you can figure out when a C-style stack actually is better and use it instead.  But the point is that the gc_root is totally unnecessary.&lt;br /&gt;&lt;br /&gt;In a distant third, allocation can't be done with a function.  There's more state variables that get carried around, used for allocation, and passed into safepoints to be modified.  This is cruft, but it's important cruft.  All the same, I can probably have LLVM stick it all in for you, assuming you actually have a gc_alloc intrinsic.  And of course, gc_alloc needs to know the type of the thing it's allocating since I do everyhing with type signatures.&lt;br /&gt;&lt;br /&gt;So I have something along the lines of  this instruction set:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;gc_alloc&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;gc_log&lt;/span&gt; (r/w)&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;safepoint&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;And the normal LLVM load/store stuff.&lt;br /&gt;&lt;br /&gt;Now this is where it gets interesting.  At some point, I'll want to add TM intrinsics.  The basics of what you want are this:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;tx_begin&lt;/span&gt; (open/closed)&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;tx_validate&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;tx_commit&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;tx_abort&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;tx_retry&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;tx_log&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;And again, the usual LLVM load/store stuff.  Notice the log.  You now have two of them: one for the collector and one for the transactional memory.  It gets deeper...&lt;br /&gt;&lt;br /&gt;In the library-based world, you allocate objects, build up this list of logged locations, and then process it when you abort (or commit) a transaction.  In GC world, you do the same.  All of this is opaque to the compiler.  But if you have a compiler that you can make do as you wish, you can do things differently.  The compiler can stash the log itself in an SSA value, carrying it along as it goes.  How is this special, you ask?  Once again, apply the illustrious flatten.  Whenever possible, you'll squish as much of the log out into SSA values, which will in turn be stashed in registers where possible and frame slots when not.  Now, spit out the code to grind through the log directly into the program itself, and apply all your other optimizations.  Result: much better than calling out to a library (I hope).&lt;br /&gt;&lt;br /&gt;What's the point of all this digression?  You'll have different behaviors depending on whether or not you're logging a transactional-only, gc-only, or gc and transactional thing.  You also &lt;span style="font-style: italic;"&gt;really&lt;/span&gt; don't want to be having a dual regime when it comes to logging, since you'll be duplicating a whole lot of stuff, and gobbling up alot of registers in the process.&lt;br /&gt;&lt;br /&gt;Also, the TM &lt;span style="font-style: italic;"&gt;really&lt;/span&gt; needs to know about and affect the placement of safepoints.  Safepoints represent a potentially huge gap in time, so the transaction probably wants to validate immediately before and after.  You'll also want at least some kind of notification to the contention manager, and in the future, we can imagine some kind of crosstalk between the TM system and the scheduler (I'm running a transaction that's logging A, B, and C, so it'd be really nice if you don't schedule anyone who walks all over them...)&lt;br /&gt;&lt;br /&gt;Last and not least, safepoints can make the impossible possible.  In a naive world, you can't do undo-logging with lazy conflict detection, because you might see inconsistent states.  Likewise, some old lock-free TM implementations had the same problem.  Let people see inconsistent states, and you get the potential for wacky control flows that shouldn't happen, loops that should terminate not terminating, and I/O side-effects that shouldn't happen happening (I/O is to transactional memory what nuclear waste containment is to nuclear power).  In an I/O-less world, where you don't have some of the less-cooperative control-flow constructs (like continuations or exceptions), you can deal with the problem by using a timer interrupt to break and check consistency every so often.  Safepoints are the lovely alternative to this at a fraction of the cost.  Toss your safepoints around well enough, and you can actually do things like lazy detection with undo logging, or revive those old lock-free STMs...&lt;br /&gt;&lt;br /&gt;Lastly, there's potential for some really nasty or beneficial interactions between GC and STM, depending on how naïve or smart you are.  For instance, if you allocate inside a transaction, you &lt;span style="font-style: italic;"&gt;don't&lt;/span&gt; want to discard the object and allocate another one just because you got aborted.  Likewise, GC logging inside a transaction is likely to lend itself to certain subtleties.  Start throwing in stuff like boosting, and... I'm getting ahead of myself.  I'm sure there's probably several papers worth of stuff you could do here.&lt;br /&gt;&lt;br /&gt;The last thing I'll mention is that alot of transactions can probably be compiled into simpler implementations.  For instance, single-word transactions don't need a full TM suite; you can just do them with CAS or LL/SC instructions.  Likewise, read-only transactions can be transformed into an atomic snapshot.  Negotiating the interactions between these and GC logging will require more care in the compiler.&lt;br /&gt;&lt;br /&gt;So in summary, Dan Grossman, you were right.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-5889780585223590867?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/5889780585223590867/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2009/06/dan-grossman-was-right.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/5889780585223590867'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/5889780585223590867'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2009/06/dan-grossman-was-right.html' title='Dan Grossman was Right'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-1360640398509362839</id><published>2009-06-11T18:49:00.000-04:00</published><updated>2009-06-11T21:24:04.929-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='type systems'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><category scheme='http://www.blogger.com/atom/ns#' term='concurrency'/><title type='text'>Proposition Spaces</title><content type='html'>I was on the road back from Brattleboro, thinking over my work on structural theories of atomicity, synchronization, and the like.  I'll do a brief overview, since I'm writing the tech report on all this right now, and trying to turn it into a conference paper.&lt;br /&gt;&lt;br /&gt;Concepts like atomicity, lock-freedom, and the like exist only as conceptual descriptions at the moment, but nothing you could use to build a type system or a model checker.  To do that, you need to describe these concepts using structural rules.  This is what I've been working on recently (one of the things, at least).  My approach is to develop a theory of concurrency which operates on executions, or histories of states as a program is executed.  A concept like "atomicity" can be stated as a structural property of executions, similar to the way a type judgment is a structural property of a program's syntax.  You can also state things like synchronization rules, memory consistency models, and database serializability properties this way.&lt;br /&gt;&lt;br /&gt;All this heavily depends on dependent Hoare type theory.  Dependent types let you state arbitrary properties of a term, like "an integer less than 4, a hash table containing x but not y", and so on.  Hoare types extend this with pre- and post-conditions, written {P} T {Q}, where P is a condition that must hold before executing a term, and Q is the condition that holds after.  T, of course, is the term's type.  All these can contain arbitrary predicates, so you can have something like "delete : { x ∈ t } t : Table → () { x ∉ t }", which describes the exact behavior of delete.&lt;br /&gt;&lt;br /&gt;How does all this relate to concurrency and specifically, atomicity?  These types describe behaviors.  Some operation has a type {P} T {Q}, meaning that it changes the state of the world from P to Q and evaluates to T, or else {P} T {P}, meaning that it observes P, and evaluates to T (recall all these can contain arbitrary predicates).  I make heavy use of a thing called a "definitive type" which is essentially the most restrictive type you can give a term.  So the definitive type of a "contains" function on a set might be&lt;br /&gt;&lt;br /&gt;contains : {x ∈ s} s : Set → true {x ∈ s} ∨ {x ∉ s} s : Set → false {x ∉ s}&lt;br /&gt;&lt;br /&gt;Notice the disjunction operator ∨.  If you take all the behaviors combined with this operator, you have a bunch of definitively typed possible behaviors.  Each of these is a threshold.&lt;br /&gt;&lt;br /&gt;Assuming your type system is sound, then a given term's definitive type represents a monotonically decreasing set as the term gets executed, meaning that its type encompasses a smaller and smaller set of values.  (If the type ever &lt;span style="font-style: italic;"&gt;grows&lt;/span&gt;, then your type systems lies)  Here's the key point: once your term becomes a subtype of only one of its possible behaviors, the outcome is now fixed.  This is analogous to what Herlihy calls a "univalent state".  Prior to this, your term is a supertype of some set of the possible behaviors, which is analogous to a "multivalent state".&lt;br /&gt;&lt;br /&gt;Now, finally on to what I was thinking about.  When I wrote the logic rules for spotting linearization points, good behavior of locks, and such, I remarked that it was awkward, because you have to operate on pairs of states in order to say exactly what parts of a term got reduced.  What would be really nice is a differentiation function (Harrison, here's your precious taking the deriviative of a data structure coming up soon).  To do this you need several things: 1) what part of a term got eliminated, 2) what changed in the concrete state, and lastly but not least 3) what changed in abstract state (meaning what propositions' truth changed).  3 subsumes 2, since concrete states are just sets of very uninteresting propositions (address 1 holds true, 2 holds the address of 4, ...)  Types are also propositions in a dependent system, and as we've already stated, they are monotonically decreasing, so all we only ever lose propositions.  The propositions representing abstract states, of course, can change arbitrarily.&lt;br /&gt;&lt;br /&gt;So what we're essentially setting up here is a sort of "propositional space".  A point in this space is the truth assignments to all propositions.  A vector in this space is our differentiation function.  A program represents a vector field in this space (or a system of equations for math people).  Lastly, an execution represents a single path.&lt;br /&gt;&lt;br /&gt;Of course, all this would be just a mind-bending abstraction if it weren't for what else I'm working on: chaos and computing.  You see, when you start to think about things in this way, it suddenly starts to look very much like concepts in nonlinear dynamics.  You have your vector fields, bifurcations, limit cycles, stable and unstable fixed points, and so on.  Undecidability implies that you probably can't predict the outcome or the position in propositional space past a certain point.  Most importantly, you have an infinite number of propositions represented in a single point,&lt;span style="font-style: italic;"&gt;&lt;span style="font-style: italic;"&gt; &lt;span style="font-style: italic;"&gt;but for every program, these are determined by a finite number of propositions&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;.  This is analogous to high-dimension systems which are actually determined by a small number of real dimensions.&lt;br /&gt;&lt;br /&gt;Just as Euclidean space is defined by some set of vectors, it would seem that a propositional space would be defined by a set changes in propositions.  Figure out what these are for a program, and you can probably tell an awful lot about its behavior at any given point.&lt;br /&gt;&lt;br /&gt;I also wonder about the exact relationship between a nonlinear system's being chaotic and a program's outcome being undecidable.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-1360640398509362839?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/1360640398509362839/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2009/06/proposition-spaces.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/1360640398509362839'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/1360640398509362839'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2009/06/proposition-spaces.html' title='Proposition Spaces'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5817094669338788959.post-6604761165244729161</id><published>2009-06-11T18:44:00.000-04:00</published><updated>2009-06-11T18:49:32.720-04:00</updated><title type='text'>The Beginning</title><content type='html'>There will probably be a flurry of activity as I put up the more pertinent themes in my research, opting for the things I haven't yet transcribed into papers.  Some things may be out of order, as I'm more interested in getting newer ideas out than describing old ones that I've already hashed out.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5817094669338788959-6604761165244729161?l=idealogbook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://idealogbook.blogspot.com/feeds/6604761165244729161/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://idealogbook.blogspot.com/2009/06/beginning.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/6604761165244729161'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5817094669338788959/posts/default/6604761165244729161'/><link rel='alternate' type='text/html' href='http://idealogbook.blogspot.com/2009/06/beginning.html' title='The Beginning'/><author><name>Eric McCorkle</name><uri>http://www.blogger.com/profile/06502551170532764179</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
