I wrote earlier on the subject of tasks vs threads. The basic idea was of threads as a unit of execution, and tasks as a unit of executability. Put another way, threads represent separate time streams, where tasks represent separable control flows. The implication is that priorities and the epoch property matter only in the scheduling of threads; tasks by contrast can be scheduled using the classic, dumb FIFO (or LIFO) methods.
The truth is more complex. I'd like to do event-handling in an edge-triggered, continuation-handled manner. That is, you set a continuation to be executed in response to some particular kind of event (for instance, I/O completion, or availability of data on a socket). There are many reasons for this, which I'm not going to discuss in full detail here. The major reasons are 1) it's simple, natural, and close to what hardware actually gives you (even if OS APIs make it quite hard), 2) it avoids locking which would be necessary in a polling (or should I say level-triggered) style, yet it does not inhibit one's ability to implement such a style, 3) it provides just as fine an interface for long-running compute-bound tasks (particularly FFI calls) and 4) it provides a means to identify control flows originating from system events, which is important, as I'll discuss momentarily.
The lock-free scheduler system I designed has a very carefully-designed API designed to facilitate a kind of voluntary context switching used in conjunction with mailbox communication. Changes in thread state aren't acknowledged until a safepoint, which enables a pattern of the form "set myself to sleep status", "change some part of state", "safepoint". For example, in monitor-style programming, I'd mark myself as sleeping, append myself to the condition variable's listeners, unlock the mutex, and then execute a safepoint. If another thread managed to catch me before the safepoint and wake me back up, then I never go to sleep (obviously, in the real world, I want to spin-wait for the expected time cost of executing a safepoint, sleeping, and waking up again).
This all ties nicely into edge-triggered style events. Tasks are really just continuations, which plays nicely into the idea of registering continuations to be run in response to events. However, you don't want to hijack the I/O thread (or interrupt handler) to do lots of computation. You want the I/O system constantly handling various events. One could just take the continuation and shove it into the set of runnable tasks. However, tasks have no notion of time or priority, and we'd like to keep it that way, or as close to it as possible (since such notions come with a cost).
Ideally, tasks responding to I/O should modify data structures, possibly schedule some I/O, and then check out. The problem is, this could involve potentially time-consuming data structure operations, or operations on lock-free structures which strictly speaking aren't guaranteed to ever finish. Moreover, we'd like to harness parallelism for the completion of I/O tasks too.
The answer may lie in the safepoint mechanism. We can execute a set number of safepoints (or perhaps, for a set amount of time), then kick the task off the I/O timestream and onto the normal execution system. Ideally, this should allow short-running responses to do their jobs, but won't risk hijacking the I/O timestream for long-running responses. This requires us to split the task execution system into a time-sensitive and time-insensitive one, but that may not be a bad thing.
I will write in more detail as I design more of the system.
Tuesday, September 27, 2011
Thursday, July 28, 2011
Some More PL Ideas
These are some ideas I've had for a while, that I want to get down.
1) Move away from the linear string metaphor of program representation. Toward what exactly? One idea I've had is to something more resembling circuits and components. The idea is you'd wire together what basically amounts to a big dataflow graph, consisting of various primitive elements. The "pins" would carry dependent type information. Specifying the behavior of the components themselves is a bit more of a task.
You might be able to adopt concepts from signal analysis in EE here. Time-invariant signals are represented by pure functions, time-variant ones by Arrows. Monads (aka effects) and comonads (aka streams) are what muck everything up. Note: I need to write that article about how to explain Big Scary PL terms like monads and comonads in simpler terms.
[update]: You could probably build a really nifty programming environment with a multitouch interface. Also, I noticed alot of cryptographic algorithms are represented in this way.
2) Languages whose execution model is something other than first-order term rewriting or von Neumann-style execution. For example, theorem proving, term rewriting, algebraic normalization, convergence in a topology, manipulation of categories, and the like. This has all been done (I think), but for the most part, not in any nice way. I know there's a language like Pure, which does general term-rewriting.
Observation: a working example of this would, of course, be relational databases. Also datalog.
3) Languages with different notions of the data on which they operate, other than discrete values. For example, a language that can operate on probability distributions. Perhaps try compiling down to sets of equations, or representations like baysean networks and use the algebraic reduction-as-execution metaphor from above to attempt to derive the resulting distribution. Possible huge application: finance. Use this to compute expected outcomes of a particular strategy, or position. (I know Goldman Sachs does/did something like this; they told me about it back in 2008, and it's probably why they didn't need a bailout)
1) Move away from the linear string metaphor of program representation. Toward what exactly? One idea I've had is to something more resembling circuits and components. The idea is you'd wire together what basically amounts to a big dataflow graph, consisting of various primitive elements. The "pins" would carry dependent type information. Specifying the behavior of the components themselves is a bit more of a task.
You might be able to adopt concepts from signal analysis in EE here. Time-invariant signals are represented by pure functions, time-variant ones by Arrows. Monads (aka effects) and comonads (aka streams) are what muck everything up. Note: I need to write that article about how to explain Big Scary PL terms like monads and comonads in simpler terms.
[update]: You could probably build a really nifty programming environment with a multitouch interface. Also, I noticed alot of cryptographic algorithms are represented in this way.
2) Languages whose execution model is something other than first-order term rewriting or von Neumann-style execution. For example, theorem proving, term rewriting, algebraic normalization, convergence in a topology, manipulation of categories, and the like. This has all been done (I think), but for the most part, not in any nice way. I know there's a language like Pure, which does general term-rewriting.
Observation: a working example of this would, of course, be relational databases. Also datalog.
3) Languages with different notions of the data on which they operate, other than discrete values. For example, a language that can operate on probability distributions. Perhaps try compiling down to sets of equations, or representations like baysean networks and use the algebraic reduction-as-execution metaphor from above to attempt to derive the resulting distribution. Possible huge application: finance. Use this to compute expected outcomes of a particular strategy, or position. (I know Goldman Sachs does/did something like this; they told me about it back in 2008, and it's probably why they didn't need a bailout)
Wednesday, July 27, 2011
Scheduling for Heterogenous Multi/Many Core Architectures
I've been working on an approach to dealing with heterogenous multi/manycore architectures. These are architectures in which 1) communication costs between execution units (cores) is not uniform, 2) processing capabilities of cores is not uniform, and 3) communication costs with different sections of memory is not uniform.
Scheduling tasks to run on these is quite difficult, especially in any kind of dynamic fashion. Examples of these architectures include the IBM Cell and Tilera Tile64 architectures. Cell has 8 SPU's with their own private memories and a fast interconnect, and Tile64 has an 8x8 grid of cores with busses connecting adjacent cores. An example of a non-uniform memory architecture would be IBM's P7 architecture, which shares L3 cache amongst eight cores in a non-uniform fashion. Cell and Tilera both have had a hard time when scheduled dynamically. The Cell requires expert programming, and the Tilera suffers from highly non-uniform communication costs between cores. Even P7 has many heirarchies to it: hyperthreads, cores, dies, packages, boards, blades, and boxes.
In the future, I don't see these architectures getting any simpler; rather, I see them becoming more and more complicated, and specialized, like GPU's did. Hence, my proposed solution is to take inspiration from the way we handle GPU's (and FPGA's for that matter). Rather than try to dynamically assign tasks to cores (which inherently requires centralization- an anathema to concurrency), I propose "compiling" programs to run on a heterogenous machine by splitting up tasks in a way that minimizes communication costs and maximizes computational power.
This is most assuredly NP-hard even for simple cases; however, I suspect that modern IP solvers can handle the task. IBM's CPLEX solver routinely handles problems on the order of millions of variables, and lp_solve has similar capabilities. I've devised a method for formulating the problem as an IP problem. We'll see how practical the approach is.
Scheduling tasks to run on these is quite difficult, especially in any kind of dynamic fashion. Examples of these architectures include the IBM Cell and Tilera Tile64 architectures. Cell has 8 SPU's with their own private memories and a fast interconnect, and Tile64 has an 8x8 grid of cores with busses connecting adjacent cores. An example of a non-uniform memory architecture would be IBM's P7 architecture, which shares L3 cache amongst eight cores in a non-uniform fashion. Cell and Tilera both have had a hard time when scheduled dynamically. The Cell requires expert programming, and the Tilera suffers from highly non-uniform communication costs between cores. Even P7 has many heirarchies to it: hyperthreads, cores, dies, packages, boards, blades, and boxes.
In the future, I don't see these architectures getting any simpler; rather, I see them becoming more and more complicated, and specialized, like GPU's did. Hence, my proposed solution is to take inspiration from the way we handle GPU's (and FPGA's for that matter). Rather than try to dynamically assign tasks to cores (which inherently requires centralization- an anathema to concurrency), I propose "compiling" programs to run on a heterogenous machine by splitting up tasks in a way that minimizes communication costs and maximizes computational power.
This is most assuredly NP-hard even for simple cases; however, I suspect that modern IP solvers can handle the task. IBM's CPLEX solver routinely handles problems on the order of millions of variables, and lp_solve has similar capabilities. I've devised a method for formulating the problem as an IP problem. We'll see how practical the approach is.
Labels:
concurrency,
integer programming,
manycore,
multicore
Tuesday, July 26, 2011
A Universal Core Language
One of my long-term research goals (and likely, a significant portion of my dissertation) is the design of a very powerful, very general core language and type system. To be clear, a core language is a reduced set of syntactic forms generally not meant for human consumption. The purpose of the core language is to define the semantics of the language, and to be able to prove theorems about it. A compiler will translate a program into the core language, which it will then type check and subsequently translate into IR.
One idea I do want to explore is widening the gap between concrete syntax and core. The overall idea is that concrete syntax should be designed to be "nice" for people to use, perhaps even designed by legitimate designers. However, this is a topic for another post.
The core I envision could operate as a substrate for many programming languages, and would bring a very powerful dependent type system to bear for the purposes of verification. My philosophy with regard to dependent types is that they should be a tool, but not a hindrance. A dependently-typed language should separate the decidable and undecidable portions of its typechecking, allowing proof obligations to be ignored or deferred. However, a language which generates these kinds of obligations makes verification, testing, and many other tasks significantly easier.
I want to try to combine several things that don't tend to play nicely together: 1) impredicative polymorphism, 2) subtyping, 3) full higher-order dependent types, and 4) types describing concurrent behaviors. The final goal is made possible by my own work, and integrates well with the third. The first two lead to undecidability when combined, which is usually bad, but I don't particularly care, since the pathway to the last two begins with the sign "Abandon all decidability, ye who enter here".
The real difficulty lies in designing a logic which combines the first two and the last two, and is sound. Decidability can be sacrificed; soundness cannot. And one of the first things you learn as a logician or an algebraist is that there are fundamental limitations upon what kinds of reasoning systems one can design.
But assuming I can do this, the resulting core language would be quite powerful. For one, it would be possible to verify concurrent programs with arbitrary primitives. It would be possible to embed a language with a weaker type system (say, predicative or ad-hoc polymorphism) by synthesizing the necessary subtyping proofs by using the (decidable) type checking algorithms for those type systems. A similar approach could be used with substructural types, region types, and other interesting decidable type systems. In other words, it could act as a substrate for other languages- in essence, a universal core language.
If nothing else, this would provide a convenient mechanism for getting test examples. If I can embed languages like ML, Haskell (or Java, Scala, X10, or whatever...) in this core, then I can simply translate existing programs. Of course, none of them will have any proofs of correctness for the obligations they generate, but this could be a positive, as it gives me an opportunity to show the utility of the type system I want to design.
One idea I do want to explore is widening the gap between concrete syntax and core. The overall idea is that concrete syntax should be designed to be "nice" for people to use, perhaps even designed by legitimate designers. However, this is a topic for another post.
The core I envision could operate as a substrate for many programming languages, and would bring a very powerful dependent type system to bear for the purposes of verification. My philosophy with regard to dependent types is that they should be a tool, but not a hindrance. A dependently-typed language should separate the decidable and undecidable portions of its typechecking, allowing proof obligations to be ignored or deferred. However, a language which generates these kinds of obligations makes verification, testing, and many other tasks significantly easier.
I want to try to combine several things that don't tend to play nicely together: 1) impredicative polymorphism, 2) subtyping, 3) full higher-order dependent types, and 4) types describing concurrent behaviors. The final goal is made possible by my own work, and integrates well with the third. The first two lead to undecidability when combined, which is usually bad, but I don't particularly care, since the pathway to the last two begins with the sign "Abandon all decidability, ye who enter here".
The real difficulty lies in designing a logic which combines the first two and the last two, and is sound. Decidability can be sacrificed; soundness cannot. And one of the first things you learn as a logician or an algebraist is that there are fundamental limitations upon what kinds of reasoning systems one can design.
But assuming I can do this, the resulting core language would be quite powerful. For one, it would be possible to verify concurrent programs with arbitrary primitives. It would be possible to embed a language with a weaker type system (say, predicative or ad-hoc polymorphism) by synthesizing the necessary subtyping proofs by using the (decidable) type checking algorithms for those type systems. A similar approach could be used with substructural types, region types, and other interesting decidable type systems. In other words, it could act as a substrate for other languages- in essence, a universal core language.
If nothing else, this would provide a convenient mechanism for getting test examples. If I can embed languages like ML, Haskell (or Java, Scala, X10, or whatever...) in this core, then I can simply translate existing programs. Of course, none of them will have any proofs of correctness for the obligations they generate, but this could be a positive, as it gives me an opportunity to show the utility of the type system I want to design.
Labels:
program languages,
semantics,
type systems,
verification
Further Progress on the Algebra of Concurrency
I've further developed the algebra of concurrency, to the point where it now very much resembles vector space algebra. The primary difference is that I have meets and joins instead of sums. Differences remain, and the axioms for all these constructs are simply those from set algebra.
It comes as little surprise, given the vector space algebra similarities, but I think the semantic definitions of the resulting algebra are better given using metric spaces. It is known that a metric distance metric exists for any context-free grammar; a memory state can be described in a manner similar to a vector, which means that we can derive a distance metric between states as well (it should be obvious from this that the construct is closed under product spaces). I should be able to use known results from metric spaces to derive that outer product spaces are likewise metric, and to justify all the rules. Recursively-defined states describe a monotonically decreasing sequence of open sets (a Cauchy sequence), which is known to converge to a unique fixed-point.
I'll be working to completely formalize this in the near future.
It comes as little surprise, given the vector space algebra similarities, but I think the semantic definitions of the resulting algebra are better given using metric spaces. It is known that a metric distance metric exists for any context-free grammar; a memory state can be described in a manner similar to a vector, which means that we can derive a distance metric between states as well (it should be obvious from this that the construct is closed under product spaces). I should be able to use known results from metric spaces to derive that outer product spaces are likewise metric, and to justify all the rules. Recursively-defined states describe a monotonically decreasing sequence of open sets (a Cauchy sequence), which is known to converge to a unique fixed-point.
I'll be working to completely formalize this in the near future.
Monday, July 25, 2011
A Project
I've been self-studying on various topics, including topology, category theory, domain, theory, abstract algebra, and others for a while. It occurred to me I could perhaps facilitate more rapid learning of these subjects if I undertook implementing them as programming tasks. Experience has shown me that I can learn from reading, but I learn much more readily from building things.
This would also lay groundwork for future investigations. I suspect that many of the results from these branches of mathematics could be put to use to automatically synthesize large amounts of code (ie "I noticed that your module satisfies such-and-such properties from topology; here, have a distance metric and a normalization function").
We'll see how it goes...
This would also lay groundwork for future investigations. I suspect that many of the results from these branches of mathematics could be put to use to automatically synthesize large amounts of code (ie "I noticed that your module satisfies such-and-such properties from topology; here, have a distance metric and a normalization function").
We'll see how it goes...
Thursday, July 21, 2011
Operating Systems from First Principles: Capabilities as a Replacement for System Calls
I'm going to stop indexing this series of articles, since I anticipate there will be quite a few of them. The subject for this article is something that's been on my mind for a while.
During the height of the interest in microkernels, Jochen Liedtke published a paper in which he discussed the reasons behind the performance costs of the microkernel model. It was believed at the time that the overhead of interprocess communication and the resultant context switching was to blame. However, his work suggested that the actual overhead of a context switch in and of itself was rather low; the true overhead came from checking permissions, rather than actually performing IPC or switching contexts.
In all the talks on OS-level security I've seen, the object capability model seems to be the all-round best approach. It is the only approach which is capable of guaranteeing that a program is only allowed to do what the user wants it to do. Of course, a poorly-designed capability system is intolerable. Recall the security manager on windows vista, or the third-party security programs on windows XP ("you are about to open a file. allow/deny")? A well-designed system with capabilities will harvest information on what is allowed from user interactions, and not have to ask twice. For example, if I select a particular file, then the system won't ask me again if I can open the file.
In traditional operating systems, there's a static set of system calls, which any program can use. The kernel checks to make sure that these calls are being used appropriately. Replace system calls with capabilities, and the possibilities become interesting. For one, you get a dynamically reconfigurable user/kernel interface, but that's the least of it. You also get built-in support for capabilities, and the ability to permanently constrain a processes' capabilities. But the big win, at least to me, is that you can present capabilities that are actually interprocess calls. A server process can grant such a capability to its clients, and unlike in a more traditional operating system, the kernel does not need to perform the expensive security checks to make sure the underlying IPC is valid. Possessing the capability means a process has permission. Of course, some amount of checking does need to go on, but this can be done by the server process without the need to duplicate the effort in the kernel.
During the height of the interest in microkernels, Jochen Liedtke published a paper in which he discussed the reasons behind the performance costs of the microkernel model. It was believed at the time that the overhead of interprocess communication and the resultant context switching was to blame. However, his work suggested that the actual overhead of a context switch in and of itself was rather low; the true overhead came from checking permissions, rather than actually performing IPC or switching contexts.
In all the talks on OS-level security I've seen, the object capability model seems to be the all-round best approach. It is the only approach which is capable of guaranteeing that a program is only allowed to do what the user wants it to do. Of course, a poorly-designed capability system is intolerable. Recall the security manager on windows vista, or the third-party security programs on windows XP ("you are about to open a file. allow/deny")? A well-designed system with capabilities will harvest information on what is allowed from user interactions, and not have to ask twice. For example, if I select a particular file, then the system won't ask me again if I can open the file.
In traditional operating systems, there's a static set of system calls, which any program can use. The kernel checks to make sure that these calls are being used appropriately. Replace system calls with capabilities, and the possibilities become interesting. For one, you get a dynamically reconfigurable user/kernel interface, but that's the least of it. You also get built-in support for capabilities, and the ability to permanently constrain a processes' capabilities. But the big win, at least to me, is that you can present capabilities that are actually interprocess calls. A server process can grant such a capability to its clients, and unlike in a more traditional operating system, the kernel does not need to perform the expensive security checks to make sure the underlying IPC is valid. Possessing the capability means a process has permission. Of course, some amount of checking does need to go on, but this can be done by the server process without the need to duplicate the effort in the kernel.
Subscribe to:
Posts (Atom)