Saturday, June 13, 2009

More on I/O

Disclaimer: I don't endorse having one's car booted and having to walk 2 hours in the dark to get home as a way to stimulate thought.

I recalled something I had realized earlier about I/O in a highly-concurrent world.

Stream-based I/O is very detrimental to parallelism. While it's nice and convenient in a sequential world, in the concurrent world, it forces an awkward sort of serialization phase. I found this out while experimenting with implementing a parallel compiler (as in, a parallelized compiler, not a parallelizing compiler) when looking at the final stage, which outputs to a file. If you insist on using streams, the result is a dramatic reduction in parallelism. The same goes for input, for that matter. If you can get away with storing data in a way that doesn't require stream-based parsing (say, in a relational, or even just indexed form), this is better. So the question is, what kind of I/O do we want to be doing, exactly?

On the output side, it's much cleaner to do a kind of indexed scheme. For instance, even something as simple as writing out an ELF binary becomes much more parallel if my I/O calls are something like write(fd, index, size, data). This is, of course, because ELF files are a collection of arrays with some cross-referencing, essentially. Trees can also be written out using this style I/O, and relations are already so parallel it practically hurts. As far as input goes, the same style works just as well.

Where all this breaks down is when you have a sequence of different-sized things. Most of the time, this comes up in flat text files, XML data, and the like (which makes one wonder, might it be a good idea to implement some kind of OS-level XML framework that actually stores everything in some kind of ELF-like binary form, and pretends to be text when it needs to?). Even worse is when you have a sequence of different-sized things, and you index into it. This is the case with JVM class files (yet another bad JVM design choice).

The point is, we want to move away from sequences and streams and move towards trees, arrays, and relations. This all reminds me of a talk given by Guy Steele, which basically said the same thing about data structures (lists are bad, trees are good). Using the indexed I/O model, we can either build up buffers and flush them in very large writes (which lends itself naturally to having little threads write into a buffer and an I/O thread periodically sync it to disk), or we can mmap and work directly. The mmap approach is potentially dangerous, though, depending on exactly how the OS handles mmaped NFS files. What we don't want is the executors getting hit with page faults and blocking while the OS fetches data remotely. And while we're on the subject, we probably don't want to wait on disks either, in a concurrent setting. Not when we can just grab some memory and go to work.

Of course, there's more to I/O than shipping data to disk. We have, among other things, the filesystem maintenance operations and network traffic. Regarding the second, if we're in a highly-concurrent setting, chances are we want some kind of event-based or reactive model, rather than the single-processor "read a request, process it, write a response" approach.

The "maintenance" operations are the annoying ones. These are the operations that should be effectively function calls, but because of NFS mounts and the shortcomings of POSIX when it comes to distributed filesystems, they can potentially block. Having no better alternative, I'll default to the older idea: one I/O thread per executor, plus whatever buffer-maintainers and socket/event-watchers you end up creating.

No comments:

Post a Comment