Proliferation of ideas across media

Really good ideas proliferate beyond the medium they’re developed in. In fact, I would go so far as to say that the goodness (applicability, soundness, utility) of an idea can be directly measured by how many other fields it has infected.

A good example of this: evolution. Sure, creationists don’t buy it. But creationists can fuck off and stop reading my blog (that’s a different post, though). Anyway, basically everyone in the realm of science has some concept of evolution as a useful process. While it originally came from biology, evolution as a concept has proliferated to virtually everything else. In fact, it’s such a pervasive, even OBVIOUS, foundational principle of science that it hardly seems worth talking about.

So, I was thinking about databases and filesystems…

Now, filesystems started out really simple. Basically just dictionary systems: look for filename X, it’s at address Y, that’s it. Want information about the file? Go read the file for it. As time marched on, the need for more advanced, flexible systems and for tracking of metadata became apparent, and so newer filesystems added an abstraction layer, separating files from the information about them. This metadata tracking was probably first found in the unix permission model. The file itself doesn’t need to be modified to change its permissions; the filesystem keeps track of all of that for you. This was basically static metadata abstraction: you could store a fixed set of things about the files, but nothing outside of that. In the case of unix, you got the basic permission bits (plus a handful of other fixed fields like owner and timestamps), and that was it.

But what about when basic permissions and info aren’t enough to efficiently define permission-sets on the file, or when you run into a new piece of info that would be useful to track? Well, let’s modify the filesystem, adding a dynamic metadata element. Sure, it’s a bit harder to implement, but now we can keep track of arbitrary attributes without touching the actual data. So I can tie a permission-set, locked status, and pretty much whatever arbitrary information I want to the file without actually storing any of it in the file. This is cool… this dynamic abstraction layer.
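To make the static vs. dynamic distinction concrete, here’s a toy sketch in Python. Nothing in it is a real filesystem API: `ToyFS`, `FileRecord`, and the `user.locked` attribute name are all made up for illustration. The point is just that the permission bits live in a fixed field, the arbitrary attributes live in an open-ended dictionary, and neither one touches the file’s data.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """One entry in the toy filesystem (hypothetical structure)."""
    data: bytes
    # Static metadata: a fixed field the filesystem designer chose up front.
    mode: int = 0o644
    # Dynamic metadata: arbitrary key/value pairs added later, as needed.
    attrs: dict = field(default_factory=dict)


class ToyFS:
    """In-memory stand-in for a filesystem: a dictionary from name to record."""

    def __init__(self):
        self.files = {}

    def write(self, name, data):
        self.files[name] = FileRecord(data=data)

    def chmod(self, name, mode):
        # Changes the permission bits without reading or rewriting the data.
        self.files[name].mode = mode

    def set_attr(self, name, key, value):
        # Attaches any attribute we like, again without touching the data.
        self.files[name].attrs[key] = value

    def get_attr(self, name, key, default=None):
        return self.files[name].attrs.get(key, default)


fs = ToyFS()
fs.write("notes.txt", b"hello")
fs.chmod("notes.txt", 0o600)
fs.set_attr("notes.txt", "user.locked", "true")
```

After all of that, `fs.files["notes.txt"].data` is still exactly the bytes that were written. On Linux, real extended attributes are exposed to Python through `os.setxattr` and `os.getxattr`, though whether they work depends on the underlying filesystem.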

And databases do it too. I’m not sure which came first: filesystem or database metadata abstraction. Databases formalize the concept as “data independence”. You can ask for “name” in my table, and it doesn’t matter how the table is stored; it’ll give you back the name. This layer of abstraction adds complexity to the database engine (or the filesystem), but it spares every random program that accesses the data from having to know, and re-implement, the details of how that data is laid out.
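Here’s a small sketch of what data independence looks like in practice, using Python’s built-in `sqlite3` module. The `people` table, its columns, and the sample rows are invented for the example. The same query runs before and after the table’s physical layout is rearranged, and it doesn’t notice the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO people (name, email) VALUES (?, ?)",
    [("Ada", "ada@example.com"), ("Linus", "linus@example.com")],
)

# The program only ever asks for "name"; it never says how to find it.
QUERY = "SELECT name FROM people ORDER BY name"
before = [row[0] for row in conn.execute(QUERY)]

# Rearrange the storage: rebuild the table with a different column order
# and add an index on name. The query text stays exactly the same.
conn.executescript(
    """
    CREATE TABLE people_new (email TEXT, name TEXT, id INTEGER PRIMARY KEY);
    INSERT INTO people_new (id, name, email) SELECT id, name, email FROM people;
    DROP TABLE people;
    ALTER TABLE people_new RENAME TO people;
    CREATE INDEX idx_people_name ON people (name);
    """
)
after = [row[0] for row in conn.execute(QUERY)]

assert before == after
```

The engine had to do more work to make that possible, but the calling code didn’t change by a single character.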

The shared idea behind both of these is metadata abstraction… which is one of those great ideas.

But that’s not the only idea that’s traded back and forth between them. Another shared idea is algorithmic efficiency. Databases can store their information in b-trees, for example. They’ve been doing that for quite a while, with great success. Filesystems picked up the same trick: reiserfs uses a b-tree based structure to efficiently handle large numbers of files. It’s just … handy.

Then there’s the concept of transactions. Journalled filesystems implement this, as do modern ACID RDBMSes. This gives you the ability to roll back failures, commit successful transactions, and generally know that your data is in a consistent state all the time, whether an operation succeeded or failed.

Locking followed a similar path. DBMSes started out unsafe (no locking, no transactions) and implemented fine-grained locking and transaction mechanisms over time. Filesystems originally lacked the concept of locking too, and as far as I can tell the two got it from each other, with the idea propagating to the OS level when the thread model landed (you can’t safely have parallel threads sharing data without something like mutexes).
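The commit-or-roll-back behavior is easy to see with `sqlite3` again. This is only a sketch: the `accounts` table, the `transfer` function, and the balances are hypothetical. It relies on the fact that a `sqlite3` connection used as a context manager commits if the block finishes and rolls back if the block raises.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds and is committed
failed = transfer(conn, "alice", "bob", 500)  # overdraws, so both updates are undone
```

After the failed transfer, the balances are exactly what they were after the successful one. A journalled filesystem is after the same guarantee for its own updates: a half-finished operation shouldn’t be left lying around after a crash.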

I’m curious about how these ideas flow, though. I can think of examples where biology lends ideas to CS (genetic algorithms come to mind) or to the social sciences (sociology, psychology). I can see how physics lends understanding to chemistry, and how chemistry can help open the door to physics, and how both of them contribute ideas and techniques to biology. Maybe this flow of techniques ultimately reveals the hierarchy of scientific knowledge? Maybe everything that’s relevant to more than just its own field flows from a “higher” discipline. Like, I would say math is higher than CS: many of the techniques in CS come from math, but basically nothing in math comes from CS. Much of biology flows downstream from chemistry, but does bio contribute much back to chem? I don’t really know … this is ultimately an attack on the question “what is the highest science?”
