“Back in the day…” Don’t you hate such phrases? I do, or at least, I used to. As I get older, however, I find myself using them more often, strangely enough, and my antipathy is being forced to subside. Here’s an example where I would like to apply such a phrase: when I was working at Deutsche Bank, a loooong time ago, we used to snicker, in a self-pitying sort of way, at the acronym “RAID”. If you go follow that link, you will see that this is widely accepted to mean “Redundant Array of Independent Disks”, but to its credit, the Wikipedia article does acknowledge that the Thought Police of Political Correctness had been by to visit, and makes a passing reference to what the acronym really means: “Redundant Array of Inexpensive Disks”. Why does that matter? Well, you need to know what the acronym really means to understand why a handful of pale, geeky sys admins at Deutsche Bank, 15 years ago, were feeling sorry for themselves. See, those pale geeks were responsible, among other things, for uncrating, installing, and taking tender care of any number of RAID boxen. And they got to see the invoices for them, too. And those boxen were anything but “inexpensive”, by our standards. They cost more than a month’s pay, in many cases. And those pale geeks would have loved to have had one, but there was just no way their wives were going to allow them to spend a month’s pay on such a thing.
The thing is, though, that from the perspective of an entity like Deutsche Bank, they were inexpensive: dirt cheap, even.
So what? Well, therein, methinks, lies the enterprise answer to the apparently insoluble problem of data portability in the cloud.
Figure 1
Confused? Well, if so, bear with me.
Data portability, with regard to the cloud computing hype wave, is one of those terms that’s near the frothing crest — the bloodiest part of the wave’s bleeding edge. There are several things that people worry about:
- Lock in: if I store my data with you, will it be stored in such a way that makes it hard for me to take it back and store it somewhere else? Will you even allow me to take it back?
- Data mashups: if you store my data in some particular format, will I be able to easily mix it with data in other formats, probably from other providers?
- Reliability: if I store my data with you, will you guarantee to me that I will always have access to it?
- The laws of physics: what do I do after I have amassed a certain amount of data with you? If I have stored petabytes of data with you, is it even conceivable that I could ever “take it back”? Given the bandwidth of the pipe you provide to my data (and this is limited, if nothing else, by the speed of light), how long would it take me to pull my data back out? Is it reasonable to assume that I would ever be able to pay that price?
And so on.
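To put a rough number on that last bullet, here is a back-of-the-envelope sketch in Python. The 1 PB size and the link speeds are illustrative assumptions of mine, not figures from any provider, and `transfer_days` is just a made-up helper name.

```python
SECONDS_PER_DAY = 86_400
PETABYTE = 10**15   # bytes (decimal petabyte, an assumption for illustration)
GIGABIT = 10**9     # bits per second


def transfer_days(size_bytes: float, link_bits_per_second: float,
                  efficiency: float = 1.0) -> float:
    """Days needed to move size_bytes over a link.

    efficiency is the fraction of the nominal link speed you actually get
    (1.0 means a perfectly saturated pipe, which is optimistic).
    """
    if link_bits_per_second <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    seconds = size_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / SECONDS_PER_DAY


# One petabyte over a fully saturated 1 Gbit/s pipe
print(round(transfer_days(PETABYTE, GIGABIT), 1))       # 92.6
# The same petabyte over 10 Gbit/s
print(round(transfer_days(PETABYTE, 10 * GIGABIT), 1))  # 9.3
```

Even with an ideal pipe, a single petabyte ties up a 1 Gbit/s link for about three months, and real links rarely run at full speed.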
But you know what? I think all of these concerns are just plain silly. I think there’s a blindingly obvious way to sidestep every single one of them, and I also think it is the only sensible way for an enterprise to plan on using the cloud.
Figured it out? Wait. Here’s a picture.
Figure 2
Still not seeing it? No worries — here’s another picture:
Figure 3
Got it now? It’s simple, really. The acronym that serves as the title of this post — RAIC (pronounced “rake”) — stands for “Redundant Array of Independent Cloud providers”. And it’s a way of using the Cloud that makes most concerns about data portability effectively moot. See, most, if not all of the speculation around data portability is predicated on the assumption that you store your data in just one place — with just one provider at a time. Break that assumption, and the bulk of the data portability problem just goes away. [1]
So how would RAIC work? Pretty simple, actually. You just store all of your data, all of the time, in the buckets of multiple cloud providers. All data, all the time. When you write a new piece of data, you write it to all of them. What does that get you? Well, let’s go through the bullet points again:
- Lock in: what lock in? Don’t like one of your providers any more? Dump ’em, get another.
- Data mashups: actually, this is the trickiest one. Allow me to defer this; I’ll come back to it, I promise.
- Reliability: like RAID, reliability becomes a function of the number of providers (not disks). If one fails, you have ways to recover, without any catastrophic disruption of service.
- The laws of physics: can’t beat the laws of physics? Well, then, don’t try. Instead, treat your data as disposable. Tired of a provider? Want to dump them? Then do so, and simply send them a delete job request on your way out the door. No need to move your data anywhere. Now, this implies that you’ve extracted a clear, reliable, contractual agreement from your provider, and you are reasonably certain that “delete” really means “delete”. But, trust me on this one: that’s going to be easier to achieve than defeating the laws of physics.
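The bullets above can be sketched in a few lines of Python. This is only a toy under loud assumptions: `InMemoryProvider` is a stand-in I made up for a provider's bucket (no real cloud API is involved), and `Raic` implements the simplest possible policy, namely write everything to everyone, read from whoever answers first, and delete on the way out.

```python
class InMemoryProvider:
    """Hypothetical stand-in for one cloud provider's bucket."""

    def __init__(self, name):
        self.name = name
        self.objects = {}
        self.available = True  # flip to False to simulate an outage

    def put(self, key, data):
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        self.objects[key] = data

    def get(self, key):
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        return self.objects[key]

    def delete_all(self):
        self.objects.clear()


class Raic:
    """Mirror every object across all providers (roughly a RAID 1 policy)."""

    def __init__(self, providers):
        self.providers = list(providers)

    def write(self, key, data):
        # All data, all the time: every write goes to every provider.
        # A real system would queue and retry instead of failing outright.
        for provider in self.providers:
            provider.put(key, data)

    def read(self, key):
        # Reliability: any provider that is up and has the key will do.
        errors = []
        for provider in self.providers:
            try:
                return provider.get(key)
            except (ConnectionError, KeyError) as exc:
                errors.append(exc)
        raise LookupError(f"no provider could serve {key!r}: {errors}")

    def drop(self, provider):
        # Lock in and physics: send a delete on the way out, move nothing.
        provider.delete_all()
        self.providers.remove(provider)


alpha, beta = InMemoryProvider("alpha"), InMemoryProvider("beta")
raic = Raic([alpha, beta])
raic.write("report.txt", b"quarterly numbers")

alpha.available = False          # one provider goes dark
print(raic.read("report.txt"))   # still served, from beta

alpha.available = True
raic.drop(alpha)                 # dump a provider; nothing is migrated
print(raic.read("report.txt"))
```

Adding a replacement provider would of course mean backfilling it from the survivors, which this sketch leaves out.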
OK, you say, this is starting to make sense, but dude! Isn’t this incredible overkill? Ha! I chuckle in your general direction (to abusively paraphrase Monty Python), and refer you back to the anecdote that I began this post with. If it seems like ridiculous overkill to you, then I submit that you are probably not Deutsche Bank. Or a comparable entity. The point being that, from the point of view of a large enterprise, no. No, this won’t seem like overkill. It will seem bloody obvious.
Hmm, OK, you say. But what about data mashups? You said that was tricky, and that you’d come back to it. Yep. Well, flip back up to Figure 3. The label “RAID 5” there could be understood to represent the RAID controller; a piece of hardware (sometimes software) that mediates between the operating system and the actual physical disks. To the OS, the result looks like one single volume. Right? So now flip your gaze back up to Figure 2. See that “Marketplace / Broker / Orchestrator” thingie there? That’s the equivalent of the RAID controller. It does the following things:
- It makes the N cloud provider data buckets look like one.
- It mediates the various data formats. It’s one hell of a RAID controller: it’s as if a single controller could have SCSI, IDE, SATA and Fibre Channel disks attached to it. The facade of the controller abstracts these details away.
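Here is one way the “facade” part might look, again as a hedged Python sketch rather than a description of any existing product. The two bucket classes and their storage formats (JSON and URL-encoded form data) are assumptions I picked purely to show format mediation; records are assumed to be flat string-to-string dictionaries.

```python
import json
from urllib.parse import parse_qsl, urlencode


class JsonBucket:
    """Hypothetical provider that keeps records as JSON text."""

    def __init__(self):
        self.raw = {}

    def store(self, key, record):
        self.raw[key] = json.dumps(record, sort_keys=True)

    def load(self, key):
        return json.loads(self.raw[key])


class FormBucket:
    """Hypothetical provider that keeps records as URL-encoded form data."""

    def __init__(self):
        self.raw = {}

    def store(self, key, record):
        self.raw[key] = urlencode(sorted(record.items()))

    def load(self, key):
        return dict(parse_qsl(self.raw[key]))


class Controller:
    """Facade: callers see one bucket and one record shape (a plain dict)."""

    def __init__(self, buckets):
        self.buckets = list(buckets)

    def put(self, key, record):
        # Each bucket translates the record into its own native format.
        for bucket in self.buckets:
            bucket.store(key, record)

    def get(self, key):
        # Every bucket translates back to the same dict, so any one will do.
        for bucket in self.buckets:
            try:
                return bucket.load(key)
            except KeyError:
                continue
        raise KeyError(key)


json_bucket, form_bucket = JsonBucket(), FormBucket()
controller = Controller([json_bucket, form_bucket])
controller.put("user-1", {"name": "Ada", "role": "admin"})

print(json_bucket.raw["user-1"])   # stored as JSON text
print(form_bucket.raw["user-1"])   # stored as form data
print(controller.get("user-1"))    # the caller only ever sees a dict
```

The point of the design is that the format differences live entirely inside the adapters, so swapping a provider means writing one new adapter and nothing else.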
Wow, OK, you say. Sounds like that “Marketplace / Broker / Orchestrator” thingie is pretty complex. Where do I get one? Well, that’s a fair question. There is no such thing, at the moment — at least not that I’m aware of. If you know of something that could do this, please weigh in with a comment. There are some things that are getting close: Zimory, for example. There are some others. But nothing I know of explicitly addresses this sort of a data storage problem, certainly not in this way.
Why not? I dunno. Everybody’s been pretty busy with other things, I guess. Will there ever be such a thing? I think so. Why? ’Cause enterprisey wants it, that’s why. And just like Deutsche Bank, all those years ago, they have the resources to pay for what they want. And, for the record: although Figure 2 implies that the RAIC controller is a service, in the cloud itself, this won’t be the only way to crack this particular nut. In some cases, those enterprisey entities will insist on having that thingie on-premise, for reasons ranging from sensible concerns to simple paranoia. And in some of those cases, moreover, the enterprisey entity may well end up rolling its own implementation (nothing like reinventing a wheel to be absolutely certain that it’s round).
RAIC — Redundant Array of Independent Cloud providers. Remember; you heard it here first.
UPDATE: no, perhaps you didn’t. In Simon Wardley’s excellent talk on commoditisation, he likes to point out that for almost any “invention” you can think of, it’s usually possible to find some example of it having been invented by someone else, sometime earlier. That just seems to be part of the pattern of innovation — when an idea is “ripe”, it will emerge, often in more than one person’s head. In mid-December of last year, Chris Evans coined the term “RAIC”, and wrote about exactly the same idea as I have. I assure you — when I wrote this post earlier this evening, I had neither read Chris’s posts, nor seen anyone else use the term “RAIC”, anywhere else. If I had, I would have happily referred to it, as support for the general idea. So, in that respect, I’m glad that Chris pinged me on Twitter, and pointed his posts out to me. But to be clear — I had (and have) no intention of “stealing” Chris’s ideas. We merely came to the same conclusion — which, given the fact that I think it is a fairly obvious way of solving certain problems, should surprise no one. It certainly won’t surprise Simon. 😉
[1] Note that this doesn’t mean that I’m not an enthusiastic supporter of the “data portability” movement. Microformats, RSS, etc. — all awesome. More please. I just don’t think they have anything to do with lock in, or reliability, nor are they a proper solution to struggles with the laws of physics. Such standards are tools for developers, to make better software. RAIC, on the other hand, is all about making lock in, reliability, and struggles with the laws of physics non-problems. OK?