If you bought your first home computer in the early days – the 80s, or even the 70s if you were a very early adopter – you will probably remember when storage space was measured in kilobytes (1,000 bytes) or megabytes (1 million bytes).
The first commercially available hard disk drives made to fit popular home PCs were 5MB in size – just big enough to hold perhaps one or two songs in MP3 format today, or one large color image – and cost around $3,000.
These days, a modern smartphone is likely to hold 32 or 64GB of data – that’s around 12,800 times the capacity of those early hard disk drives, with the cost of the storage component coming in at under $50.
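If you want to check those numbers yourself, the arithmetic is simple; here is a minimal sketch using the approximate figures quoted above (a 5 MB drive at around $3,000 versus 64 GB of phone storage at around $50):

```python
# Rough comparison of an early 5 MB hard drive with a 64 GB smartphone,
# using the approximate prices quoted above (decimal units: 1 GB = 1,000 MB).
early_capacity_mb, early_price = 5, 3_000          # first consumer hard drives
phone_capacity_mb, phone_price = 64 * 1_000, 50    # 64 GB of flash storage

capacity_ratio = phone_capacity_mb / early_capacity_mb
early_cost_per_mb = early_price / early_capacity_mb
phone_cost_per_mb = phone_price / phone_capacity_mb

print(f"Capacity ratio: {capacity_ratio:,.0f}x")       # 12,800x
print(f"Cost per MB then: ${early_cost_per_mb:,.2f}")  # $600.00
print(f"Cost per MB now:  ${phone_cost_per_mb:.5f}")   # ~$0.00078
```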
As data has gotten bigger, we have obviously needed bigger places to store it – and ever stranger words to describe these sorts of capacities in human language.
Hence, we are well into the age of the terabyte – where your average home computer is endowed with at least 1,000 gigabytes of storage capacity. And we are heading into the age of the petabyte and the exabyte.
A petabyte (PB) is usually 1,000 terabytes. While it may be a while before most of us need that sort of storage on hand, industry already deals with data on this scale regularly. For example, Google is said to currently process around 20 PB of data per day. Much of this is transient data, not all of which needs to be stored (or indeed can be – see my article on the capacity crisis), but storage on this scale does happen: Facebook's Hadoop clusters were said, in 2012, to have the largest storage capacity of any single cluster in existence, at around 100 PB.
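The "usually" is there because storage prefixes are counted two ways: decimal units step up in powers of 1,000, while the binary units reported by much storage software step up in powers of 1,024, and the gap grows at every step. A quick sketch of the difference at petabyte scale:

```python
# Decimal vs binary interpretations of the same prefix.
# A decimal petabyte (PB) is 1000**5 bytes; a binary pebibyte (PiB) is 1024**5 bytes.
decimal_pb = 1000 ** 5
binary_pib = 1024 ** 5

print(f"1 PB  = {decimal_pb:,} bytes")
print(f"1 PiB = {binary_pib:,} bytes")
print(f"Difference: {(binary_pib / decimal_pb - 1) * 100:.1f}%")  # ~12.6% at this scale
```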
Putting that into perspective is quite difficult, although that hasn't stopped a lot of people from trying. One way to think of it is that everything ever written by mankind, in any language, from the beginning of time, is thought to amount to about 50 PB of text.
Of course, text is quite easy to compress and quite efficient to store. Most of the really big data we deal with today is likely to be in the form of pictures or videos, which require a lot more space – if you felt like watching 1 PB of HD-quality video, for example, it would keep you occupied for a mere 13 years.
Taking into account the speed at which we moved from kilobytes to megabytes to gigabytes as our standard unit of personal storage capacity, we could expect to have a petabyte of storage in our homes within perhaps 10 to 20 years.
Which brings us to one of the other reasons we continuously need to store more data: the increase in the quality (and therefore the size) of the data itself. By the time we have PB hard disk drives in our home PCs, we will probably be used to watching (at least) 4K Ultra HD video as standard – meaning that if we fill our PB drive with video, we will blaze through it in just five years.
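Those viewing-time figures depend entirely on the bitrate you assume. A back-of-the-envelope sketch, assuming roughly 19 Mbit/s for broadcast-quality HD and 50 Mbit/s for 4K Ultra HD (illustrative assumptions rather than fixed standards), lands close to the 13-year and five-year estimates above:

```python
# How long would it take to watch 1 PB of video at an assumed constant bitrate?
SECONDS_PER_YEAR = 365.25 * 24 * 3600
PETABYTE_BITS = 1000 ** 5 * 8  # 1 decimal petabyte, expressed in bits

def years_of_viewing(bitrate_mbps: float) -> float:
    """Continuous viewing time for 1 PB at the given bitrate (megabits per second)."""
    seconds = PETABYTE_BITS / (bitrate_mbps * 1_000_000)
    return seconds / SECONDS_PER_YEAR

print(f"HD (~19 Mbit/s assumed): {years_of_viewing(19):.1f} years")  # ~13 years
print(f"4K (~50 Mbit/s assumed): {years_of_viewing(50):.1f} years")  # ~5 years
```

Swap in your own bitrates and the years change accordingly, which is why these comparisons only ever give a rough order of magnitude.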
Beyond that, we will be looking at moving into the age of the exabyte – 1,000 petabytes, or one million terabytes (or even a billion gigabytes, if you want!).
No one stores information in that sort of quantity today, but annual global internet traffic is expected to reach 950 exabytes by 2015 – which means we will very nearly need an even bigger unit of measurement to describe it: the zettabyte (1,000 exabytes).
With a word like that, we are equipped to talk about numbers as large as the entire size of the internet – calculated in 2013 to stand at around 4 zettabytes.
(Bear in mind, however, that it is very difficult to know how big the total internet is, as there is no single index – probably the biggest, Google's, was said in 2010 to cover a mere 0.004% of the total internet! It probably covers a lot more now, but there are huge swathes of it cut off from its web-crawling bots which will forever remain "dark" – private corporate networks, for example.)
A thousand times larger again, we find the yottabyte – 1,000 zettabytes. No one makes yottabyte-scale storage media, so if you had a yottabyte of data (which no one does, or will have for some time) you would need to spread it across a great many smaller disks – and at today's prices that would cost you around $25 trillion. If you somehow had this much data stored in the cloud somewhere and wanted to download it to your computer using current high-speed internet, it would take you roughly 11 trillion years.
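That trillion-dollar price tag is easy to reproduce, although it depends entirely on the per-gigabyte price and drive size you plug in. A rough sketch, assuming around $0.025 per gigabyte and 10 TB drives (both illustrative assumptions):

```python
# Rough cost of storing one yottabyte on ordinary hard drives,
# assuming ~$0.025 per GB and 10 TB per drive (illustrative assumptions only).
YOTTABYTE_GB = 1000 ** 5   # 1 YB expressed in gigabytes (10**15 GB)
price_per_gb = 0.025       # assumed price, USD
drive_size_gb = 10_000     # assumed 10 TB drive

total_cost = YOTTABYTE_GB * price_per_gb
drive_count = YOTTABYTE_GB / drive_size_gb

print(f"Total cost:    ${total_cost / 1e12:,.0f} trillion")  # ~$25 trillion
print(f"Drives needed: {drive_count:,.0f}")                  # 100 billion drives
```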
Still not enough? Try the brontobyte (1,000 yottabytes) for size. Of course, there is nothing in existence that is measurable on this scale. Take 1,000 of those and you have one geopbyte. There is only one way to describe this unit that comes close to conveying its scale: no one has yet bothered to think of a name for 1,000 of them.
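To keep the whole ladder straight, here is a short summary of the decimal units mentioned in this post, from the kilobyte up to the geopbyte (bearing in mind that "brontobyte" and "geopbyte" are informal names rather than official SI prefixes):

```python
# The ladder of (decimal) storage units discussed above.
# "Brontobyte" and "geopbyte" are informal names, not official SI prefixes.
units = ["kilobyte", "megabyte", "gigabyte", "terabyte", "petabyte",
         "exabyte", "zettabyte", "yottabyte", "brontobyte", "geopbyte"]

for power, name in enumerate(units, start=1):
    print(f"1 {name:<10} = 1,000^{power:<2} bytes = 10^{3 * power} bytes")
```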
Thinking about quantities such as zettabytes and brontobytes may seem very theoretical now, but remember, it wasn't too long ago that Bill Gates (allegedly) said: "No computer will ever need more than 640K of memory." The quote is now widely considered to be misattributed, but the point remains that our need for storage space has grown far more quickly than we ever thought it would. So while we will probably live to see the day when it is common to carry a petabyte in our pocket, our grandchildren and great-grandchildren might one day be carrying around a bronto on their bionic implants.
As always, thank you very much for reading my posts. You might also be interested in my new book: Big Data: Using Smart Big Data, Analytics and Metrics To Make Better Decisions and Improve Performance
You can read a free sample chapter here.
For more, please check out my other posts in The Big Data Guru column and feel free to connect with me via:
Twitter, LinkedIn, Facebook, Slideshare and The Advanced Performance Institute.