THE FUNDAMENTAL CHARACTERISTICS OF DATA
Our world runs on data. In the time it takes you to read this sentence, Americans will have used 1.8 million megabytes of wireless data. Who you are, what you own, who you are connected with and what you have done all exist as combinations of 1s and 0s on a server somewhere in the world.
To better understand the concept of data, one needs to become familiar with its four fundamental characteristics, known as the ‘4 Vs of Data’. These ‘4 Vs’ will bring us to a fifth V (spoiler alert: it’s the most important one).
Volume
Data is ubiquitous and proliferating at a pace beyond our current capacity to control it. Domo’s annual study (see attached image), aptly called ‘Data Never Sleeps’, is a small peek at the vast streams of data being created and consumed every minute. According to IBM research, we create 2.5 quintillion bytes of data every day, so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few.
Variety
Past generations of databases were formatted as what we call ‘Structured’ data: information neatly organized in fields belonging to tables, with unique identifiers linking them. With the advent of social media and other data types such as movies, audio files and feeds, we have seen the rise of ‘Unstructured’ data.
With Structured Data, a name is always text and money will always have 2 decimals. However, with Unstructured Data, there are no rules. A picture, a recording, a text, a tweet, a post or a comment can be in any format and have any relationship to the subject. They express ideas and thoughts based on normal human interactions, and those are not neatly organized into a nice structured dataset.
The big challenge for ‘Big Data’ is making sense of the unstructured data sets that now make up a large part of our lives.
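To make the contrast concrete, here is a minimal Python sketch. The `Purchase` record and its fields are hypothetical, invented for illustration: the structured record enforces its rules (a name is text, money has at most 2 decimals), while the unstructured post is just free text with no schema to check against.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Purchase:
    """A hypothetical structured record: every field has a fixed type."""
    customer_id: int
    name: str
    amount: Decimal

    def __post_init__(self):
        # Rule 1: a name is always text.
        if not isinstance(self.name, str):
            raise TypeError("name must be text")
        # Rule 2: money never has more than 2 decimal places.
        if self.amount != self.amount.quantize(Decimal("0.01")):
            raise ValueError("amount must have at most 2 decimal places")


# Structured: the record either fits the rules or is rejected.
purchase = Purchase(customer_id=1, name="Ada", amount=Decimal("19.99"))

# Unstructured: any text is acceptable, and the meaning is left
# for someone (or something) to interpret later.
post = "Loved the new headphones, worth every penny!! #happy"
```

Nothing in `post` says who the customer is or how much they paid, which is exactly what makes unstructured data hard to work with.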
Veracity
This ‘V’ refers to the trustworthiness of the data itself. How much can you rely on the data you receive? According to an IBM study, 1 out of every 3 business leaders don’t trust the information they use to make decisions! And for good reason: IBM estimates that poor data quality costs the US economy around 3.1 trillion dollars a year.
The quality and integrity of the datasets being transferred are integral to interpreting their meaning correctly.
Artificial intelligence is slowly making strides in identifying gaps and mistakes in flowing data. It’s far from perfect, but it’s a start.
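Spotting gaps and mistakes does not have to start with AI. As a rough illustration, the sketch below uses plain rules rather than any machine learning, and every record, field name and rule in it is a made-up assumption. It flags missing values and negative amounts, then reports what share of the records came through clean.

```python
def find_quality_issues(records, required_fields):
    """Return (record index, field, problem) for every issue found."""
    issues = []
    for index, record in enumerate(records):
        # A gap: a required field is absent or empty.
        for field in required_fields:
            value = record.get(field)
            if value is None or value == "":
                issues.append((index, field, "missing"))
        # A mistake: a purchase amount should never be negative.
        amount = record.get("amount")
        if isinstance(amount, (int, float)) and amount < 0:
            issues.append((index, "amount", "negative"))
    return issues


# Hypothetical incoming records, some of them flawed.
records = [
    {"name": "Ada", "amount": 19.99},
    {"name": "", "amount": 5.00},
    {"name": "Bob", "amount": -3.50},
    {"name": "Cy"},
]

issues = find_quality_issues(records, ["name", "amount"])
flawed = {index for index, _, _ in issues}
trusted_share = 1 - len(flawed) / len(records)
print(f"{trusted_share:.0%} of records passed every check")
```

Real pipelines use far richer rules, but the idea is the same: measure how much of the data you can trust before you base a decision on it.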
Velocity
This refers to the frequency of incoming data that requires processing. Closely tied to Volume, Velocity is the most rapidly increasing characteristic. As an example, the New York Stock Exchange now captures 1 terabyte of trade information during each trading session. The 2.5 quintillion bytes (or 2.5 billion gigabytes) of data created in a 24-hour time span that we saw in the Volume part above is increasing by more than 10% year over year.
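The arithmetic behind these figures is easy to check. The short Python sketch below converts 2.5 quintillion bytes into gigabytes and exabytes (using decimal units, where 1 gigabyte is 10^9 bytes) and projects the daily volume forward. The flat 10% growth rate is an assumption taken as a lower bound from the "more than 10%" figure, and `project_daily_bytes` is a helper name made up for this example.

```python
BYTES_PER_DAY = 25 * 10**17   # 2.5 quintillion bytes
BYTES_PER_GIGABYTE = 10**9    # decimal (SI) gigabyte
BYTES_PER_EXABYTE = 10**18    # decimal (SI) exabyte

gigabytes_per_day = BYTES_PER_DAY // BYTES_PER_GIGABYTE
exabytes_per_day = BYTES_PER_DAY / BYTES_PER_EXABYTE


def project_daily_bytes(start_bytes, annual_growth, years):
    """Compound the daily volume at a fixed yearly growth rate."""
    return start_bytes * (1 + annual_growth) ** years


# Assumption: exactly 10% growth per year, held constant for 5 years.
in_five_years = project_daily_bytes(BYTES_PER_DAY, 0.10, 5)

print(f"{gigabytes_per_day:,} gigabytes per day")
print(f"{exabytes_per_day} exabytes per day")
print(f"{in_five_years / BYTES_PER_EXABYTE:.2f} exabytes per day after 5 years")
```

Even at that conservative rate, the daily volume grows by roughly 60% in five years.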
The Fifth (and most important) ‘V’
All of the other characteristics of data mean absolutely nothing without the most important one: Value. What does the vast, varied, trustworthy, fast data mean to you? To your business? To your customers? To your consumers? To your stakeholders?
Most of the data will remain afloat in the ether, on a shelf accumulating virtual dust. It’s static and worthless until you call on it.
The right data at the right time in the right context, that you can analyse, interpret, slice and dice and then visualise, will be priceless.
And that is the beauty of data.