What is Big Data?

The world is being reshaped by the convergence of social, mobile, cloud, big data, community and other powerful forces. The combination of these technologies unlocks an incredible opportunity to connect everything together in a new way and is dramatically transforming the way we live and work.

Marc Benioff

Big data is a feeling. The feeling that the future is an uncertain place. The feeling that the decisions we make today, although we might not understand them now, might turn out to be the most important decisions we ever made.

Big data arises as a consequence of opportunity. Only a few short years ago the very idea of big data would have been impossible. But modern computing is in a very different place as a result of two distinct developments.

Software as a Service (SaaS)

More and more businesses have been running key tasks as a service. Rather than having one specialised computer do a single task - like a computer in accounting for running the accounting software - why not put the software on a server and only run it when it is needed? This way, accounting staff can run their tasks from anywhere, on any computer, whilst giving more people easy access to the information.

The reason that this had not been done from day 1 is that servers (and their infrastructure - electricity, air conditioning, etc.) are incredibly expensive, not only to purchase in the first instance but also to keep running.

Another predicament with servers is that most modern computing tasks require a lot of power, typically all at once, and then nothing until the next run. In the case of accounting, you might only run payroll once a month or once a year when taxes are due.

This means you either buy a high-end desktop computer for every task and still have the user wait for each task to complete (in many universities, for example, payroll runs overnight), or you have a hugely expensive computer facility sitting around that does not do much most of the time.

Enter Amazon

Amazon saw that unless you are the sort of company like Google or Facebook that can build your own massive server farms in frozen wastelands, chances are you do not have heavy 24-hour demands on computing. Why not, then, take Amazon’s already massive server capacity, extend it a little and rent out the servers as an “on demand” service?

Launched in 2006, Amazon Web Services (AWS) continues to offer a sort of pay-as-you-go ecosystem. For a tiny fraction of the cost of building your own server, you can borrow the entire computing clout of Amazon.

It doesn’t matter if you are a student with computer science homework, a young app entrepreneur or even Netflix (which accounts for 37% of all internet traffic in the US). Amazon, with some very clever behind-the-scenes scaling algorithms, gives you as much or as little computing power as you need, only when you need it.

The storage locker

Simultaneously, the march of technological progress and efficiencies in manufacturing have driven down the cost of storage. Hard drives have got bigger and bigger, but also cheaper and cheaper. The cost of storing data is at never-before-seen lows. Data can now be stored on a scale that could never have been imagined before.

The combination of being able to store vast amounts of data coupled with a new way of cheap, on-demand processing has brought forward a new gold rush in the information age.

Every one of us has a life that is increasingly online and digitized, from the super-computers we all carry in our pockets tracking everything we do, to the way we shop with credit cards and online with e-commerce. Never before has so much data been captured on such vast scales. Every moment of every day sees billions of the world’s population browse the web, spend money and communicate in ways that were impossible only a few short years ago.

Big data in retail

Stay with the world’s most popular online retailer, Amazon: every day, alongside the customers who buy, it has millions more just browsing. Each customer that makes a purchase might buy a few items, and in the old way of doing things an invoice would keep track of what the customer bought. Rarely in real life, though, does someone walk into a shop and buy the first thing they look at. There is almost always some looking around, some browsing, some searching, some comparison of this thing to that thing. There might even be a complex comparison of cost, features and user reviews.

In a big data age, recording what you buy is no longer enough. Instead, people are now interested in the steps you took to make the purchase.

Everything is recorded. The other items you looked at. The time you spent on each page. Did you look at the user reviews? Did you look at the pictures, maybe zoom in on some of them? Did you click on the “Read more…” in the item description? Even down to where your mouse cursor moved and rested.

You will have seen for yourself what happens with this data. Do you think it is a coincidence that when you go to checkout there is a page of suggestions of other things you might like based on what you are buying? How often have you been tempted by that page of suggestions? Of course you have, because those items have been selected by hundreds of people just like you!

The power of huge datasets is obvious. Amazon makes suggestions knowing that it will make more sales.
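To make the idea of suggestions "selected by hundreds of people just like you" concrete, here is a minimal sketch that counts which items turn up in the same basket. It is not Amazon's actual method, which is not described here; the function names (`build_co_purchase_counts`, `suggest`) and the order history are invented for illustration.

```python
from collections import Counter
from itertools import permutations


def build_co_purchase_counts(baskets):
    """Count how often each item shares a basket with every other item."""
    counts = {}
    for basket in baskets:
        # set() so an item listed twice in one basket is only counted once
        for item, other in permutations(set(basket), 2):
            counts.setdefault(item, Counter())[other] += 1
    return counts


def suggest(counts, item, top_n=3):
    """Return up to top_n items most often bought alongside `item`."""
    neighbours = counts.get(item, Counter())
    # Sort by count (highest first), then by name so ties are stable
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical order history, invented for illustration
baskets = [
    ["tent", "sleeping bag", "torch"],
    ["tent", "sleeping bag"],
    ["tent", "torch", "batteries"],
    ["torch", "batteries"],
]
counts = build_co_purchase_counts(baskets)
print(suggest(counts, "tent"))  # ['sleeping bag', 'torch', 'batteries']
```

With four baskets the suggestions are little more than guesses. The same counting over millions of baskets is what makes the checkout page feel uncannily well chosen.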

When you get to truly HUGE datasets, interesting things happen.

When statistics gets creepy

Remember, huge datasets can give the illusion of intelligence, but really big data is the product of statistics - drawing significance from patterns in literally millions of data points.

To illustrate this point, in 2012 the US retailer Target made the news when an irate father went berserk at a store manager. You see, Target learnt early on the value of data mining. Analysts at Target knew that pregnant women show a few behavioural changes - namely a switch to unscented body lotions. Target went as far as giving (and presumably still gives) every customer a pregnancy score - a likelihood that they are pregnant.

In Minneapolis, one such woman hit all the criteria for Target’s system so she was automatically sent coupons for baby goods in the mail. The idea being that by luring her into the store for baby goods early, she could become a loyal customer throughout the pregnancy and early motherhood.

There was a slight problem. The woman in question was still in high school and living at home. Her father went into the local Target raging. As the New York Times reported:

“My daughter got this in the mail!” he said. “She’s still in high school, and you’re sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?”

A week later the father apologised to store staff after having had a discussion with his daughter. Apparently there had been some goings-on under his roof that he had not known about!

Understandably, Target is now much more delicate with big data results, especially those regarding pregnant women.
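A "pregnancy score" of the kind described above can be pictured as a handful of purchase signals combined into a single likelihood. The sketch below is purely illustrative: Target's real model is not public in this article, and every weight, the baseline, and all signals other than unscented body lotion are assumptions made up for the example.

```python
import math

# Hypothetical weights, invented for illustration; NOT Target's real model.
# Only "unscented body lotion" comes from the story; the rest are assumed.
SIGNAL_WEIGHTS = {
    "unscented body lotion": 1.5,
    "vitamin supplements": 1.0,
    "cotton wool": 0.8,
}
BASELINE = -3.0  # assumed log-odds for a shopper showing no signals at all


def likelihood_score(purchases):
    """Turn a set of purchased items into a score between 0 and 1."""
    log_odds = BASELINE + sum(
        weight for signal, weight in SIGNAL_WEIGHTS.items() if signal in purchases
    )
    # Logistic function: squashes any log-odds value into the range 0..1
    return 1 / (1 + math.exp(-log_odds))


no_signals = likelihood_score({"bread", "milk"})
one_signal = likelihood_score({"bread", "unscented body lotion"})
all_signals = likelihood_score(set(SIGNAL_WEIGHTS))
print(no_signals < one_signal < all_signals)  # True
```

No single purchase proves anything. It is the accumulation of small signals, each nudging the score upwards, that lets a retailer mail the coupons before the family has been told.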

Supermarkets are at it too

The British supermarket, Tesco, maintains a membership loyalty program which it promotes to customers as a way of saving them money with points, discounts and promotions. To the company it provides an easy way to track the buying habits of demographics and promote products they are keen to sell.

The interesting part of the Tesco story is that it is well known that there is a disruption protocol written into any promotion. In the early days of the scheme, Tesco only ever printed coupons for products that the consumer was known to buy. There was a huge backlash, as consumers thought that it was too “creepy” and felt a bit too “big brother”. To counter this consumer negativity, Tesco now throws a wild card into any personalized coupon book.

For example, a promotion might give you 4 discount coupons: 2 vouchers for things you buy regularly, 1 voucher for an alternative product to the one you normally buy (e.g. Tesco cornflakes instead of those Kellogg’s ones) and then 1 voucher for dog food, even though Tesco knows that you’ve never bought dog food in your life.

The point here is that statistics is relatively easy and humans are creatures of habit. But we still have a caveman brain in our heads, meaning that we scare easily. We do not trust things that happen too easily, or things that work too well. By giving you dog food the system disarms your automatic defenses and you continue your routine thinking “Stupid computer, I don’t even have a dog.”
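The four-coupon example above can be sketched in a few lines. This is a guess at the shape of such a rule, not Tesco's actual system; `build_coupon_book`, its parameters and the shopping data are all hypothetical.

```python
import random


def build_coupon_book(purchase_history, alternatives, catalogue, rng=None):
    """Build 4 coupons: 2 regular items, 1 alternative brand, 1 wild card."""
    if rng is None:
        rng = random.Random()
    history = list(purchase_history)

    # Two vouchers for things the shopper already buys regularly
    regulars = rng.sample(history, 2)

    # One voucher for an alternative to a product they normally buy
    swappable = [item for item in history if item in alternatives]
    alternative = alternatives[rng.choice(swappable)]

    # One wild card: something the shopper has never bought
    never_bought = [
        item for item in catalogue
        if item not in history and item != alternative
    ]
    wild_card = rng.choice(never_bought)

    return regulars + [alternative, wild_card]


# Hypothetical shopper and catalogue, invented for illustration
history = ["milk", "bread", "Kellogg's cornflakes"]
alternatives = {"Kellogg's cornflakes": "Tesco cornflakes"}
catalogue = history + ["Tesco cornflakes", "dog food"]

book = build_coupon_book(history, alternatives, catalogue, rng=random.Random(0))
print(book[2], "|", book[3])  # Tesco cornflakes | dog food
```

The wild card is deliberately drawn from outside the purchase history. The deliberately wrong guess is what stops the other three coupons from feeling like surveillance.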

The land that never forgets

We are at the dawn of a new age.

Privacy groups are rightly concerned about how you cannot delete a Facebook account. You cannot delete a Facebook message or post; it is just… archived.

We are entering an age that never forgets. That will not ever forget.

Children are now entering adulthood with their entire lives having been archived on social media. Instantly accessible, easily searchable.

As we stand here at the edge of a new tomorrow, all that we can be certain of is that the future remains unknown. We are still building and developing the tools for this new age. So far only a few brave souls have stepped forward and are trying to embrace this new world.

Do not be fooled. This will have great consequences. Good and evil.

Right now your bank is building a profile of you. Every time you use your credit card, the details - where you are, how much you spend, even the time of day, month and year - add to a picture of you. You might not see it yet, but this will one day influence whether you get that mortgage, and what interest rate you have to pay on it. It will determine your insurance premium because that phone in your pocket knows exactly how you drive. It will decide how much pension you get, because they know statistically when you’re going to die, thanks to that fitness band you wore for the last 20 years.

Tools for the future

The thing with big data is that you are only ever trying to pull out significance from tiny variations in a huge whole. Just think of trying to hear someone talking to you in a nightclub.

The old adage likens this to finding a needle in a haystack. But imagine if it is not just one haystack, but a whole countryside of haystacks.

When you look for a needle in the haystack, the smart man brings a magnet. The desperate man brings matches.

Right now we are at the point where we are stockpiling hay, just in case it might contain needles, but even the most patient man would not start this task with his bare hands. Currently, we are designing those magnets and matches for the big data age. We are deciding how much hay we should stockpile. Who will win? No one knows. I will guarantee you though, where there is the potential to make money, there will be competition.

The truth is, big data is a new term for what a lot of people were already doing. From big business to big issues like climate change - we have been recording data for a long time. The difference now is that we can keep all the data from those same old sources, data we used to have to throw away or approximate. On top of that, with every day thousands more data collection devices are made, all ready to record the world.

Many organisations are just storing anything they can get their hands on. Most do not have the first clue what to do with all the data yet, but when storage is so cheap, and the future so unknown, isn’t it better to keep everything - just in case?

This page previously appeared on earlier versions of morganbye.com.