Machine Talk

Non-stop data generation is without a doubt one of the defining characteristics of the widely connected and integrated world toward which we are heading. According to estimates from Cisco, between 2017 and 2022 Internet data traffic will grow 26% per year, from 122 exabytes per month to nearly 400. Counting only mobile devices, data traffic is estimated to see a sevenfold increase by 2022, an annualized growth of 46%. Devices that communicate directly among themselves with no human intervention, a modality known as machine to machine (or M2M), accounted for just over 6 billion connections in 2017, and by 2022 this number is set to exceed 14.5 billion. And it is not only the number of connections that is going to increase: in 2017, M2M traffic was around four exabytes per month, and it is estimated that this volume will hit 25 exabytes per month by 2022.
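For readers who want to check the arithmetic behind these annualized rates, here is a minimal sketch of the compound annual growth rate calculation, using only the rounded figures quoted above. The function name `cagr` and the variable names are illustrative choices, not anything taken from Cisco's report, and the five-year span assumes the 2017 to 2022 window applies to every figure.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.26 means 26% per year)."""
    return (end / start) ** (1 / years) - 1


# Assumption: every figure covers the same 2017 -> 2022 window (five years).
YEARS = 2022 - 2017

# Inputs are the rounded numbers from the text, not Cisco's exact data.
total_traffic = cagr(122, 400, YEARS)    # exabytes per month, "nearly 400"
mobile_traffic = cagr(1, 7, YEARS)       # "sevenfold increase"
m2m_connections = cagr(6, 14.5, YEARS)   # billions of connections
m2m_traffic = cagr(4, 25, YEARS)         # exabytes per month

for label, rate in [
    ("Total Internet traffic", total_traffic),
    ("Mobile data traffic", mobile_traffic),
    ("M2M connections", m2m_connections),
    ("M2M traffic", m2m_traffic),
]:
    print(f"{label}: {rate:.1%} per year")
```

With these rounded inputs, the first two results land within a couple of percentage points of the 26% and 46% quoted above; the small gap is presumably due to rounding in the figures. The same formula also suggests that M2M traffic is growing much faster than the number of M2M connections, which is the point of the paragraph's closing sentence.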

Cisco’s report also states that by 2022, nearly 50% of M2M connections will be inside our homes (through home security and automation equipment, for instance). The next-largest source of these connections will be workplaces, followed by three segments we have already discussed: health, cities, and vehicles. This also gives us an idea of the type of data machines will be exchanging over the coming years, such as medical data (including videos, images, and health indicators) and data from navigation systems.

The future, therefore, holds a significant increase in the volume of data generated and transmitted by us, our friends, relatives, co-workers, and the machines around us that make up the infrastructure of the modern world. Every e-mail, tweet, like, photo, video, song, or datum contributes to one of the fundamental challenges we must face as technology becomes ubiquitous in our lives: how do we extract relevant data from this gigantic mass of bytes? How do we intelligently use the unimaginable amount of valuable information produced daily by human and artificial users alike? This is the challenge that the field known as Big Data seeks to address.

The term Big Data gained attention in the late 1990s, and its characteristics are often associated with the five Vs: volume, variety, velocity, variability, and veracity. In 2016, three Italian researchers, Andrea De Mauro, Marco Greco, and Michele Grimaldi, published an article in which they defined Big Data as “the Information characterized by such a High Volume, Velocity, and Variety to require specific Technology and Analytical Methods for its transformation into Value.”

Rather than working with subsets, Big Data techniques use all available data and can deal with information that, in computer science jargon, is called “unstructured.” This means that the data does not have to be of the same type or organized in the same way. It is often processed in a distributed and parallel fashion (i.e., simultaneously, by multiple processors). For example, social media commentary on a particular sporting event generates text, images, videos, and sound bites authored by multiple sources (spectators, fans, reporters, players) and produced over seconds or days. With proper analysis, it is possible to extract value for consumers and brands from these varied sets of data. Next time, we will talk about Big Data applications across a number of business sectors. See you then.