The primary objective of big data technology is the extraction of recommendations based on a vast sample set. In some cases, all available samples are taken into consideration; in others, just a subset is used. Imagine a traffic engineer who needs to analyze traffic patterns in a certain part of a city between 5:00 and 7:00 p.m. Using traditional techniques, samples of the routes taken by a few vehicles would be used to support planning for any prospective actions. But with big data, the routes taken by all vehicles can be analyzed. In statistics this is known as n = N: the number of samples (represented by n) is equal to the total number of events (N).
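The contrast between sampling and n = N can be sketched in a few lines of code. The travel times below are hypothetical, generated at random for illustration; the point is only that with a sample we estimate the average, while with the full population we compute it exactly:

```python
import random

random.seed(42)

# Hypothetical travel times (in minutes) for every vehicle crossing the
# district between 5:00 and 7:00 p.m. -- the full population of N events.
population = [random.gauss(25, 6) for _ in range(100_000)]

# Traditional approach: estimate the average from a small sample (n << N).
sample = random.sample(population, 500)
sample_mean = sum(sample) / len(sample)

# Big data approach: n = N, so the "estimate" is the exact population average.
population_mean = sum(population) / len(population)

print(f"sample estimate (n = 500): {sample_mean:.2f} min")
print(f"full population (n = N):   {population_mean:.2f} min")
```

The two numbers will be close, but only the second is exact; the sample-based figure carries an estimation error that shrinks as n approaches N.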

An important characteristic of big data techniques is that understanding the why of a certain phenomenon is not the most important thing. The recommendations obtained are at times counterintuitive, but they work empirically, since the data show the result unequivocally. This is a consequence of the growing complexity of information systems, a complexity that is set to increase over time. The sensors that surround us in our daily lives, the constant gathering of data by smart cities, and the development of data-capture techniques (such as cameras equipped with facial recognition systems) provide big data algorithms (whether based on artificial intelligence techniques or not) with the raw material they need to generate their predictions. This field is known as predictive analytics.

Predicting the contours of a specific phenomenon is an advantage that many businesses are willing to pay for. What will be the effect of a certain marketing initiative? How long will it take for one of the fleet’s delivery trucks to start showing problems? When will one of the machines on an assembly line stop working? What is a patient’s risk of having a major medical complication in the near future? In his 2013 book, Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, American professor Eric Siegel cites examples of how different companies use this technique.

One example had to do with how Facebook selects the order of the news items it shows its users, seeking to maximize usefulness and revenues on its platform. Another example showed that some health insurance companies are now able to estimate the likelihood of certain deaths more than a year in advance, allowing them to offer counseling and support services to families.

The security of this massive quantity of data — one of the most important assets of the Fourth Industrial Revolution — has become a central issue for individuals, families, communities, governments, and businesses to deal with. How can we protect data and still guarantee appropriate levels of access while respecting privacy and neutrality? How can technology act to solve a problem that technology itself created? Cybersecurity will be our next topic.

Founder at GRIDS Capital, Award-winning author of “Present Future: Business, Science, and the Deep Tech Revolution”, Twitter @guyperelmuter