Shining a light on dark data

The majority of data collected is invisible to the businesses that create it. Thanks to AI software, business information previously hidden in plain sight has become usable.

It’s taken a long time, but computers are finally beginning to think like people. Technology has always been good at arranging, analysing and performing calculations on data, but that is not how humans have traditionally stored information. People use literature and images that require a degree of interpretation to understand. If your company relies on a legacy IT system to inform its decision-making process, this could be very worrying. The overwhelming majority of information in IT systems has been housed in a way that computers – at least up until very recently – simply don’t understand. From a company’s quarterly report to a Twitter user providing feedback on poor service, information is being stored in a way that computer systems just cannot comprehend.

Take the amount of information stored online in video content. People have gravitated towards video as a communication tool as increasing internet speeds have made it more practical and usable. It’s an engaging, compelling and quick way to convey a significant amount of information. Which is terrific, unless you’re a computer. According to a 2016 report from Cisco, 79 percent of global internet traffic will be in video form by 2020, meaning traditional computing won’t have the capacity to analyse nearly 80 percent of online information. For businesses, this is known as ‘dark data’: unorganised, uncategorised and untagged data, usually designed for consumption by humans, not machines. As well as videos, this could mean newspaper articles, handwritten or typewritten reports, and photographs. Unless a human being is able to manually sift through all this information – a task that could take a lifetime – business leaders will continue to make decisions based on the recommendations of computers less informed than they could be.

However, a new era of computing has begun, one in which AI is able to sort through all this dark data. By interpreting meaning from the mess, computers will allow business leaders to make decisions with more information and confidence than they ever thought possible.

Ralph Demuth

Vice President, IBM Cloud Technical Executive Europe

In the near future, online videos will represent almost 90 percent of all consumer traffic. This huge amount of information is currently unsearchable, and is more commonly referred to as ‘dark data’. Using image tagging and face detection technology, we can start to make sense of this information.

Dark Vision processes videos by extracting frames and tagging them independently: you can jump right to the frame you are looking for, giving access to previously unmanageable information. This particular example for video processing uses IBM Bluemix OpenWhisk and Watson services.
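The frame-by-frame idea described above can be illustrated with a small Python sketch. It is not the Dark Vision code and makes no calls to Watson or OpenWhisk: the frames are plain strings, and `toy_tagger` is a hypothetical stand-in for whatever image-tagging service would be used in practice. It only shows how tagging each frame independently yields an index that lets a viewer jump straight to the right moment.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# A "frame" is (timestamp_in_seconds, frame_data). In a real pipeline
# frame_data would be image bytes; short strings stand in for them here.
Frame = Tuple[float, str]


def build_tag_index(
    frames: Iterable[Frame],
    tag_frame: Callable[[str], List[str]],
) -> Dict[str, List[float]]:
    """Tag every frame independently and map each tag to its timestamps."""
    index: Dict[str, List[float]] = defaultdict(list)
    for timestamp, data in frames:
        # De-duplicate tags within a single frame before indexing.
        for tag in set(tag_frame(data)):
            index[tag].append(timestamp)
    return dict(index)


def toy_tagger(data: str) -> List[str]:
    """Hypothetical tagger: treats each word of the stand-in data as a tag."""
    return data.split()


# Three sampled frames from an imaginary video.
frames = [(0.0, "street car"), (5.0, "face street"), (10.0, "face")]
index = build_tag_index(frames, toy_tagger)

# Timestamps of every frame in which a face was tagged.
print(index["face"])
```

Because each frame is handled on its own, the tagging step could in principle be spread across many short-lived workers, which is the kind of job a serverless platform is suited to.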

What’s more, the potential for utilising Watson in the IBM Cloud is unlimited, from helping an energy company to predict pipeline corrosion to assisting a start-up in using social data to predict market trends.

IBM’s capabilities were put to the test in mid-2016 with a historic challenge: 20th Century Fox wanted to know whether IBM’s cognitive technology platform, Watson, could be used to generate a trailer for the studio’s forthcoming AI horror thriller, Morgan. It turned out it could, and while the trailer Watson created was impressive, the method it employed was extraordinary. “Watson was able to model the scenes visually to determine: was a scene scary? Was it a tender moment? Was there sadness or happiness?” explained John Smith, IBM Fellow, Machine Vision – IBM Research.

Real smarts, artificially

AI offers a new way of analysing, sorting and creating data in a more intelligent and thorough way. It is a significant improvement over the old programmable systems that have been popular for decades.

In the past, computer programs were based on mathematics and strict rules, following a rigid decision tree to come to a conclusion. This worked fine with the limited amount of organised and sorted data that was used back then, but in today’s world – with the sheer volume and complexity of data that is now on hand – these antiquated systems often fail under the pressure.

If you were to take a short YouTube video, a computer would be able to tell you very simple, tangible information about it: its length, common colours, the number of people who appear in it, and so on. However, this information doesn’t usually matter to anyone watching – they are more interested in what can be learnt from the video itself.

AI systems make unlocking this information possible. They are capable of writing their own rules based on feedback, in a manner not dissimilar to how humans learn, and of building complex decision trees similar to those people automatically use to interpret information and make an informed choice. Through this method, a computer can learn to watch a video and interpret information from it, and in that way improve the recommendations it makes.
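To make the contrast with hand-written rules concrete, here is a deliberately tiny Python sketch of a system that writes its own rule from feedback. It learns a single cut-off, which amounts to a one-branch decision tree; real AI systems are vastly more elaborate. The scenario and numbers are invented for illustration: each example pairs a made-up score for a video scene with a viewer's verdict on whether the scene was sad.

```python
from typing import List, Tuple


def learn_threshold(examples: List[Tuple[float, bool]]) -> float:
    """Pick the cut-off that misclassifies the fewest labelled examples.

    The learned rule is: predict True when value >= threshold.
    Assumes at least one example is supplied.
    """
    values = sorted({value for value, _ in examples})
    # Candidate cut-offs: one below every value ("always True"), the
    # midpoints between neighbouring values, and one above every value
    # ("always False").
    candidates = (
        [values[0] - 1.0]
        + [(low + high) / 2 for low, high in zip(values, values[1:])]
        + [values[-1] + 1.0]
    )

    def mistakes(threshold: float) -> int:
        return sum((value >= threshold) != label for value, label in examples)

    return min(candidates, key=mistakes)


# Invented feedback: (score for a scene, did viewers call it sad?)
feedback = [
    (0.1, False), (0.3, False), (0.4, False),
    (0.6, True), (0.8, True), (0.9, True),
]
rule = learn_threshold(feedback)

# New feedback arrives, and the rule is rewritten without a programmer.
feedback.append((0.5, True))
updated_rule = learn_threshold(feedback)
print(rule, updated_rule)
```

Nobody typed the cut-off into the program. It came from the examples, and it moved when the examples changed.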

These are immensely complex systems that require a tremendous amount of computing power to operate, which means that owning a private AI system is rarely a practical choice. Fortunately, companies have taken notice of this trend’s potential, and have pushed towards making AI systems available for hire to the private sector. There is a real race on among industry players to make APIs for AI systems available to even the smallest companies. IBM, Microsoft and Google are all currently working on improving the AI systems they have produced to unlock all kinds of additional applications.

One such area is the field of cognitive security. Protection against cybercrime used to mean firewalls and antivirus programs: systems initially designed to lock up data. While these systems remain in place today, they are not able to fight off modern attacks on their own. Currently, firewalls analyse data to detect abnormalities, such as a spike in server traffic. While effective, they can only respond to an attack that is either in progress or has already happened.
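A rough Python sketch shows why this kind of detection is reactive by nature. It is not how any particular firewall works; the window size, the multiplier `k` and the traffic figures are all assumptions chosen for illustration. The check simply compares each minute's request count with the recent average, so it can only raise the alarm once the unusual traffic has already arrived.

```python
from statistics import mean, stdev
from typing import List


def find_spikes(
    requests_per_minute: List[int],
    window: int = 5,
    k: float = 3.0,
) -> List[int]:
    """Return the indices whose count exceeds the trailing average
    by more than k standard deviations of the preceding window."""
    spikes = []
    for i in range(window, len(requests_per_minute)):
        history = requests_per_minute[i - window:i]
        threshold = mean(history) + k * stdev(history)
        # The comparison needs the current reading, so a spike is only
        # visible after the traffic has hit the server.
        if requests_per_minute[i] > threshold:
            spikes.append(i)
    return spikes


# Invented traffic log: steady load with one sudden burst.
traffic = [100, 104, 98, 101, 97, 103, 99, 450, 102, 100]
print(find_spikes(traffic))
```

The burst is flagged only at the minute it occurs. Nothing in the calculation looks outside the traffic log for signs that an attack might be on its way.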

Cognitive security, however, is able to inspect dark data to identify what prompts an increased risk of attack, and how to prepare accordingly. Instead of waiting for an attack to strike, cognitive security proactively looks for where future threats may be coming from and identifies where defences can be improved. As the system learns, it is able to identify and block new threats more quickly than any human could.

Security applications currently in widespread use are undoubtedly impressive. However, the next era of computing – the doors of which have only just started to open to us – holds untold promise.