Bringing Clarity to Really Really Big Data: A Case for AI and Machine Learning to Help Crunch and Protect Our Data


It’s funny how kids have an affinity for toys we enjoyed as kids – like Legos. They will spend hours creating the biggest “thing,” often leading to a parent’s near universal response, “Johnny! That is the biggest tower I have ever seen! Great job!” Children (and we) love Legos because they encourage imagination, offering a limitless way to create something “gigantic!” And in a more practical sense, Legos sometimes give us a great perspective on the important concept of “scale.”

As lawyers and consultants, we find that explaining the “scale” issue as it relates to the respective data, information and network security problems is a challenge. Unfortunately, “layperson” directors and officers of public companies, along with executives in government, tend to view “scale” (as it pertains to data protection) as a bad thing (and even an intimidating thing). Part of the challenge here is that there are few practical ways to explain to those holding these positions that an organization’s security operations center may receive upwards of one million “incidents” every day and, at the same time, must adequately deal with, and investigate, the potential danger inherent in such incidents, and reasonably assure that not even one of these small incidents slips between the cracks.

“Big data” analytics as a business tool is fantastic because we can translate those figures into, say, dollars. But “big data” is also a cybersecurity requirement (i.e. using network traffic, data, sensors and other sources to help us determine what is “normal” in our network and what is not), and cybersecurity data is not as simple to translate into something we can easily conceptualize, like, say, dollars! So, until we understand the “scale” of what we are dealing with, it will be darn hard to address the security issues associated with cyberspace.

So how much “big data” do we produce? And how do we respond to it? These are important basic questions that need to be better understood so that the much tougher question – how do we protect our data? – can be addressed.

How Much Data Do We Produce?

Let’s start with this fundamental concept: today, “data” is everything. Both personally and professionally, much of our lives has been converted into a bunch of zeroes and ones. Our dependence on data has never been greater and is only certain to grow, especially with the explosion of the Internet of Things (IoT). And the amount of data – good, bad, junk – we produce continues to grow (at breakneck speeds), taking up space on vast networks (meaning that if you were able to control even a fraction of this data flow, you would be able to unleash a wicked DDoS attack).

So how much data exactly is traveling – nearly at the speed of light – through the networks? According to a June 2016 Cisco white paper, we are in the “zettabyte era” in terms of global IP traffic. Great! What is a zettabyte?

Back to Basics

To unpack that question, we need to start with a few basics, the first being that humans have cognitive limitations. Our limitations become evident when trying to understand very large (or very small) numbers. We can use notations to represent large numbers, such as 1 ZB equalling 1 x 10^21 bytes. But does that notation mean anything to you?

Denote one million as 1 x 10^6, and it may mean something to you, but that is because we have a better understanding of what “one million” means in practical terms. Let us conceptualize “one million” using dollars to frame a reference point: your salary is $50,000 a year, you work for 20 years, and assuming you spend nothing, you would accumulate one million dollars. Now, using the table below, we will “scale up” your salary:

Salary Base         Factor   Adjusted Yearly   Years   Accumulation   Rewritten
$50,000 per year    1        $50,000           20      $1 x 10^6      $1,000,000
$50,000 per year    10       $500,000          20      $1 x 10^7      $10,000,000
$50,000 per year    100      $5,000,000        20      $1 x 10^8      $100,000,000
$50,000 per year    1,000    $50,000,000       20      $1 x 10^9      $1,000,000,000

What looks nicer on your bank statement: $1 x 10^9 or $1,000,000,000? Well, both are the same, but those zeros at the end sure look nice, don’t they? And more importantly than looking nice, seeing the last notation (with all the zeros) helps us humans understand not only the number but also what the number represents just a little bit better. Why? Because we use symbols to represent values, and these values must be translated into something tangible that we can use in our daily life. In cyberspace, this challenge becomes more difficult due to scale, notation and cognitive limitation.
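To see the two notations side by side, here is a minimal Python sketch that recomputes the salary example. The names BASE_SALARY, YEARS and accumulate are our own, chosen purely for illustration; the figures are the ones from the table.

```python
# Recompute the salary table and show each total in both notations.
BASE_SALARY = 50_000  # dollars per year, from the example
YEARS = 20            # years worked, spending nothing


def accumulate(factor: int) -> int:
    """Total saved after YEARS at BASE_SALARY scaled up by `factor`."""
    return BASE_SALARY * factor * YEARS


for factor in (1, 10, 100, 1_000):
    total = accumulate(factor)
    # ".0e" gives the compact scientific form, "," writes out every zero.
    print(f"x{factor:<6} ${total:.0e}   ${total:,}")
```

Both columns hold exactly the same value. Only the second one tends to register with a human reader, which is the point of the exercise.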

Conceptualizing a Zettabyte

We know what a billion (10^9) is, but what do we call something written as 10^21? That would be a sextillion. Do you feel better now that you have a name for it? We did not think so.

Imagine for a moment we could capture – in a single snapshot – all of the global IP traffic in 2016, one zettabyte. What could we compare that to?

Using the table below, we rewrote the figures in a comparative manner along with some examples to help you conceptualize what we are actually dealing with. Some notes: we will use 1.28 ZB in this example (some figures rounded and approximate), and for mathematical ease, we will be using decimal values (1,000) – not binary (1,024) – when writing out numbers in full. No need to fuss over this detail, and for all tech sticklers, remember: more people speak “non-tech” than tech. Make your life, and their life, easier by avoiding jargon and cumbersome detail.

Try to picture the following in your head:

Digital Comparisons
128 gigabytes      128,000,000,000 bytes                       About 32 movies in HD
1.28 zettabytes    1,280,000,000,000,000,000,000 bytes         Global IP traffic in 2016

Length Comparisons
128 metres         128,000,000,000 nanometres                  Length of a football field with two extra end zones
1.28 terametres*   1,280,000,000,000,000,000,000 nanometres    Distance from Earth to Saturn

*Note: 1 terametre equals 1,000,000,000 kilometres.
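For readers who want to check the arithmetic behind the table, the following is a rough Python sketch using decimal units. The variable names are our own and the figures are the rounded ones from the table, so treat it as an illustration rather than a measurement.

```python
# Decimal units, as used in the table (each step up is 1,000, not 1,024).
GB = 10**9    # bytes in one gigabyte
ZB = 10**21   # bytes in one zettabyte

small = 128 * GB        # 128 GB
big = 128 * ZB // 100   # 1.28 ZB, kept as an exact integer

ratio = big // small
print(f"1.28 ZB = {big:,} bytes")
print(f"1.28 ZB is {ratio:,} times larger than 128 GB")

# The length rows scale the same way when measured in nanometres.
NM_PER_M = 10**9
short = 128 * NM_PER_M                   # 128 metres
long_ = 128 * 10**12 * NM_PER_M // 100   # 1.28 terametres
print(f"Length ratio matches: {long_ // short == ratio}")

# For the sticklers: the binary counterpart (2**70 bytes) is somewhat bigger.
ZIB = 2**70
print(f"2**70 / 10**21 = {ZIB / ZB:.3f}")
```

The gap between the two rows in each pair is a factor of ten billion, which is why the decimal-versus-binary detail (roughly an 18% difference at this size) is not worth fussing over here.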

If the Earth-to-Saturn comparison is too hard to conceptualize, think about it like this: it would take you about 8,000 lifetimes of continual walking to do it by foot. And if that is too abstract to conceptualize, perhaps this is easier: 128 GB to 1.28 ZB is what a $20 bill is to the US federal debt, $20 trillion. And assuming federal debt increases at the same rate global IP traffic will, by the 2020 US Presidential election we’ll be discussing a $46 trillion figure.

Conceptualizing the Cybersecurity Alert Problem

So now that we have a better grasp of the size of the data production and flow problem, we need to think about managing it. Unsurprisingly, when asked to identify their top incident response challenges, 36% of cybersecurity professionals surveyed answered, “keeping up with the volume of security alerts.” If we hold on to the $20 trillion comparison, we could say our task would be sifting through $55 billion per day, trying to figure out how much of it is legit, how much has been stolen, how much has been laundered, and how much is funny money. Fun times!
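The $55 billion per day figure is simply the $20 trillion spread evenly over a year. A quick sketch, assuming a 365-day year and perfectly even flow (both simplifications of ours, as are the variable names):

```python
# Spread the $20 trillion stand-in evenly across one year.
ANNUAL_TOTAL = 20 * 10**12       # dollars
DAYS_PER_YEAR = 365              # assumption: ignore leap years
SECONDS_PER_DAY = 24 * 60 * 60

per_day = ANNUAL_TOTAL / DAYS_PER_YEAR
per_second = per_day / SECONDS_PER_DAY

print(f"Per day:    ${per_day / 1e9:.1f} billion")
print(f"Per second: ${per_second:,.0f}")
```

Real traffic is anything but evenly spread, so the per-second number is only there to give a feel for the pace.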

FBI Director James Comey, in a 2014 interview with 60 Minutes, gave a very useful description of the problem (in reference to cyberattacks originating from China):

“Actually, [they are] not that good. I liken them a bit to a drunk burglar. They’re kicking in the front door, knocking over the vase, while they’re walking out with your television set. They’re just prolific. Their strategy seems to be: We’ll just be everywhere all the time. And there’s no way they can stop us.”

The key line is “we’ll just be everywhere all the time” because it is actually happening! From the same survey, 42% say their organizations ignore a significant number of security alerts because they cannot keep up with the volume. And of course, there is also an unintended danger of being overwhelmed: the system crying wolf too many times.

But perhaps the more worrying figures are: 34% say that between a quarter and half of alerts are ignored, 20% say half to three-quarters of alerts are ignored, and 11% say more than three-quarters of security alerts are ignored! Mamma mia, that’s a lot of front doors kicked in where nothing is then done!

Let’s go back again to the $20 trillion comparison, where we have to sift through $55 billion per day. If we use the “ignore” figures above, the message is: alerts tell us something funny is going on, but we are so overwhelmed, we do not bother to look at $15 billion worth of daily alerts. That’s a lot of money being left on the table.

Sadly, this issue is nothing new. Ignoring alerts seems as commonplace as alerts themselves, and worse, the Cisco 2017 Annual Cybersecurity Report tells us that less than half of legitimate alerts actually lead to some sort of remediation and less than 1% of severe/critical alerts are ever investigated. In 2014, enterprises dealt with 10,000 alerts per day; in 2016, departments dealt with 50,000 alerts per day; and who knows how many we will be dealing with by the end of 2017 due to the IoT boom.

Unfortunately, despite good tips, such as setting goals, collecting the right information, and consolidating, we are still being overwhelmed because we have not addressed the “scale” issue. And oh yeah, did we mention that sometimes cybersecurity analysts may at most be able to perform about 10 investigations per day? This is where artificial intelligence and machine learning are going to play a larger role (and why AI start-ups focusing on cybersecurity issues may be in an incredible position to take advantage of the increasingly vulnerable state we are living in).
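To put the mismatch in headcount terms, here is a back-of-the-envelope sketch using the alert volumes and the 10-investigations-per-day ceiling mentioned above. It assumes, purely for illustration, that every alert needs its own investigation; the function and variable names are ours.

```python
import math

INVESTIGATIONS_PER_ANALYST = 10   # per day, the rough ceiling cited above
ALERTS_PER_DAY = {2014: 10_000, 2016: 50_000}


def analysts_needed(alerts: int, capacity: int = INVESTIGATIONS_PER_ANALYST) -> int:
    """Analysts required to investigate every alert in a single day."""
    return math.ceil(alerts / capacity)


for year, alerts in ALERTS_PER_DAY.items():
    print(f"{year}: {alerts:,} alerts/day -> {analysts_needed(alerts):,} analysts")
```

Under these assumptions, no realistic hiring plan closes the gap, which is the argument for letting machines do the first pass.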

What Does It All Mean?

It means that we have a lot of work to do and that without artificial intelligence and machine learning to help us with our cybersecurity challenge – something which we think is really two challenges but one issue (hint: network security + information security = data security) – we are going down a dark road. If somebody were able to command and control just 1% of the global IP network traffic, the effects could be catastrophic.

This idea may sound far-fetched, but perhaps it is not, especially when you consider how insecure IoT devices are (does your dishwasher come with a password?) and that the shift to mobile devices will not stop anytime soon, meaning that more and more people will be connecting devices to WiFi networks that are inherently insecure.

These challenges will not get easier, especially as we continue to produce data, and when hackers say they can compromise most targets in about 12 hours. Therefore, we need as many tools as possible (such as AI/ML), but we also need to be smart about and honest about what we are dealing with. Cybersecurity is a technology problem, but it’s also a people problem, where we – the humans – are still getting the basics wrong. Recognizing that we have cognitive limitations is an important step to getting ahead of the adversaries and nefarious actors.

About the Authors:

Paul Ferrillo

Paul Ferrillo is counsel in Weil’s Litigation Department, where he focuses on complex securities and business litigation, and internal investigations. He also is part of Weil’s Cybersecurity, Data Privacy & Information Management practice, where he focuses primarily on cybersecurity corporate governance issues, and assists clients with governance, disclosure, and regulatory matters relating to their cybersecurity postures and the regulatory requirements which govern them.

George Platsis

George Platsis has worked in the United States, Canada, Asia, and Europe, as a consultant and an educator and is a current member of the SDI Cyber Team. For over 15 years, he has worked with the private, public, and non-profit sectors to address their strategic, operational, and training needs, in the fields of: business development, risk/crisis management, and cultural relations. His current professional efforts focus on human factor vulnerabilities related to cybersecurity, information security, and data security by separating the network and information risk areas.

Editor’s Note: The opinions expressed in this guest author article are solely those of the contributor, and do not necessarily reflect those of Tripwire, Inc.
