Our Data Online

Over 2.5 quintillion bytes of data are created every day, and with each passing moment more devices come online. The Internet of Things promises to further connect every part of our world, producing even more data ripe for analytics. Where this data is stored, who has access to it and the types of information gathered produce both opportunities and risks.

In 2018, Facebook users across the world learned that their personal data had been harvested on a large scale without their consent. The main culprit in this instance was a company called ‘Cambridge Analytica’. The activity of millions of users, such as what they ‘liked’ and their friend circles, was passed on to the company, which then used an algorithm to psychologically profile people based on their interactions. Using these personality profiles, politically motivated advertisements were tailored and shown to specific users.

Unfortunately, Facebook is not alone in the large scale collection of personal data. Companies like Amazon and Google use our shopping and browsing habits to ‘improve’ their services, while YouTube and Instagram give recommendations based on profiles and search history. Users are increasingly worried about this practice: 67 of 92 participants in the survey answered yes when asked if they are concerned about their data being collected and stored. These concerns are not unwarranted, as even with anonymisation techniques it is possible to infer many details when analysing data sets. In 2006, AOL released millions of search queries made by around 650,000 of its users. The company removed the IDs and IP addresses to allow researchers to study the information, yet it took only a couple of days for individuals to be identified from their queries. In what could be seen as a breach of privacy, the US retailer Target sent coupons for baby clothes to a customer’s daughter; using shopping data, the retailer had correctly determined that she was pregnant.
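The inference risk described above can be illustrated with a minimal sketch. It assumes a small, entirely hypothetical data set in which names and IDs have already been removed but a few ordinary attributes (postcode area, birth year, gender) remain. These are often called quasi-identifiers, and any record whose combination of them is unique can be singled out by anyone who knows those facts about a person. The records, field names and function names below are invented for illustration and are not taken from the AOL or Target cases.

```python
from collections import Counter

# Hypothetical "anonymised" records: names and IDs are removed, but
# postcode area, birth year and gender (quasi-identifiers) remain.
RECORDS = [
    {"postcode": "AB1", "birth_year": 1980, "gender": "F", "query": "flu symptoms"},
    {"postcode": "AB1", "birth_year": 1980, "gender": "F", "query": "cheap flights"},
    {"postcode": "AB1", "birth_year": 1975, "gender": "M", "query": "dog trainers near me"},
    {"postcode": "CD2", "birth_year": 1992, "gender": "F", "query": "baby clothes"},
]

QUASI_IDENTIFIERS = ("postcode", "birth_year", "gender")


def group_sizes(records, keys=QUASI_IDENTIFIERS):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[k] for k in keys) for r in records)


def unique_records(records, keys=QUASI_IDENTIFIERS):
    """Return the records whose quasi-identifier combination appears exactly once."""
    sizes = group_sizes(records, keys)
    return [r for r in records if sizes[tuple(r[k] for k in keys)] == 1]


def k_anonymity(records, keys=QUASI_IDENTIFIERS):
    """Return the smallest group size; the data set is k-anonymous for this k."""
    return min(group_sizes(records, keys).values())


if __name__ == "__main__":
    print("k =", k_anonymity(RECORDS))
    for record in unique_records(RECORDS):
        print("can be singled out:", record)
```

In this toy data set two of the four records are unique on the three remaining attributes, so removing the obvious identifiers alone offers them little protection. Real data sets are far larger, but the same counting exercise is one simple way to gauge how exposed a release might be.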

Is the collection of user data always a bad thing? Not necessarily. A team of researchers from Google was able to track the spread of influenza without the results of a single medical check-up, and could do this quicker than the CDC. A recent report estimated that the US healthcare system could save hundreds of billions of dollars each year through better integration and analysis of medical data. An overarching system that uses information gathered from sources ranging from clinical studies to smart devices could save not only money but also lives. Interestingly, when survey participants were asked if they would use a blockchain application that offered financial incentives for their data, 39% of respondents answered yes. So perhaps it is not that data collection is inherently bad, but that users want more control over when and with whom they share their data. The idea that you are rewarded for your information, instead of it being harvested and sold with no immediate personal benefit, is certainly a more attractive proposition.


Recent Breaches

In 2018, over 2000 data breaches were reported from more than 60 different countries; of these, the healthcare industry suffered around 500. In 2019, an estimated 41 million healthcare records were exposed, stolen or illegally disclosed, and the average cost of a data breach was close to $4 million. The cost of these data breaches is steadily increasing over time, rising 12% from 2014 to 2019. Throughout 2020, at least 8 billion records containing sensitive information were exposed online, making it one of the worst years in data breach history. Breaches of personal data held by organisations enable mass identity fraud, and the risk grows every day. The information leaked is often distributed online, accumulating in the hands of criminals and causing an erosion of privacy. Vulnerable individuals are often targeted repeatedly, resulting in a profound loss in quality of life.

Equifax is a large, top-tier credit reporting agency. In 2017, the company acknowledged that it had been the victim of a cyberattack in which the personal data of some 148 million citizens was compromised. This information included names, dates of birth, driving licences and even credit card numbers. Data stored by Equifax is not collected on an opt-in basis; the information is sourced from businesses and institutions and can be very comprehensive. The attack made use of a vulnerability in Apache Struts, CVE-2017-5638, which allows for remote command execution. On March 7th 2017, the Apache Software Foundation published a patch to fix the issue, and on March 8th the Department of Homeland Security notified Equifax, along with other credit agencies, directing them to install the patch. A week after the company was notified of the patch, Equifax conducted a scan of its systems and the subsequent report failed to highlight any vulnerability to the Apache Struts bug; this left the systems unpatched and unprotected up until late July 2017.

During this period, Equifax noticed suspicious activity within its systems and took the application offline, hiring an external cybersecurity firm to conduct forensic analysis. This investigation revealed that many files had been breached.

The situation was further complicated when the company attempted to address the issue. To help disseminate details of the breach to affected users, Equifax created a separate domain and webpage. Almost immediately fake settlement and informational sites were created to exploit the situation, resulting in further opportunity for criminals.

The U.S. bank Capital One was subject to a large security incident in 2019. It is the fifth largest consumer bank in America, with strong investments in IT infrastructure (it was one of the first banks in the world to migrate its datacentres to the cloud). Details of the leak showed that names, addresses, phone numbers and income details were amongst the data subject to unauthorised access. The breach affected approximately 100 million consumers and small businesses across the U.S. and Canada. Interestingly, this breach was discovered via the bank’s responsible disclosure program, when an email from an outsider informed it that its customers’ data was available on a GitHub page.

As a result of FBI investigations, it was discovered that a woman named Paige A. Thompson was single-handedly responsible for the breach; she was later accused of stealing data from over 30 different companies. Thompson created a scanning tool that checked cloud-based servers for misconfigured firewalls, which allowed her to execute commands remotely and so gain access to the servers. The FBI identified a script hosted on GitHub that, with only 3 commands, allowed unauthorised access to servers hosted by Amazon.

The survey highlighted the scale of the problem: 28 respondents confirmed that their personal information had been leaked, and 18% of individuals questioned said that they had experienced identity fraud. Worryingly, only 38% of participants knew how to check if their information had been compromised, which reveals the growing challenge of managing personal data online.


Challenges

It is becoming increasingly difficult to keep track of where our data is being used and who has access to it. We often trust many different services to keep sensitive information safe and to apply good security practices. However, as shown by the recent breaches, this is not always the case, and it is not unusual for companies to take a relaxed approach to security. Simple misconfiguration of firewalls and server software, a lack of regular updates and a failure to remain diligent are commonplace. Despite increased investment in cybersecurity solutions, organisations are still suffering from major breaches. The consequences go far beyond financial settlements and loss of reputation: as more personal data is leaked, it becomes extremely difficult for institutions to ensure that they are actually communicating with the true users of their service and not an attacker. According to Verizon’s security research, more than a quarter of security incidents went unnoticed for many months, with some organisations remaining vulnerable for over a year.

The World Economic Forum rates cybersecurity breaches as one of the five most serious risks facing the globe today. The increasing complexity and sophistication of attacks, the profitability of obtaining personal data, a lack of trained cybersecurity analysts, the increase in remote working and the use of cyber warfare by state-sponsored actors are serious concerns for organisations. Consistent evaluation of security controls is vital but costly to implement, and it can be difficult for small businesses to effectively monitor the wide scope of the cybersecurity landscape. That being said, research shows that 97% of attacks could be mitigated if organisations implemented effective controls. Within healthcare, the biggest threat of data breaches comes from internal sources. As of 2017, 46% of security incidents were caused by employee behaviour such as clicking on malicious links, worker negligence and access abuse. Computer training is mandatory for NHS staff, yet it is documented that only 12% of trusts have fulfilled their training obligations. Chronic underinvestment in healthcare-related IT services is stark in comparison to other industries, with as little as 1-2% of annual budgets being spent on IT services, compared with 4-10% in other business sectors.

As the world pushes forward with the Internet of Things, securing each and every device is a daunting challenge. The IoT is diverse and can include devices that are very different from traditional computers. Often these devices are deployed on a large scale, and users are unaware of the potential threats. They can regularly be seen dotted around homes and businesses, always turned on and listening. The interconnected future poses significant problems for our privacy. Security will not just be the job of organisations; individuals must adapt and implement solid security practices themselves.