How the GDPR affects our data sharing efforts.

For the past several months we’ve all heard about the new European Union (EU) privacy law that went into effect a few weeks ago: the General Data Protection Regulation (GDPR), Regulation (EU) 2016/679. Some parties had suggested that when it took effect the world would end, aliens would invade, or life as we know it would stop. But the regulation took effect and we’re still here fighting cybercrime. Since other countries are also using it as a model for their own privacy regulations, we have decided to confront the regulation head-on.

A number of our members and their legal staffs have inquired about the new regulation’s effect on our data sharing operations. We have made some changes, and must still complete others, to comply with the new regulation, but they are not catastrophic and most data exchange operations will continue as usual.

The goal of the GDPR is not to stop the free flow of information but to “balance the privacy of EU data subjects against the use of their personal data.” You may have heard all the hoopla about those nasty “IP Addresses” that seem to come up whenever one talks about the GDPR, but the regulation is manageable with current technology and the user agreement language that we’ve organized on behalf of our members over the years. Note that to get data from us you need to agree to, and abide by, the Data Sharing Agreement (DSA). This is a big win for us all.

Examining the regulation over the past few years, we concluded that the APWG’s big data sets have varying GDPR-compliance issues. Note that the GDPR covers entities offering goods and services to EU data subjects. This affects our members’ website, too, since some of you are “EU data subjects”. You will notice changes to the website in the near future as Foy implements them.

Our data sharing programs are actually about crime investigation and public safety. Since we’re not a typical data controller as defined in the GDPR, our compliance efforts are a little different. For example, would we really want to ask an attacker for their consent to share their attack data with others? Or do we want to let a criminal gang ask us to delete all the attack data we have about them?

From here on out, APWG, like every enterprise that mediates the exchange of data that can be defined as descriptive of natural persons under this regulation, has to manage access to data sets in accordance with the regulation. The management schema will vary according to the nature and character of the dataset (i.e., how directly relatable the data are to a natural person).

Our GDPR compliance varies amongst our data sets:

  1. The Phishing URL dataset is archived for APWG member access as the phish module on the eCrime Exchange (eCX). This dataset contains URLs pointing to phishing credential collector web sites. As these are compromised or maliciously registered servers, their identification does not contain EU-defined personal data, since a server does not uniquely identify an EU data subject. A large sampling of the URLs themselves also did not show inclusion of any EU-defined personal data. As such, we expect no changes to our data sharing operation for the phish module or dataset.
  2. The cryptocurrency dataset is archived for member access in the Cryptocurrency private Working Group on the eCX. The identifiers in this dataset are pseudonymized and, as such, would not be considered EU-defined personal data (GDPR, sec 26). We expect to make no changes to our data exchange operations on this dataset unless we receive further guidance from appropriate data protection officials.
  3. The malicious_ip dataset is archived as the mal_ip module on the eCX. This dataset contains Internet Protocol (IP) Addresses that have attempted some sort of malicious activity against one of our members’ networked assets. We believe many of these IP Addresses belong to compromised systems. Although we don’t believe we could identify the actual user behind these IPs using only the IP Address, unenriched by other correlative data, some of our members’ legal teams are concerned simply because the dataset contains “IP Addresses”. Though APWG directors and managers may disagree with that position, we decided to mark this dataset as containing EU-defined personal data and to make it GDPR-compliant. Discussions with our advisors and some Data Protection Authorities led us to develop a Code of Conduct and some new procedures for this dataset. If you exchange data with the malicious_ip dataset, you will be required to sign and abide by the Code of Conduct, unless we have suitable alternative contracts with your organization. Users of the mal_ip module on eCX can expect to receive the Code of Conduct soon for signature. Failure to sign the Code will result in your access to the mal_ip module on the eCX being disabled.
  4. Our reference database of phishing emails. Yes, emails have “to” and “from” fields which contain personal data. We are developing a code of conduct for this dataset as well, and you will be required to agree and adhere to it when using this dataset. Since this is “research data”, we have tried to make the data retention requirements less stringent than for some of the other data sets. Why? As discussed with some of our researchers, a researcher may take our data, do some research, attempt to publish the results, have the submission rejected, perform more research, and then successfully publish. They may also need to retain the data for a while to allow for challenges to the research conclusions. This sometimes takes many years. We are thinking of a process where a researcher can ask for data retention extensions so they can keep the data longer than normal. More details will be forthcoming to our researchers. In the end there will be a code of conduct for this dataset that researchers will be required to comply with to get access to the data.
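
For readers who think in terms of access rules, the per-dataset handling described above can be summarized in a small sketch. This is only an illustration of the policy as stated in this post, not a description of how the eCX is actually implemented. The names `DatasetPolicy`, `POLICIES`, `may_access`, and the `cryptocurrency` and `phishing_emails` keys are hypothetical, and the sketch leaves out the "suitable alternative contracts" exception mentioned for the mal_ip module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetPolicy:
    """Hypothetical summary of how one dataset is treated under the GDPR."""
    contains_personal_data: bool    # treated as EU-defined personal data?
    requires_code_of_conduct: bool  # must the user sign a code of conduct?


# Assumed keys; only "phish" and "mal_ip" are module names used in the post.
POLICIES = {
    "phish": DatasetPolicy(False, False),
    "cryptocurrency": DatasetPolicy(False, False),
    "mal_ip": DatasetPolicy(True, True),
    "phishing_emails": DatasetPolicy(True, True),
}


def may_access(dataset: str, signed_dsa: bool, signed_code_of_conduct: bool) -> bool:
    """Return True if a user meets the agreements described for a dataset."""
    policy = POLICIES[dataset]
    # The Data Sharing Agreement (DSA) is required for all data.
    if not signed_dsa:
        return False
    # Datasets marked as containing personal data also need a code of conduct.
    if policy.requires_code_of_conduct and not signed_code_of_conduct:
        return False
    return True


if __name__ == "__main__":
    print(may_access("phish", signed_dsa=True, signed_code_of_conduct=False))
    print(may_access("mal_ip", signed_dsa=True, signed_code_of_conduct=False))
```

In this sketch, a member who has signed only the DSA keeps access to the phish and cryptocurrency data but not to the mal_ip module or the phishing email database.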