When we use the Internet, we leave long data traces of all our online activities, and a few lines of code can be enough to link those traces back to us as individuals. With the majority of both US and EU citizens worried about their lack of control over the personal data handled by governments and corporations, privacy has become an important question in AI and data labeling too.

Personal data is any information that relates to an identified living individual, or that can be used to identify one. GDPR calls this individual the "data subject". Personal data can be grouped into four types by how it is obtained: provided, observed, derived and inferred. Provided data is consciously given by data subjects, and observed data is collected automatically, for example through browsers or cookies. The other two are more complex and come from analytics: derived data is calculated from raw information, while inferred data comes from observing correlations between multiple datasets. AI, and data annotation specifically, comes into close contact with privacy when, for instance, annotators label opinions from social media (NLP) or annotate people in video frames, since this data can be directly associated with a specific person.

Although many instances of big data analytics do not involve personal data (for instance, in climate science, astronomy or zoology), the process becomes significantly more complicated when they do. What laws should a business take into account when outsourcing data labeling services? Why is it best practice to outsource to a trusted partner?

Data privacy laws and their impact on data annotation

Privacy laws vary widely from country to country, and even between regions or states within the same country, but the consensus is that data collection and usage must be addressed more thoroughly. The largest and most comprehensive data protection and privacy regulation to date is the General Data Protection Regulation (GDPR), which took effect in 2018 in the European Union and the European Economic Area. Under GDPR, individuals have the right to transparent processing of their information and extended control over their personal data processed online.

GDPR applies to many processes: the collection of personal data, the development of AI algorithms with the help of personal data, and analysis and decision-making based on the outputs of those algorithms. Overall, it places legal obligations on data owners and handlers to ensure social accountability. GDPR is the regulation we follow at Label Your Data in all of our internal procedures and in the software we use.

When adhering to GDPR, it is important to remember that a business must have a lawful basis in order to process personal information. There are six lawful bases for processing: consent, contract, legal obligation, vital interests, public task and legitimate interest. No single basis is better or more important than the others; legal professionals usually choose the most appropriate one based on factors such as the purpose of the processing and the relationship between the parties. The privacy notice should state the lawful basis for processing the data, as well as the justified purposes of the processing.
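To make the idea concrete, here is a minimal Python sketch of how a team might record the lawful basis and purposes for each dataset before it goes to annotation. The names `LawfulBasis`, `ProcessingRecord` and `privacy_notice_line` are hypothetical and chosen for illustration; they are not part of any GDPR tooling, and choosing the basis itself remains a decision for legal professionals.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class LawfulBasis(Enum):
    """The six lawful bases for processing listed above."""
    CONSENT = "consent"
    CONTRACT = "contract"
    LEGAL_OBLIGATION = "legal obligation"
    VITAL_INTERESTS = "vital interests"
    PUBLIC_TASK = "public task"
    LEGITIMATE_INTEREST = "legitimate interest"


@dataclass(frozen=True)
class ProcessingRecord:
    """Hypothetical record of one processing activity (illustrative only)."""
    dataset: str
    basis: LawfulBasis
    purposes: Tuple[str, ...]

    def __post_init__(self):
        # Refuse records that lack a recognized basis or a stated purpose.
        if not isinstance(self.basis, LawfulBasis):
            raise ValueError("basis must be one of the six lawful bases")
        if not self.purposes or any(not p.strip() for p in self.purposes):
            raise ValueError("at least one non-empty purpose is required")


def privacy_notice_line(record: ProcessingRecord) -> str:
    """Format the basis and purposes as one line for a privacy notice draft."""
    purposes = "; ".join(record.purposes)
    return (
        f"{record.dataset}: processed on the basis of "
        f"{record.basis.value} for: {purposes}"
    )
```

A record such as `ProcessingRecord("street_video_frames", LawfulBasis.CONSENT, ("bounding-box annotation",))` can then be turned into a notice line, while a record with no purposes is rejected at creation time.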

Additionally, when working with a labeling partner, the parties must establish an agreement that sets out the specifics of the labeling work. It must cover the subject matter, the purpose of processing, the time frame during which the data may be processed, the types and categories of personal data, and the applicable obligations and rights. It should also clarify:

  • Confidentiality of processes;
  • Processing personal data only in accordance with instructions;
  • Taking appropriate organizational and technical measures to ensure compliance with applicable laws and regulations;
  • Deletion or return of data after the end of the processing period;
  • Permission to inspect the partner's compliance with its data protection obligations.
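The points above can be treated as a checklist. The following Python sketch, under the assumption that a draft agreement is summarized as a simple dictionary, reports which required items are still missing. The key names and the `missing_items` helper are hypothetical; a real agreement is reviewed by legal counsel, not by a script.

```python
# Items the agreement must cover, as listed in the text (key names are assumed).
REQUIRED_TERMS = (
    "subject_matter",
    "purpose",
    "time_frame",
    "data_categories",
    "obligations_and_rights",
)

# Clauses the agreement should clarify (key names are assumed).
REQUIRED_CLAUSES = (
    "confidentiality",
    "instructions_only",
    "security_measures",
    "deletion_or_return",
    "inspection_rights",
)


def missing_items(agreement: dict) -> list:
    """Return the required keys that are absent or empty in the draft."""
    required = REQUIRED_TERMS + REQUIRED_CLAUSES
    return [key for key in required if not agreement.get(key)]
```

Running `missing_items(draft)` on a draft summary gives a list of the gaps to close before any data is shared with the partner; an empty list means every item has at least some text.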

A recent industry practice is to use a uniform set of data labels that flag personal data, sensitive data and any restrictions associated with them. Another tool in this toolbox is data anonymization, a data processing technique that removes, conceals or modifies personally identifiable information within a dataset. Once data has been effectively anonymized, it can no longer be classified as personal and is not covered by data protection legislation.
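As a rough illustration of both ideas, the Python sketch below replaces e-mail addresses and phone-like numbers in a text with placeholders and returns labels describing what it found. The patterns, the label names and the `redact` function are assumptions made for this example. Pattern-based redaction on its own rarely makes data truly anonymous, since names, addresses and combinations of indirect identifiers also need handling.

```python
import re
from typing import Set, Tuple

# Simplified patterns for illustration; real-world formats vary widely.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> Tuple[str, Set[str]]:
    """Replace e-mail addresses and phone-like numbers with placeholders.

    Returns the redacted text and a set of labels for what was found.
    """
    labels = set()
    # E-mails first, so digits inside an address are not read as a phone number.
    redacted, count = EMAIL_RE.subn("[EMAIL]", text)
    if count:
        labels.add("personal:email")
    redacted, count = PHONE_RE.subn("[PHONE]", redacted)
    if count:
        labels.add("personal:phone")
    return redacted, labels
```

The returned labels can be stored next to the record so that annotators and downstream systems know which items contained personal data before redaction.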

How We Approach Privacy at Label Your Data

We use our extensive experience to help companies grow faster by providing data annotation services for their projects. We understand the need to establish trust with our customers, so we take data privacy seriously. Data privacy laws strengthen the rights of individuals. Because the data we process often comes from different places and needs special attention with regard to laws and regulations, all of our assets, including the software we use and our internal procedures, are GDPR compliant. Moreover, our company complies with the information security standard ISO 27001, which covers the confidentiality and integrity of all our assets, including the workforce, the IT infrastructure and the workplace. To keep cardholder data processed through our systems safe, we have implemented the security controls required for compliance with the Payment Card Industry Data Security Standard (PCI DSS).

PCI DSS Compliance
ISO 27001:2013 Security Certification
General Data Protection Regulation (GDPR)

Data confidentiality and integrity remain among the highest priorities when outsourcing data labeling services. Specialized companies can improve your workflow by taking over the time-consuming work of organizing, cleaning and categorizing your datasets. Successful communication and the implementation of industry security standards together ensure that collaboration with Label Your Data runs smoothly.

by Veronika Gladchuk
on May 19, 2020.