Implementing an annotation-driven GDPR cleaning service

A common problem in applications and services dealing with personal data is not only how to store the data efficiently and reliably, but also how to get rid of it once the user decides to delete their profile. The General Data Protection Regulation (GDPR, known as DSGVO in German-speaking countries such as Austria) strictly requires services operating in EU states to remove any personally identifiable data when the account is deleted or the user requests it. While there is no federal equivalent in the US yet, California has become the first state to put in place a statute and regulations, the California Consumer Privacy Act (CCPA), that mirror the GDPR in several respects.

 

Motivation for a reusable technical solution
After having to implement GDPR-conforming user data removal yet again for our latest automation service at viesure, I decided it was finally time to conquer the problem once and for all: I created a tiny reusable service that automatically cleans all relevant data, without the need to hand-write data manipulation code for every affected domain object.

 

Concept
The chosen approach is simple enough (as it should be):

  1. sensitive data fields are marked with an annotation (let’s call it @SensitiveData for simplicity’s sake)
  2. when it’s time to clean user data, you just pass in the related object to a utility class (let’s call it SensitiveDataRemover) and let it perform its magic
  3. the SensitiveDataRemover uses an object graph traversal to recursively scan your class for fields marked with the annotation from point 1
  4. for any field with this annotation, it replaces the value with null or a suitable expression for an empty value (an empty String, 0, or a custom default value). Null is the convenient and efficient choice for nullable fields (think database entities); the alternatives keep cleaned fields valid where constraints are imposed (e.g. valid IBAN, social security number, password, non-empty name, …)
  5. once finished, the SensitiveDataRemover returns the object passed in, with any sensitive data replaced according to the aforementioned rules.

If you have a hierarchical domain model, you can just pass in the root node of your user data object and have it cleaned in a GDPR-conformant way with a simple call. Lastly, the utility class works on any POJO, but in the common case you might pass in an entity retrieved from a query for a user id or similar.

 

Notes on implementation
You can find the source here, so let me just mention the more interesting parts in the following:

 

The @SensitiveData annotation
To start, we need to mark our sensitive data as such, so we can later detect the affected fields and empty them accordingly. This is done via the @SensitiveData annotation. A simple marker annotation would be boring, and all but the simplest use cases (the kind encountered only in blog posts) need more functionality, so we add some finesse: a default value to be used as the replacement.
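A minimal sketch of what such an annotation could look like, assuming a single String-valued defaultValue element (which matches the usage in the example at the end of this post) and an empty string as the "no default given" marker. The AnnotationDemo class and its defaultValueOf helper are hypothetical and only show how the value can be read back via reflection:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Must be retained at runtime, otherwise reflection cannot see it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SensitiveData {
    // Assumption: an empty string means "no default given".
    String defaultValue() default "";
}

// Hypothetical demo class, not part of the original source.
class AnnotationDemo {
    static class Account {
        String id;
        @SensitiveData
        String name;
        @SensitiveData(defaultValue = "4711")
        Integer partnerNumber;
    }

    // Returns the configured default, or null if the field is not marked as sensitive.
    static String defaultValueOf(Class<?> type, String fieldName) {
        try {
            Field field = type.getDeclaredField(fieldName);
            SensitiveData marker = field.getAnnotation(SensitiveData.class);
            return marker == null ? null : marker.defaultValue();
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("Unknown field: " + fieldName, e);
        }
    }
}
```

Keeping the default as a String keeps the annotation usable on any field type; the conversion to the field's actual type then has to happen in the cleaning code.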

 

As mentioned above, the possibility to specify a default value has the added benefit of satisfying any constraints you may have on your domain or Entity objects, like a password field or an IBAN.

 

Recursive traversal applying a transformer function
The core of our solution is obviously the algorithm that actually walks our object tree and replaces any fields marked as @SensitiveData with empty, default or null values. Since I couldn’t find a suitable solution out of the box, I implemented a custom ObjectGraphTransformer. It takes a root object and a transformer function, then walks down all nested fields, checking for our annotation and applying the provided function to matching fields of each node encountered.

public ObjectGraphTransformer(T root, BiFunction<Object, Field, Object> nodeVisitor)

private void transform(Object node) {
    // null check cut for brevity
    transform(node, FieldUtils.getAllFieldsList(node.getClass()));
}

private void transform(Object node, List<Field> fields) {
    // remember visited field
    // for each field:
    //     check if it belongs to our package
    //     if yes: identify field type and check for collections, iterating further:
    //             Object actualNode = FieldUtils.readField(actualField, node, true);
    //             transform(actualNode);
    //     if no: call transformer function (we have encountered a "leaf node", a
    //            primitive or standard type like "String")
    //            nodeVisitor.apply(node, field);
}

The object graph transformer detects collections and uses the standard iterator to traverse them (support for maps is currently missing but simple enough to add). The class uses FieldUtils from Apache Commons Lang for a little more convenience when scanning fields and when reading and writing their values.
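The traversal described above can be sketched with plain JDK reflection. This is a simplified stand-in, not the original ObjectGraphTransformer: the names GraphWalkSketch, Shop and leafNames are hypothetical, the visitor is a BiConsumer rather than the BiFunction from the signature above, and the enclosing class name "Shop" serves as the base path marker because the whole sketch lives in a single file.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;

class GraphWalkSketch {
    private final String modelMarker;                    // substring identifying "our" types
    private final BiConsumer<Object, Field> leafVisitor; // called for every non-model field
    // Identity-based set, so equals()/hashCode() of domain objects are never invoked.
    private final Set<Object> visited = Collections.newSetFromMap(new IdentityHashMap<>());

    GraphWalkSketch(String modelMarker, BiConsumer<Object, Field> leafVisitor) {
        this.modelMarker = modelMarker;
        this.leafVisitor = leafVisitor;
    }

    void walk(Object node) {
        if (node == null || !visited.add(node)) {
            return; // nothing to do, or already seen (circular reference)
        }
        if (node instanceof Iterable<?>) {
            for (Object element : (Iterable<?>) node) walk(element);
            return;
        }
        if (node instanceof Object[]) {
            for (Object element : (Object[]) node) walk(element);
            return;
        }
        if (!node.getClass().getName().contains(modelMarker)) {
            return; // never reflect into JDK or library classes (maps are not supported here)
        }
        for (Class<?> type = node.getClass(); type != null && type != Object.class; type = type.getSuperclass()) {
            for (Field field : type.getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) continue;
                field.setAccessible(true);
                if (field.getGenericType().getTypeName().contains(modelMarker)) {
                    walk(read(node, field));         // nested model object, or collection of them
                } else {
                    leafVisitor.accept(node, field); // leaf: String, Integer, LocalDate, ...
                }
            }
        }
    }

    private static Object read(Object owner, Field field) {
        try {
            return field.get(owner);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException("Cannot read " + field, e);
        }
    }
}

// Hypothetical model used only for this demonstration.
class Shop {
    static class Order {
        String id = "o-1";
        Customer customer;
        List<Item> items = new ArrayList<>();
    }
    static class Customer {
        String name = "Jane";
        Order lastOrder; // circular reference back to the order
    }
    static class Item {
        String sku = "sku-1";
    }

    // Walks a small graph and collects the names of all leaf fields visited.
    static List<String> leafNames() {
        Order order = new Order();
        order.customer = new Customer();
        order.customer.lastOrder = order;
        order.items.add(new Item());
        List<String> names = new ArrayList<>();
        new GraphWalkSketch("Shop", (node, field) -> names.add(field.getName())).walk(order);
        return names;
    }
}
```

The identity-based visited set is what stops the walk at the Customer.lastOrder back-reference; a regular HashSet would call equals and hashCode on domain objects, which can itself recurse on generated implementations.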

 

One important catch here is that we need to distinguish between fields and nested classes in our own model and external classes, e.g. from the JDK itself or from external libraries. Scanning the latter makes no sense and might even lead to infinite recursion. Since there is no easy way to detect this automatically, we need to specify a base path (package prefix) for our own domain model, a simple solution that works just fine. Note that we need to check via contains, not startsWith: when we hit a field that is a collection, there is no easy way (without parsing the signature) to get the package name of the generic element type, because the type name looks like ‘java.util.List<my.company.app.package…>’.
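To illustrate the contains check, here is a small sketch built on Field.getGenericType().getTypeName() from the JDK. The class BasePathCheckSketch and its nested model are hypothetical; in a real project BASE_PATH would be a package prefix, while here the enclosing class name stands in for it so the example fits in one file.

```java
import java.lang.reflect.Field;
import java.util.List;

class BasePathCheckSketch {
    // Stand-in for a package prefix such as the one configured for the domain model.
    static final String BASE_PATH = "BasePathCheckSketch";

    static class Result {
        String status;
    }

    static class Submission {
        String id;
        Result lastResult;
        List<Result> results;
    }

    // Full generic type name of a Submission field, including type arguments.
    static String typeNameOf(String fieldName) {
        try {
            Field field = Submission.class.getDeclaredField(fieldName);
            return field.getGenericType().getTypeName();
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("Unknown field: " + fieldName, e);
        }
    }

    // contains also matches model types that only appear as a generic type argument.
    static boolean belongsToModel(String fieldName) {
        return typeNameOf(fieldName).contains(BASE_PATH);
    }
}
```

For the results field the type name begins with java.util.List<, so a startsWith check against the base path would classify it as external and its elements would never be visited.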

 

Final step: cleaning our data
The actual replacement of sensitive data is done from a service class that calls the ObjectGraphTransformer, let’s call it SensitiveDataCleaningService.
We just need to call our transformer with a function to detect the field type and a little code to clean or replace data based on the data type and other annotations present, like @Nullable or @NotEmpty, so we don’t break our data layer when cleaning it. In our service method, we then have something like the following:

new ObjectGraphTransformer<>(root, SensitiveDataCleaningService::cleanField).transform();

 

Where root is a root object in our domain, e.g. a concrete User or a Submission.
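A cleaning function of this shape could look roughly as follows. This is a sketch under several assumptions, not the code from the repository: the @Required annotation is a hypothetical JDK-only stand-in for constraint annotations such as @NotNull, only a handful of wrapper types are handled (primitives are not), and the conversion rules are illustrative.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.time.LocalDate;

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
@interface SensitiveData { String defaultValue() default ""; }

// Hypothetical stand-in for a constraint annotation such as @NotNull.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
@interface Required { }

class CleanFieldSketch {
    // Fits BiFunction<Object, Field, Object>: returns the value now stored in the field.
    static Object cleanField(Object node, Field field) {
        try {
            field.setAccessible(true);
            SensitiveData marker = field.getAnnotation(SensitiveData.class);
            if (marker == null) {
                return field.get(node); // not sensitive: leave untouched
            }
            Object replacement = replacementFor(field, marker.defaultValue());
            field.set(node, replacement);
            return replacement;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException("Cannot clean " + field, e);
        }
    }

    private static Object replacementFor(Field field, String defaultValue) {
        Class<?> type = field.getType();
        if (!defaultValue.isEmpty()) { // an explicit default wins
            if (type == String.class) return defaultValue;
            if (type == Integer.class) return Integer.valueOf(defaultValue);
            if (type == Boolean.class) return Boolean.valueOf(defaultValue);
            if (type == LocalDate.class) return LocalDate.parse(defaultValue);
            throw new IllegalArgumentException("No default conversion for " + type);
        }
        if (field.isAnnotationPresent(Required.class)) { // must not become null
            if (type == String.class) return "";
            if (type == Integer.class) return 0;
            if (type == Boolean.class) return Boolean.FALSE;
            throw new IllegalArgumentException("No empty value for required " + type);
        }
        return null; // nullable field: simply drop the value
    }

    // Hypothetical sample entity with initial values that should disappear.
    static class Submission {
        String id = "s-1";
        @SensitiveData(defaultValue = "4711") Integer partnerNumber = 123456;
        @Required @SensitiveData String policyNumber = "P-987";
        @SensitiveData Boolean accepted = true;
        @Required @SensitiveData(defaultValue = "2020-12-24") LocalDate received = LocalDate.of(2021, 3, 1);
    }

    // Applies cleanField to every field of a fresh sample, without any graph traversal.
    static Submission cleanedSample() {
        Submission submission = new Submission();
        for (Field field : Submission.class.getDeclaredFields()) {
            if (!field.isSynthetic()) cleanField(submission, field);
        }
        return submission;
    }
}
```

Failing loudly on unsupported types is a deliberate choice in this sketch: silently skipping a field that was marked as sensitive would leave personal data behind.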

 

Example
Here is an example from the unit test, which illustrates what your domain class could look like (using Lombok for simplicity and compactness):

    @Data
    @AllArgsConstructor
    class SubmissionEntity {
        String id;
        @SensitiveData(defaultValue = "4711")
        Integer partnerNumber;
        @NotNull
        @SensitiveData
        String policyNumber;
        @SensitiveData
        Boolean accepted;
        @NotNull
        @SensitiveData(defaultValue = "2020-12-24")
        LocalDate received;

        List<SubmissionResultEntity> results;
    }

    @Data
    @AllArgsConstructor
    class SubmissionResultEntity {
        SubmissionEntity submission;    // test for safeguard against circular references
        String id;
        String status;
        @SensitiveData
        @NotNull
        String[] diagnoses;
        @SensitiveData
        @NotBlank
        String iban;
    }

Notice how we can support various types, as well as collections and arrays, and how circular references are avoided by a lookup map used internally by the ObjectGraphTransformer.