[nexa] Recovering area shapes from anonymous contact tracing data

May 3, 2020

A very interesting thought experiment, available at
https://github.com/miculan/shaping-anonymous-contact-tracing-data

The current COVID-19 pandemic has led many governments to call for
technological solutions for tracing contacts between people. Dozens of
applications have been introduced so far, basically boiling down to
five main frameworks: PEPP-PT, Google/Apple Privacy Preserving Tracing
Project (GA-PPTP), DP-3T, Blue Trace and TCN; a detailed comparison
between these approaches is outside the scope of this project (see
e.g. here and here).

A common problem with all these approaches is the privacy of users: we
would like to trace who has been in contact with an infected patient,
without revealing their real positions and movements. This is even
more important in centralized solutions (such as PEPP-PT), where
tracing data is stored "in the cloud" or on servers run either by the
government or by some private institution. To this end, several
anonymization techniques are used; for instance, the "tags" that the
apps exchange via Bluetooth, and eventually upload to the servers, are
random-looking strings such as 387e07342c243b50a05da363f67e17ea25fe03bc,
generated on a daily basis (or more often) using hash functions. Even
if these tags are calculated from some private/sensitive data (e.g.
phone number, IMEI, Bluetooth MAC, GPS position), it is not possible
to recover such data from the hash digest. In most protocols, no other
information is exchanged between apps or uploaded to servers.
Thus the user identity cannot be associated with tags. (In centralized
solutions, however, the central service may geolocate a device when
the tracing app connects to the server, either to upload its own tags
or to collect the tags that newly diagnosed patients have uploaded.)
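For illustration only, a daily tag of the kind described above can be sketched as a SHA-1 digest of a per-device secret combined with the current date. SHA-1 produces 40 hex characters, the same length as the example tag. The function name `daily_tag` and the secret-plus-date input are assumptions made for this sketch. The frameworks listed above each specify their own, more elaborate key derivation.

```python
import hashlib
import secrets
from datetime import date


def daily_tag(device_secret: bytes, day: date) -> str:
    """Derive a pseudo-random tag for one device on one day.

    Hypothetical scheme: SHA-1 over a per-device secret plus the ISO date.
    Real tracing protocols define their own key schedules.
    """
    h = hashlib.sha1()
    h.update(device_secret)
    h.update(day.isoformat().encode("ascii"))
    return h.hexdigest()  # 40 lowercase hex characters


# The secret stays on the phone; only the digest is broadcast over Bluetooth.
secret = secrets.token_bytes(32)
print(daily_tag(secret, date(2020, 5, 3)))
print(daily_tag(secret, date(2020, 5, 4)))  # a new, unlinkable-looking tag
```

The same secret yields the same tag for the whole day and a different tag the next day. An observer who sees only the digests cannot tell that two tags come from the same phone.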

Now, a natural question arises: can some sensitive geographic
information leak, even when strong anonymization techniques are
adopted? In this exercise we will see that the answer is yes. More
precisely, we will see that from a (dense enough) set of anonymous
contact tracing data we can reconstruct (quite precisely) the geometric
shape of the area the data come from. (Note that such contact tracing
data can be easily collected in centralized contact tracing solutions,
while it would be much more difficult with truly distributed ones.)
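To make the setting concrete, the input can be simulated. Devices are scattered over an area, and every pair of devices closer than a proximity radius produces an anonymous contact. The L-shaped area, the 400 devices and the radius of 1.0 are arbitrary, hypothetical choices. Only the resulting edge list, not the coordinates, is what a contact tracing server would see.

```python
import math
import random


def in_l_shape(x, y):
    """Hypothetical area: a 10x10 square with its top-right 5x5 quadrant removed."""
    return 0 <= x <= 10 and 0 <= y <= 10 and not (x > 5 and y > 5)


def sample_points(n, rng):
    """Rejection-sample n device positions uniformly inside the area."""
    pts = []
    while len(pts) < n:
        x, y = rng.uniform(0, 10), rng.uniform(0, 10)
        if in_l_shape(x, y):
            pts.append((x, y))
    return pts


def contact_edges(pts, radius):
    """Pairs (i, j), i < j, of devices within `radius`: the anonymous contacts."""
    edges = []
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            if math.dist(pts[i], pts[j]) <= radius:
                edges.append((i, j))
    return edges


rng = random.Random(42)
points = sample_points(400, rng)            # ground truth, never uploaded
edges = contact_edges(points, radius=1.0)   # all the server gets
print(len(points), "devices,", len(edges), "contacts")
```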

[... various examples ...]

# Conclusions

In this exercise we have seen that an anonymized proximity graph can
contain enough information to reproduce the geometry of the area where
the data came from. No geometric / GPS / geolocation / distance
information is needed: the contact information will suffice.
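The excerpt does not show the reconstruction method itself. One standard approach works in two steps. First, approximate each pairwise distance by the shortest-path hop count in the proximity graph. Then apply classical (Torgerson) multidimensional scaling, which is the idea behind MDS-MAP in sensor-network localization. This is not necessarily the method used in the repository.

The sketch assumes a connected graph with devices numbered 0..n-1. The layout it returns is defined only up to rotation, reflection and translation. It is measured in hops, so it must be scaled by roughly the proximity radius. Hop counts also somewhat overestimate straight-line distances.

```python
from collections import deque

import numpy as np


def hop_distances(n, edges):
    """All-pairs shortest-path lengths, in hops, via BFS from every node."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = np.full((n, n), np.inf)
    for s in range(n):
        dist[s, s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[s, v] == np.inf:
                    dist[s, v] = dist[s, u] + 1
                    queue.append(v)
    return dist


def classical_mds(dist, dim=2):
    """Coordinates whose Euclidean distances approximate `dist`."""
    n = dist.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    b = -0.5 * j @ (dist ** 2) @ j               # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(b)               # symmetric eigendecomposition
    top = np.argsort(vals)[::-1][:dim]           # indices of the largest `dim`
    scale = np.sqrt(np.clip(vals[top], 0, None)) # drop tiny negative eigenvalues
    return vecs[:, top] * scale


def reconstruct(n, edges, dim=2):
    """Layout (up to rotation/reflection/translation) from contacts only."""
    d = hop_distances(n, edges)
    if np.isinf(d).any():
        raise ValueError("proximity graph is disconnected")
    return classical_mds(d, dim)


# Tiny example: a 6x3 grid of devices, each in contact with its 4 neighbours.
w, h = 6, 3
edges = [(y * w + x, y * w + x + 1) for y in range(h) for x in range(w - 1)]
edges += [(y * w + x, (y + 1) * w + x) for y in range(h - 1) for x in range(w)]
coords = reconstruct(w * h, edges)
print(np.round(coords, 2))
```

Applied to the edge list of a simulated area, the same `reconstruct` call produces a point cloud whose outline can be compared visually with the original shape.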

Once a map is obtained in this way, it can be laid over the actual map
of the area of origin (e.g. using Google Maps). Thus, the (anonymous)
points can be given an actual position. The accuracy of the result depends
on the number of points (devices) and on the proximity distance.
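The overlay step can be sketched under one assumption: a few "pinned points" have known map positions. Fit the least-squares similarity transform (scale, rotation or reflection, translation) that maps their reconstructed coordinates onto the known ones, then apply it to every point. This is the orthogonal Procrustes solution with scaling. Reflections are allowed because an MDS layout may come out mirrored. Coordinates are assumed to be in a local planar frame such as metres; latitude and longitude would first need to be projected. All data below are hypothetical.

```python
import numpy as np


def fit_similarity(src, dst):
    """Least-squares s, orthogonal r, t such that dst ~= s * src @ r + t.

    Reflections are deliberately allowed (no determinant correction).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, yd = src - mu_s, dst - mu_d
    u, sing, vt = np.linalg.svd(xs.T @ yd)
    r = u @ vt                          # best orthogonal map
    s = sing.sum() / (xs ** 2).sum()    # best uniform scale
    t = mu_d - s * mu_s @ r             # align the centroids
    return s, r, t


def apply_similarity(points, s, r, t):
    return s * np.asarray(points, dtype=float) @ r + t


# Hypothetical true positions (metres) and a rotated, mirrored, shrunken
# copy standing in for an MDS output.
true_xy = np.array([[0, 0], [100, 0], [100, 50], [0, 50], [40, 20]], float)
theta = np.deg2rad(30)
rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
mirror = np.array([[1, 0], [0, -1]], float)
layout = 0.01 * true_xy @ rot @ mirror + np.array([3.0, -2.0])

pinned = [0, 1, 2]   # devices whose map position is known
s, r, t = fit_similarity(layout[pinned], true_xy[pinned])
print(np.round(apply_similarity(layout, s, r, t), 3))
```

Three non-collinear pinned points fix the transform. Extra pinned points make the fit more robust to errors in the reconstruction.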

It should be observed that, in order to construct the proximity graph,
we need to collect the exchanged tags on a centralized server. This can
be achieved quite easily with centralized contact tracing solutions,
while it would be much more difficult with truly distributed ones.
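On the server side, building the proximity graph amounts to merging the uploaded contact logs. This sketch assumes the server can resolve each tag to a pseudonymous device, modelled by a hypothetical `tag_owner` table. The upload format `(device_id, tags_heard)` and the shortened tags are likewise invented for illustration.

```python
from collections import defaultdict


def build_proximity_graph(uploads, tag_owner):
    """Merge uploaded contact logs into an undirected proximity graph.

    uploads   -- iterable of (device_id, tags_heard) pairs sent to the server
    tag_owner -- hypothetical server-side table mapping tag -> device_id
    Unknown tags and self-contacts are skipped.
    """
    graph = defaultdict(set)
    for device, heard in uploads:
        for tag in heard:
            other = tag_owner.get(tag)
            if other is None or other == device:
                continue
            graph[device].add(other)
            graph[other].add(device)   # a contact is symmetric
    return dict(graph)


def edge_list(graph):
    """Each undirected contact exactly once, as a sorted (a, b) pair."""
    return sorted({tuple(sorted((a, b))) for a, nbrs in graph.items() for b in nbrs})


# Hypothetical example: three pseudonymous devices and their (shortened) tags.
tag_owner = {"387e0734": "A", "9c1f22ab": "B", "d04e5f10": "C"}
uploads = [
    ("A", ["9c1f22ab"]),               # A heard B
    ("B", ["387e0734", "d04e5f10"]),   # B heard A and C
    ("C", ["ffff0000"]),               # unknown tag: dropped
]
graph = build_proximity_graph(uploads, tag_owner)
print(edge_list(graph))                # [('A', 'B'), ('B', 'C')]
```

In a truly distributed design, no single party holds both `uploads` and a table like `tag_owner`, and that is what makes this step hard there.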

The analysis we have carried out in this small example is still
primitive, and based on very limited techniques and tools. After all,
I'm no big data expert :) More sophisticated techniques could be used
in order to achieve a similar result, even in the presence of limited
data. Further improvements can be obtained by taking advantage of
extra information, not included in the proximity graph, such as:
"pinned points" (i.e., points whose coordinates are known), population
distribution on the area, etc.


Giacomo Tesio