A very interesting thought experiment at https://github.com/miculan/shaping-anonymous-contact-tracing-data

The current COVID-19 pandemic has led many governments to call for technological solutions for tracing contacts between people. Dozens of applications have been introduced so far, basically boiling down to five main frameworks: PEPP-PT, Google/Apple Privacy Preserving Tracing Project (GA-PPTP), DP-3T, BlueTrace and TCN; a detailed comparison between these approaches is outside the scope of this project (see e.g. here and here).

A common problem with all these approaches is the privacy of users: we would like to trace who has been in contact with an infected patient without revealing their real positions and movements. This is even more important in centralized solutions (such as PEPP-PT), where tracing data is stored "in the cloud" or on servers run either by the government or by some private institution.

To this end, several anonymization techniques are used; for instance, the "tags" that the apps exchange via Bluetooth, and eventually upload to the servers, are random-looking strings such as 387e07342c243b50a05da363f67e17ea25fe03bc, generated daily (or more often) using hash functions. Even if these tags are computed from private or sensitive data (e.g. phone number, IMEI, Bluetooth MAC, GPS position), such data cannot feasibly be recovered from the hash digest. In most protocols, no other information is exchanged between apps or uploaded to servers, so the user's identity cannot be associated with the tags. (In centralized solutions, however, the central service may geolocate a device when the tracing app connects to the server, either to upload its tags or to collect the tags that newly positive patients have uploaded.)

Now, a natural question arises: can some sensitive geographic information leak, even when strong anonymization techniques are adopted? In this exercise we will see that the answer is: yes.
More precisely, we will see that from a (dense enough) set of anonymous contact trace data we can reconstruct (quite precisely) the geometric shape of the area the data come from. (Notice that such contact trace data can be easily collected in a centralized contact tracing solution, while this would be much more difficult with truly distributed ones.)

[... various examples ...]

# Conclusions

In this exercise we have seen that an anonymized proximity graph can contain enough information to reproduce the geometry of the area where the data came from. No geometric, GPS, geolocation or distance information is needed: the contact information suffices.

Once a map is obtained in this way, it can be laid over the actual map of the area of origin (e.g. using Google Maps). Thus, the (anonymous) points can be given an actual position. The accuracy of the result depends on the number of points (devices) and on the proximity distance.

It should be observed that in order to construct the proximity graph we need to collect the exchanged tags on a centralized server. This can be achieved quite easily with a centralized contact tracing solution, while it would be much more difficult with truly distributed ones.

The analysis we have carried out in this small example is still primitive, and based on very limited techniques and tools. After all, I'm no big data expert :) More sophisticated techniques could be used to achieve a similar result, even in the presence of limited data. Further improvements can be obtained by taking advantage of extra information not included in the proximity graph, such as "pinned points" (i.e., points whose coordinates are known), the population distribution over the area, etc.
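
To make the idea concrete, here is a minimal sketch of how a shape can be recovered from contact information alone. It is not necessarily the method used in the repository: it assumes NumPy is available and swaps in a simple, well-known technique (hop counts in the proximity graph fed to classical multidimensional scaling). The function names, the 2 x 1 rectangular area, the 300 devices and the 0.2 proximity radius are all hypothetical choices made for illustration. The true positions are used only to simulate who met whom and, at the end, to check the result.

```python
from collections import deque

import numpy as np


def pairwise(points):
    """Matrix of Euclidean distances between all pairs of points."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def proximity_graph(points, radius):
    """Adjacency lists: two devices are linked when closer than `radius`."""
    adj = (pairwise(points) < radius) & ~np.eye(len(points), dtype=bool)
    return [np.flatnonzero(row) for row in adj]


def hop_distances(neighbors):
    """All-pairs hop counts via breadth-first search (inf if unreachable)."""
    n = len(neighbors)
    hops = np.full((n, n), np.inf)
    for src in range(n):
        hops[src, src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if np.isinf(hops[src, v]):
                    hops[src, v] = hops[src, u] + 1
                    queue.append(v)
    return hops


def classical_mds(dist, dim=2):
    """Embed a distance matrix in `dim` dimensions (classical MDS)."""
    n = len(dist)
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dim]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0))


# Simulated ground truth (assumption): devices scattered over a 2 x 1 area.
rng = np.random.default_rng(0)
true_pos = rng.uniform([0, 0], [2, 1], size=(300, 2))

# What a central server could learn: only who was near whom.
neighbors = proximity_graph(true_pos, radius=0.2)
hops = hop_distances(neighbors)

# Keep the largest connected component, so all hop counts are finite.
root = np.isfinite(hops).sum(axis=1).argmax()
keep = np.isfinite(hops[root])
coords = classical_mds(hops[np.ix_(keep, keep)])

# Compare reconstructed and true pairwise distances (up to scale/rotation).
iu = np.triu_indices(int(keep.sum()), k=1)
corr = np.corrcoef(pairwise(coords)[iu], pairwise(true_pos[keep])[iu])[0, 1]
print(f"kept {keep.sum()} devices, distance correlation {corr:.3f}")
```

The recovered coordinates are only defined up to rotation, reflection and scale, which is why the check compares pairwise distances rather than positions. Fixing that ambiguity is exactly what overlaying the result on a real map, or using a few "pinned points", would do.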