GeoPriv is a plugin for QGIS 3+ that offers location privacy protection mechanisms so users can protect the privacy of their datasets before publishing them or putting them on maps.
In GeoPriv you can choose between three location privacy methods drawn from scientific articles, which
we have implemented so that both technical and non-technical users can apply them easily, providing location privacy for everyone!
The implemented location privacy methods work only with Vector Layers that have Point geometries, and provide privacy
through obfuscation and k-anonymity.
This plugin is the final project of our Computer Science undergraduate degree and is intended to give the Open Source and GIS communities a reliable tool for geoprivacy.
Throughout this brief tutorial we will use this sample dataset, which consists of 18967 points marking the locations of Tweets
from Milano, Italy. Feel free to download it.
This plugin integrates three Location Privacy Protection Mechanisms (LPPMs) taken from academic papers and research, giving technical and non-technical users a suite of geoprivacy methods to protect their datasets before publishing or exporting them. It works with vector layers that have Point geometries (Multipoint geometries are not supported yet) and whose projection keeps coordinates as plain latitude and longitude.
There are several categories of LPPMs, but for this kind of data we chose to work with obfuscation and K-Anonymity methods, which
are well suited to Point geometries and to offline data.
Obfuscation methods hide the original location by replacing the longitude and latitude of the point with new values, while K-Anonymity
hides user information and location by grouping similar or nearby points so that they are indistinguishable from one another.
This plugin is located in the Vector menu and can be found in the QGIS Python Plugins Repository.
Now create a new layer with the test dataset provided and let's get started!
The plugin consists of a layer picker (1), three LPPMs (2, 3, 4), a Data Preview tab (5) and a Results tab (6).
The layer picker lets you select one of the valid layers, that is, a Vector Layer with Point geometry. Every other layer is filtered out
of the selector. It is initially empty, so you have to select the layer yourself.
Next section will tell you everything you need to know about the
The Data Preview shows the first 100 features (or fewer) of the selected layer, together with its fields, so you can explore the data. When a new layer is selected, the Data Preview tab is updated automatically. If your data has a field holding the same values as the latitude or longitude, it is not shown, to avoid revealing the original location of the points. Those location fields are also removed when the data is processed.
Each LPPM has its own tab, a set of parameters and a Process Data button at the bottom of its tab. The next section analyzes each mechanism and explains each of its parameters.
After you click the Process Data button in any of the LPPMs, the Results tab opens automatically and a temporary layer is created with the name of the mechanism and the projection of the original layer. The Results tab shows the progress of the mechanism (large datasets can take a long time to process) and the metrics of the newly generated layer.
Two metrics are shown on the Results tab: the Quadratic Error (or mean squared error) and the point loss. The Quadratic Error indicates how similar the new layer is to the original one. It is obtained by measuring the distance between each new point and its original point and taking the mean of the squared distances (see Mean Squared Error). As the error increases, protection increases, but the utility of the data decreases.
Point loss is the number of points that were erased during processing. Some mechanisms delete points they treat as noise, so if you need every point you will want this value to be low.
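To make the two metrics concrete, here is a minimal sketch of how they could be computed. It is not the plugin's own code: the function names are made up for illustration, and it assumes the original and processed points are given as (x, y) tuples paired in the same order.

```python
def mean_squared_error(original, processed):
    """Mean of the squared distances between paired points.

    Assumption: original[i] and processed[i] refer to the same feature.
    """
    if len(original) != len(processed):
        raise ValueError("point lists must have the same length")
    if not original:
        raise ValueError("at least one point is required")
    total = 0.0
    for (x0, y0), (x1, y1) in zip(original, processed):
        # Squared Euclidean distance between the original and the new point.
        total += (x1 - x0) ** 2 + (y1 - y0) ** 2
    return total / len(original)


def point_loss(original_count, processed_count):
    """Number of points that were erased during processing."""
    return original_count - processed_count
```

For example, moving one of two points by a distance of 5 and leaving the other untouched gives an error of (25 + 0) / 2 = 12.5.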
Feel free to process your data as many times as you like, testing different parameter values until you find the result that suits you best.
There are three LPPMs implemented in this plugin:
This is a K-Anonymity mechanism that provides privacy by aggregating points by their distance.
It is based on the VoKA algorithm, with modifications to the clustering algorithms. You can find the paper here.
This mechanism starts with a pre-processing step in which it gridifies the data, that is, it aggregates points on a virtual grid whose precision is defined by the user.
The precision for the gridification is the number of decimals that will be truncated from each coordinate to aggregate points by matching truncated coordinates.
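The gridification step can be sketched as follows. This is only an illustration, not the plugin's implementation: the function names are hypothetical, and we assume the precision is the number of decimal places kept after truncation.

```python
import math
from collections import defaultdict


def cell_index(value, decimals):
    """Truncate a coordinate to `decimals` places, returned as an integer index.

    Integer indices avoid comparing floating point keys.
    """
    return math.trunc(value * 10 ** decimals)


def gridify(points, decimals):
    """Group (lon, lat) points whose truncated coordinates match."""
    cells = defaultdict(list)
    for lon, lat in points:
        key = (cell_index(lon, decimals), cell_index(lat, decimals))
        cells[key].append((lon, lat))
    return dict(cells)
```

With a precision of 2, the points (9.1901, 45.4642) and (9.1999, 45.4699) fall into the same cell, while (9.2001, 45.4642) lands in a neighbouring one.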
After that, you can choose between K-Means and DBSCAN as the spatial clustering algorithm for the geolocated data.
PRO TIP:
Both K-Means and DBSCAN are very parameter-dependent. The default values are set for small cities, but you will have to
tune the parameters very carefully to truly unleash the power of these two clustering algorithms.
DBSCAN is really useful for clusters with non-uniform shapes, while K-Means works better with evenly distributed data and uniform clusters.
This mechanism receives 3 global parameters:
Additionally, each algorithm accepts its own set of parameters:
- KMeans:
For an explanation of the method see KMeans.
KMeans needs two parameters: the number of clusters K and a random seed for reproducibility,
which is zero (0) by default.
- DBSCAN:
For an explanation of the method see DBSCAN.
DBSCAN needs two parameters: the aggregation radius R and the minimum number of points that must be grouped to form a cluster. This number
can be different from the global minimum K.
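Assuming the radius is given in decimal degrees, a value of 0.00001 corresponds to roughly 1 m, 0.0001 to roughly 11 m and 0.001 to roughly 111 m (measured north-south; east-west distances shrink with latitude).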
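If you want to check what a given radius means on the ground, a rough conversion can be sketched like this. It assumes the radius is expressed in decimal degrees and uses the common approximation of about 111.32 km per degree of latitude. The function name is ours, not part of the plugin.

```python
import math

# Approximate length of one degree of latitude, in meters.
METERS_PER_DEGREE_LAT = 111_320.0


def degrees_to_meters(radius_deg, latitude_deg=0.0):
    """Return the (north-south, east-west) extent in meters of a radius in degrees."""
    north_south = radius_deg * METERS_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west
```

Around Milano (latitude close to 45.46), a radius of 0.001 therefore covers about 111 m north-south but only about 78 m east-west.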
NRandK is an obfuscation Location Privacy Protection Mechanism that provides privacy by modifying the original coordinates of each point.
This mechanism applies an algorithm named NRand to each point of the dataset: it generates N random points (four (4) by default)
inside a circle of radius R and chooses the one farthest from the original point.
The radius of the circle for each point depends on how many points are close to it after an aggregation process.
If the number of nearby points is greater than K, the point uses a small radius; otherwise it uses a larger radius.
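The idea behind NRand can be sketched in a few lines. This is a simplified illustration rather than the plugin's code: the function and parameter names are hypothetical, and we assume the candidates are drawn uniformly inside the disk of radius R (if they were all exactly on the circumference, they would be equally far from the original point).

```python
import math
import random


def random_point_in_disk(center, radius, rng):
    """Draw a point uniformly inside the disk around `center`."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    # The square root keeps the density uniform over the area of the disk.
    distance = radius * math.sqrt(rng.random())
    return (center[0] + distance * math.cos(angle),
            center[1] + distance * math.sin(angle))


def nrand(point, radius, n=4, rng=None):
    """Generate n candidates and keep the one farthest from `point`."""
    if rng is None:
        rng = random.Random()
    candidates = [random_point_in_disk(point, radius, rng) for _ in range(n)]
    return max(candidates,
               key=lambda c: math.hypot(c[0] - point[0], c[1] - point[1]))


def choose_radius(neighbour_count, k, small_radius, large_radius):
    """Dense areas (more than k neighbours) get the small radius."""
    return small_radius if neighbour_count > k else large_radius
```

Choosing the farthest of N candidates pushes the new point towards the edge of the disk, so a larger N means the point tends to move farther from its original location.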
This mechanism receives 6 parameters:
You can find the scientific paper here.
This is a differential privacy mechanism that provides privacy by replacing the coordinates of each feature with a new random point drawn from a
Laplacian distribution.
For an explanation of differential privacy see Differential Privacy.
It is also based on the principle of geo-indistinguishability, explained in this paper.
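As an illustration of this kind of noise, here is a minimal sketch of the planar (two-dimensional) Laplace mechanism usually associated with geo-indistinguishability. We are assuming this variant for the example; the plugin's actual implementation may differ. The names `epsilon`, `planar_laplace_noise` and `perturb` are hypothetical, and `epsilon` is expressed per coordinate unit.

```python
import math
import random


def planar_laplace_noise(epsilon, rng):
    """Draw a (dx, dy) offset from a planar Laplace distribution.

    The direction is uniform and the distance follows a Gamma
    distribution with shape 2 and scale 1/epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    angle = rng.uniform(0.0, 2.0 * math.pi)
    distance = rng.gammavariate(2.0, 1.0 / epsilon)
    return distance * math.cos(angle), distance * math.sin(angle)


def perturb(point, epsilon, rng=None):
    """Return a new point obtained by adding planar Laplace noise."""
    if rng is None:
        rng = random.Random()
    dx, dy = planar_laplace_noise(epsilon, rng)
    return point[0] + dx, point[1] + dy
```

A smaller `epsilon` gives stronger privacy and larger displacements: the average distance moved is 2 / epsilon.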
This mechanism receives 2 parameters:
You can contribute to GeoPriv by adding new Location Privacy Protection Mechanisms and reporting bugs and issues.
Check the project repository and ask our team anything you want about
the source code.
To add your own methods to this plugin, you have to follow some guidelines and use the data structures that we designed to ensure uniformity across every implemented mechanism.
The DataModel is how we pass information between the different parts of the plugin, keeping everything uniform and modular. This means that a new mechanism only has to receive the data as a DataModel and return a new DataModel with the processed data.
DataModel.py is located in the utils folder inside the GeoPriv base folder. The constructor receives the following parameters:
The DataModel contains the following fields:
DataModel contains the following methods:
Ideally, your Location Privacy Mechanism should receive the layer data held by the plugin as a DataModel, process it, calculate the error and the point loss, and return the data as a DataModel. Because every method is different, each Location Privacy Mechanism is responsible for its own error calculation. When you are done, you can connect the labels in the user interface to show the error of the new layer. You can also calculate the point loss and print it in the Results tab. Use only the existing Results tab and keep it as the last tab in the tab widget.
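The contract described above can be sketched with a toy mechanism. Note that `StandInModel` is a hypothetical stand-in that we made up for this example; it is NOT the plugin's real DataModel, whose constructor, fields and methods are the ones defined in DataModel.py. The toy mechanism itself (rounding coordinates and dropping duplicates) is also only for illustration.

```python
from dataclasses import dataclass


@dataclass
class StandInModel:
    """Hypothetical stand-in for illustration; not the real DataModel."""
    points: list          # list of (x, y) tuples
    error: float = 0.0    # mean squared error of the processed points
    point_loss: int = 0   # number of points erased during processing


def toy_mechanism(model, decimals=2):
    """Receive a model, process it and return a new model with metrics."""
    seen = set()
    kept = []
    squared = 0.0
    for x, y in model.points:
        new_point = (round(x, decimals), round(y, decimals))
        if new_point in seen:
            continue  # duplicate after rounding: this point is lost
        seen.add(new_point)
        kept.append(new_point)
        squared += (new_point[0] - x) ** 2 + (new_point[1] - y) ** 2
    error = squared / len(kept) if kept else 0.0
    return StandInModel(points=kept, error=error,
                        point_loss=len(model.points) - len(kept))
```

The important part is the shape of the function: one model in, one new model out, with the error and the point loss computed inside the mechanism.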
To add a new mechanism, add a new tab to the pluginTabs tab widget in the Qt Designer UI, add every field you need as a parameter and put a Process Data button at the bottom.
After that, create a new method named after your mechanism in the geoprivacy.py file, located in the base folder of the plugin, and connect all of the form fields and buttons.
Finally, execute your method inside a try/except block, printing error messages if the processing fails and showing the desired metrics when it succeeds.
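The last step could look roughly like the sketch below. The names `run_safely`, `mechanism` and `report` are hypothetical placeholders: in the plugin you would call your own method and write the messages to the Results tab instead of using `print`.

```python
def run_safely(mechanism, data, report=print):
    """Run a mechanism, reporting failures instead of crashing the plugin."""
    try:
        result = mechanism(data)
    except Exception as exc:
        # Show the error message and leave the original data untouched.
        report(f"Processing failed: {exc}")
        return None
    report("Processing finished")
    return result
```

Catching the exception here keeps a failing mechanism from taking down the rest of the plugin, while still telling the user what went wrong.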