GeoPriv

Usage

Throughout this brief tutorial we will use this sample dataset, which consists of 18,967 Tweet locations from Milano, Italy. Feel free to download it.

Index:

Getting started
Functionalities
The Location Privacy Protection Mechanisms (LPPM)

Getting Started

This plugin integrates three Location Privacy Protection Mechanisms (LPPMs), drawn from academic papers and research, to give technical and non-technical users a suite of geoprivacy methods for protecting their datasets before publishing or exporting them. It works with vector layers that have Point geometries (Multipoint geometries are not supported yet) and whose coordinate reference system keeps coordinates as plain latitude and longitude values.

There are several categories of LPPMs, but for this kind of data we chose to work with obfuscation and K-Anonymity methods, which are well suited to Point geometries and offline data.
Obfuscation methods hide the original location by replacing the longitude and latitude of each point with new values, while K-Anonymity hides user information and location by grouping similar or nearby points so that they are indistinguishable from one another.

This plugin is located in the Vector menu and can be found in the QGIS Python Plugins Repository.
Now create a new layer with the test dataset provided and let's get started!

Functionalities

The plugin consists of a layer picker (1), three LPPMs (2, 3, 4), a Data Preview tab (5) and a Results tab (6).

The layer picker lets you select one of the valid layers, that is, a Vector Layer with Point geometry; every other layer is filtered out of the selector. It is initially empty, so you have to select the layer yourself.
Next section will tell you everything you need to know about the

The Data Preview shows up to the first 100 features of the selected layer, along with its fields, so you can explore the data. When a new layer is selected, the Data Preview tab is updated automatically. If your data has a field holding the same values as the latitude or longitude, it is not shown, to avoid revealing the original location of the points. When the data is processed, those fields with location data are also removed.

Each LPPM has its own tab, a set of parameters and a Process Data button at the bottom of its tab. The next section analyzes each mechanism and explains each of its parameters.

When you click the Process Data button in any of the LPPMs, the Results tab opens automatically and a temporary layer is created with the name of the mechanism and the projection of the original layer. The Results tab shows the progress of the mechanism (large datasets can take a long time to process) and metrics for the newly generated layer.

Two metrics are shown on the Results tab: the Quadratic Error (or mean squared error) and the point loss. The Quadratic Error indicates how similar the new layer is to the original one: it takes the distance between each new point and its original point and reports the mean of the squared distances (see Mean Squared Error). As the error increases, protection increases, but the utility of the data decreases.
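As a rough illustration of this metric, the sketch below computes the mean of the squared distances between paired points. The function name `mean_squared_error` and the pairing of points by list index are assumptions made for this example; the plugin's own code may match and measure points differently.

```python
def mean_squared_error(original, protected):
    """Mean of the squared Euclidean distances between paired points.

    Both arguments are equal-length lists of (x, y) tuples in the same
    order. Pairing by index is an assumption of this sketch.
    """
    if len(original) != len(protected) or not original:
        raise ValueError("need two non-empty lists of equal length")
    squared = [
        (ox - px) ** 2 + (oy - py) ** 2
        for (ox, oy), (px, py) in zip(original, protected)
    ]
    return sum(squared) / len(squared)


# Hypothetical coordinates: the first point is unchanged, the second moved.
original = [(9.19, 45.46), (9.20, 45.47)]
protected = [(9.19, 45.46), (9.23, 45.51)]
mse = mean_squared_error(original, protected)
```

A value of zero means the new layer is identical to the original; larger values mean the points were moved farther on average.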

Point loss is the number of points that were removed during processing. Some mechanisms discard points that they treat as noise, but if every point is necessary you will want to keep this value low.

Feel free to process your data as many times as you like, testing different parameter values until you find the combination that works best for you.

The Location Privacy Protection Mechanisms (LPPM)

There are three LPPMs implemented in this plugin:

Spatial Clustering
NRandK
Laplace Noise (or Geo indistinguishability)

Spatial Clustering

This is a K-Anonymity mechanism that provides privacy by aggregating points according to their distance.
It is based on the VoKA algorithm, with modifications to the clustering step. You can find the paper here. The mechanism starts with a pre-processing step that gridifies the data, that is, it aggregates points on a virtual grid whose precision is defined by the user.
The precision of the gridification is the number of decimal places each coordinate is truncated to; points whose truncated coordinates match are aggregated together.
After that, the user can choose between K-Means and DBSCAN as the spatial clustering algorithm for the geolocated data.
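The gridification step can be sketched as follows. This is only an illustration under stated assumptions: the helper names `cell_index` and `gridify` are hypothetical, and grid cells are keyed by integer indices (the coordinate multiplied by 10^decimals and truncated) to avoid comparing floating-point values.

```python
import math
from collections import defaultdict


def cell_index(value, decimals):
    """Truncate (not round) a coordinate and return it as an integer index."""
    return math.trunc(value * 10 ** decimals)


def gridify(points, decimals):
    """Group (lon, lat) points that share the same truncated coordinates."""
    cells = defaultdict(list)
    for lon, lat in points:
        key = (cell_index(lon, decimals), cell_index(lat, decimals))
        cells[key].append((lon, lat))
    return dict(cells)


# Hypothetical points near Milano; the first two fall in the same cell
# when coordinates are truncated to 3 decimals.
points = [(9.18951, 45.46427), (9.18999, 45.46401), (9.19120, 45.46427)]
cells = gridify(points, decimals=3)
```

With fewer decimals the cells grow and more points are aggregated together; with more decimals each cell tends to hold a single point.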

PRO TIP:

Both K-Means and DBSCAN are very parameter-dependent. The default values are set for small cities, so you will have to tune the parameters carefully to get the most out of these two clustering algorithms.
DBSCAN is especially useful for clusters with non-uniform shapes, while K-Means works better with evenly distributed data and uniform clusters.

This mechanism receives 3 global parameters:

Grid decimal Precision: Number of decimal places each coordinate is truncated to for the aggregation process (gridification). A value of 5 approximates to 1 m, 4 to 10 m, 3 to 100 m and so on.
Clustering algorithm: Selector with the available clustering algorithms.
Minimum K: Minimum number of points that a cluster must group to ensure K-Anonymity.

Additionally, each algorithm accepts its own set of parameters:

- KMeans:
For an explanation of the method see KMeans.
K-Means needs two parameters: the number of clusters K and a random seed for reproducibility, which is zero (0) by default.

- DBSCAN:
For an explanation of the method see DBSCAN.
DBSCAN needs two parameters: the aggregation radius R and the minimum number of points that must be grouped to form a cluster. This number can be different from the global minimum K.
A radius of 0.00001 approximates to 1 m, 0.0001 to 10 m and 0.001 to 100 m.
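To make the role of Minimum K more concrete, here is a minimal sketch of what could happen once a clustering algorithm has assigned a label to each point. It assumes that clusters smaller than K are dropped and that every remaining point is replaced by the centroid of its cluster. Both choices, and the function name `enforce_min_k`, are assumptions for illustration and are not confirmed details of the plugin.

```python
from collections import defaultdict


def enforce_min_k(points, labels, min_k):
    """Replace points by their cluster centroid; drop clusters smaller than min_k.

    `labels` holds one cluster label per point, as produced by any
    clustering algorithm. Returns the protected points and the point loss.
    """
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    protected = []
    for members in clusters.values():
        if len(members) < min_k:
            continue  # too few points to hide among: counted as point loss
        cx = sum(x for x, _ in members) / len(members)
        cy = sum(y for _, y in members) / len(members)
        protected.extend([(cx, cy)] * len(members))

    point_loss = len(points) - len(protected)
    return protected, point_loss


# Hypothetical labels: three points in cluster 0, one isolated point in cluster 1.
points = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0), (10.0, 10.0)]
protected, point_loss = enforce_min_k(points, labels=[0, 0, 0, 1], min_k=2)
```

In this sketch, a higher Minimum K gives stronger anonymity but tends to increase the point loss reported on the Results tab.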

NRandK

NRandK is an obfuscation Location Privacy Protection Mechanism that provides privacy by modifying the original coordinates of each point.

The mechanism applies an algorithm named NRand to each point of the dataset: it generates N random points (four (4) by default) inside a circle of radius R centered on the point and chooses the one farthest from the original point.
The radius of the circle for each point depends on how many points are close to it after an aggregation process.
If the number of points close to the point is greater than K, the point uses a small radius; otherwise it uses a larger radius.
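The description above can be sketched roughly as follows. This is not the plugin's code: the function names, the use of truncated coordinates to count nearby points, and the default seed are assumptions for illustration.

```python
import math
import random
from collections import Counter


def nrand(point, radius, n, rng):
    """Draw n uniform points inside the disc and return the farthest one."""
    x, y = point
    best, best_dist = point, -1.0
    for _ in range(n):
        # sqrt() makes the samples uniform over the disc's area
        r = radius * math.sqrt(rng.random())
        theta = rng.uniform(0.0, 2.0 * math.pi)
        if r > best_dist:
            best, best_dist = (x + r * math.cos(theta), y + r * math.sin(theta)), r
    return best


def nrandk(points, decimals, min_points, small_radius, large_radius, n=4, seed=0):
    """Apply NRand to each point, choosing the radius from local density."""
    rng = random.Random(seed)
    factor = 10 ** decimals

    def cell(p):
        return (math.trunc(p[0] * factor), math.trunc(p[1] * factor))

    counts = Counter(cell(p) for p in points)
    result = []
    for p in points:
        # dense cell -> small radius, sparse cell -> large radius
        radius = small_radius if counts[cell(p)] > min_points else large_radius
        result.append(nrand(p, radius, n, rng))
    return result


# Hypothetical data: three points share a grid cell, one is isolated.
points = [(9.18951, 45.46427), (9.18962, 45.46431),
          (9.18977, 45.46440), (9.30000, 45.60000)]
moved = nrandk(points, decimals=3, min_points=2,
               small_radius=0.0001, large_radius=0.001)
```

Picking the farthest of N candidates pushes the result toward the edge of the circle, so a larger N tends to produce a larger displacement for the same radius.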

This mechanism receives 6 parameters:

Grid decimal Precision: Number of decimal places each coordinate is truncated to for the aggregation process (gridification).
A value of 5 approximates to 1 m, 4 to 10 m, 3 to 100 m and so on.
Number of random points generated: Number of points that are generated inside the circle for each point. Four by default.
Minimum Points: Number of nearby points that determines whether a point uses the small or the large radius.
Small radius: Radius for points in high-density areas. A value of 0.00001 approximates to a 1 m radius, 0.0001 to a 10 m radius and 0.001 to a 100 m radius.
Large radius: Radius for points in low-density areas. A value of 0.00001 approximates to a 1 m radius, 0.0001 to a 10 m radius and 0.001 to a 100 m radius.
Random seed: Random seed for reproducibility.

You can find the scientific paper here

Laplace Noise (or Geo indistinguishability)

This is a differential privacy mechanism that provides privacy by replacing the coordinates of each feature with a new random point drawn from a Laplacian distribution.
For an explanation of differential privacy see Differential Privacy.

It is also based on the principle of Geo indistinguishability, explained in this paper.

This mechanism receives 2 parameters:

Sensitivity: Sensitivity value of the Laplace distribution.
A higher value means a larger area where the points can be generated.
Random seed: Random seed for reproducibility.
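One common way to draw such a point is the planar Laplace technique: pick a uniformly random direction and a distance that follows a Gamma distribution with shape 2. The sketch below uses that technique and treats the Sensitivity parameter as the scale of the distribution. Both the technique and that mapping are assumptions made for illustration, and the function name `planar_laplace` is hypothetical; the plugin may generate its noise differently.

```python
import math
import random


def planar_laplace(point, scale, rng):
    """Move a point by planar Laplace noise.

    The direction is uniform and the distance follows Gamma(shape=2, scale),
    which is the radial distribution of a two-dimensional Laplacian.
    Treating `scale` as the plugin's sensitivity is an assumption.
    """
    x, y = point
    theta = rng.uniform(0.0, 2.0 * math.pi)
    r = rng.gammavariate(2.0, scale)
    return (x + r * math.cos(theta), y + r * math.sin(theta))


# Hypothetical usage: the same seed always yields the same protected point.
rng = random.Random(0)
noisy = planar_laplace((9.19, 45.46), scale=0.001, rng=rng)
```

Under these assumptions the average displacement is twice the scale, so doubling the value doubles the typical distance between a point and its protected version.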

Geoprivacy Plugin

A set of location privacy tools
for geographic data.

About

If you like the plugin, give it 5 stars!

Usage

Getting Started

Functionalities

The Location Privacy Protection Mechanisms (LPPM)

Spatial Clustering

NRandK

Laplace Noise (or Geo indistinguishability)

Contribute

Adding a new Location Privacy Protection Mechanism

Structure of the DataModel

Adding the method to the UI
