Preprocessing

The preprocessing module contains a variety of functions to transform mobility and tracking data into richer data sources.

Filtering

trackintel.preprocessing.filter.spatial_filter(source, areas, method='within', re_project=False)

Filter staypoints, locations, or triplegs by a geographic extent.

Parameters
  • source (GeoDataFrame) – The source features to filter.

  • areas (GeoDataFrame) – The areas used for the spatial filtering. Note that it can contain multiple polygons; the function returns all features that intersect with ANY of those geometries.

  • method (str, {'within', 'intersects', 'crosses'}, default 'within') –

    • ‘within’ : return instances in ‘source’ where no point of these instances lies in the exterior of ‘areas’ and at least one point of the interior of these instances lies in the interior of ‘areas’.

    • ‘intersects’ : return instances in ‘source’ whose boundary or interior intersects in any way with the boundary or interior of ‘areas’.

    • ‘crosses’ : return instances in ‘source’ whose interior intersects the interior of ‘areas’ but does not contain it, and where the dimension of the intersection is less than the dimension of at least one of the two geometries.

  • re_project (bool, default False) – If this is set to True, the ‘source’ will be projected to the coordinate reference system of ‘areas’

Returns

A new GeoDataFrame containing the features after the spatial filtering.

Return type

GeoDataFrame

Examples

>>> spts.as_staypoints.spatial_filter(areas, method="within", re_project=False)
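For point features such as staypoints, the ‘within’ method with several areas amounts to keeping every point that lies inside at least one polygon. The sketch below illustrates only that semantics in plain Python with a ray-casting point-in-polygon test; it is not how trackintel implements spatial_filter (which works on GeoDataFrames). It assumes simple polygons without holes, given as lists of (x, y) vertices in the same coordinate system as the points; point_in_polygon and filter_within_any are hypothetical names.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]  # exterior ring as (x, y) vertices; holes are not supported


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Ray-casting test: True if pt lies inside poly.

    Points exactly on the boundary may be classified either way, whereas a
    strict 'within' predicate excludes them.
    """
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Count crossings of a horizontal ray from pt towards +x with this edge.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_within_any(points: List[Point], areas: List[Polygon]) -> List[Point]:
    """Keep the points that fall inside at least one of the areas."""
    return [p for p in points if any(point_in_polygon(p, a) for a in areas)]


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
other = [(20, 20), (30, 20), (30, 30), (20, 30)]
print(filter_within_any([(5, 5), (25, 25), (15, 15)], [square, other]))
# [(5, 5), (25, 25)]
```

The point (15, 15) lies in neither polygon and is dropped, while each of the other two points is kept because it lies inside one of the areas.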

Positionfixes

Positionfixes are usually the raw data received from some kind of tracking application, so several functions extract meaningful information from them and, in the process, turn them into higher-level trackintel data structures.

In particular, we can extract staypoints and triplegs from positionfixes.

trackintel.preprocessing.positionfixes.generate_staypoints(positionfixes, method='sliding', dist_threshold=50, time_threshold=300, epsilon=100, dist_func=<function haversine_dist>, num_samples=None)

Generates staypoints from positionfixes.

Parameters
  • positionfixes (GeoDataFrame) – The positionfixes have to follow the standard definition for positionfixes DataFrames.

  • method (str, {'sliding', 'dbscan'}, default 'sliding') –

    • ‘sliding’ : Applies a sliding window over the data.

    • ‘dbscan’ : Uses the DBSCAN algorithm to find clusters of positionfixes that form staypoints.

  • dist_threshold (float, default 50) – The distance threshold in meters for the ‘sliding’ method, i.e., how far someone has to travel to generate a new staypoint.

  • time_threshold (float, default 300 (seconds)) – The time threshold for the ‘sliding’ method in seconds, i.e., how long someone has to stay within an area to consider it as a staypoint.

  • epsilon (float, default 100) – The epsilon (neighborhood radius) for the ‘dbscan’ method.

  • dist_func (function, default haversine_dist) – A function that expects (lon_1, lat_1, lon_2, lat_2) and computes a distance in meters.

  • num_samples (int, default None) – The minimal number of samples in a cluster for the ‘dbscan’ method.

Returns

Tuple of (positionfixes, staypoints). Positionfixes is the original GeoDataFrame with a new column ‘staypoint_id’.

Return type

(GeoDataFrame, GeoDataFrame)

Examples

>>> pfs.as_positionfixes.generate_staypoints('sliding', dist_threshold=100)

References

Zheng, Y. (2015). Trajectory data mining: an overview. ACM Transactions on Intelligent Systems and Technology (TIST), 6(3), 29.

Li, Q., Zheng, Y., Xie, X., Chen, Y., Liu, W., & Ma, W. Y. (2008, November). Mining user similarity based on location history. In Proceedings of the 16th ACM SIGSPATIAL international conference on Advances in geographic information systems (p. 34). ACM.
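The ‘sliding’ method can be sketched along the lines of the sliding-window algorithm of Li et al. (2008): starting from a fix p_i, scan forward to the first fix p_j that is farther than dist_threshold from p_i; if more than time_threshold seconds passed between p_i and p_j, the fixes visited in between form a staypoint. The pure-Python sketch below is a simplified illustration, not trackintel's implementation. It assumes fixes are plain (lon, lat, t) records of one user sorted by time, with t in seconds, and places the staypoint at the mean of the fixes from p_i up to but not including p_j. Fix and sliding_staypoints are hypothetical names; haversine_dist follows the (lon_1, lat_1, lon_2, lat_2) → meters contract described for dist_func, using an assumed mean Earth radius of 6,371 km.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (assumption of this sketch)


def haversine_dist(lon_1: float, lat_1: float, lon_2: float, lat_2: float) -> float:
    """Great-circle distance in meters between two (lon, lat) points given in degrees."""
    phi_1, phi_2 = math.radians(lat_1), math.radians(lat_2)
    d_phi = phi_2 - phi_1
    d_lambda = math.radians(lon_2 - lon_1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi_1) * math.cos(phi_2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


@dataclass
class Fix:
    lon: float
    lat: float
    t: float  # timestamp in seconds


def sliding_staypoints(
    fixes: List[Fix],
    dist_threshold: float = 50,
    time_threshold: float = 300,
    dist_func: Callable[[float, float, float, float], float] = haversine_dist,
) -> List[Tuple[float, float, float, float]]:
    """Return staypoints as (mean_lon, mean_lat, arrival_t, leaving_t)."""
    staypoints = []
    i, n = 0, len(fixes)
    while i < n:
        found = False
        for j in range(i + 1, n):
            p_i, p_j = fixes[i], fixes[j]
            if dist_func(p_i.lon, p_i.lat, p_j.lon, p_j.lat) > dist_threshold:
                # p_j is the first fix that left the area around p_i.
                if p_j.t - p_i.t > time_threshold:
                    stay = fixes[i:j]  # fixes before leaving the area
                    staypoints.append((
                        sum(f.lon for f in stay) / len(stay),
                        sum(f.lat for f in stay) / len(stay),
                        p_i.t,
                        p_j.t,
                    ))
                    i, found = j, True  # continue scanning from the leaving fix
                break
        if not found:
            i += 1
    return staypoints


fixes = [Fix(8.5, 47.3, 0), Fix(8.5, 47.3, 200), Fix(8.5, 47.3, 400),
         Fix(8.52, 47.3, 600), Fix(8.54, 47.3, 800)]
print(sliding_staypoints(fixes))  # one staypoint, arriving at t=0 and leaving at t=600
```

The last two fixes are roughly 1.5 km apart but only 200 seconds apart in time, so they do not produce a second staypoint.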

trackintel.preprocessing.positionfixes.generate_triplegs(positionfixes, staypoints=None, *args, **kwargs)

Generates triplegs from positionfixes. A tripleg is (for now) defined as anything that happens between two consecutive staypoints.

Attention: This function requires either a staypoint_id column on the positionfixes or the staypoints that correspond to the positionfixes. In practice, this means you should usually call generate_staypoints() first.

Parameters
  • positionfixes (GeoDataFrame) – The positionfixes have to follow the standard definition for positionfixes DataFrames.

  • staypoints (GeoDataFrame, optional) – The staypoints corresponding to the positionfixes. If this is not passed, the positionfixes must have a staypoint_id column.

Returns

Tuple of (positionfixes, triplegs). Positionfixes is the original GeoDataFrame with a new column ‘tripleg_id’.

Return type

(GeoDataFrame, GeoDataFrame)

Examples

>>> pfs.as_positionfixes.generate_triplegs(staypoints)
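The need for a staypoint_id column follows from defining a tripleg as what happens between two consecutive staypoints. The sketch below shows that idea on a plain list of per-fix staypoint ids (None for fixes that belong to no staypoint), assuming the fixes are sorted by time and belong to a single user. It is an illustration only, not trackintel's implementation, which may, for example, treat fixes before the first or after the last staypoint differently; assign_tripleg_ids is a hypothetical name.

```python
from typing import List, Optional


def assign_tripleg_ids(staypoint_ids: List[Optional[int]]) -> List[Optional[int]]:
    """Give each run of non-staypoint fixes between two staypoints a tripleg id.

    staypoint_ids[k] is the staypoint of the k-th fix (in time order), or None
    if the fix is not part of any staypoint. Fixes inside a staypoint, and
    fixes before the first or after the last staypoint, get None.
    """
    tripleg_ids: List[Optional[int]] = [None] * len(staypoint_ids)
    next_id = 0
    run: List[int] = []       # indices of the current run of non-staypoint fixes
    seen_staypoint = False    # has a staypoint occurred before the current run?
    for k, sp in enumerate(staypoint_ids):
        if sp is None:
            run.append(k)
            continue
        # A staypoint closes the run; it is a tripleg only if a staypoint opened it too.
        if seen_staypoint and run:
            for idx in run:
                tripleg_ids[idx] = next_id
            next_id += 1
        run = []
        seen_staypoint = True
    return tripleg_ids


print(assign_tripleg_ids([None, 0, 0, None, None, 1, 1, None, 2, None]))
# [None, None, None, 0, 0, None, None, 1, None, None]
```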

Staypoints

Staypoints are points where someone stayed for a longer period of time (e.g., during a transfer between two transport modes). We can cluster these into locations that a user frequently visits.

trackintel.preprocessing.staypoints.generate_locations(staypoints, method='dbscan', epsilon=100, num_samples=1, distance_matrix_metric='euclidean', agg_level='user')

Generate locations from the staypoints.

Parameters
  • staypoints (GeoDataFrame) – The staypoints have to follow the standard definition for staypoints DataFrames.

  • method (str, {'dbscan'}, default 'dbscan') –

    • ‘dbscan’ : Uses the DBSCAN algorithm to cluster staypoints.

  • epsilon (float, default 100) – The epsilon for the ‘dbscan’ method. If ‘distance_matrix_metric’ is ‘haversine’ or ‘euclidean’, the unit is meters.

  • num_samples (int, default 1) – The minimal number of samples in a cluster.

  • distance_matrix_metric (str, default 'euclidean') – The distance matrix used by the applied method. Possible metrics are: {‘haversine’, ‘euclidean’} or any mentioned in: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances.html

  • agg_level (str, {'user', 'dataset'}, default 'user') – The level of aggregation when generating locations:

    • ‘user’ : locations are generated independently per user.

    • ‘dataset’ : shared locations are generated for all users.

Returns

  • ret_sp (GeoDataFrame) – Original ‘staypoints’ containing one additional column ‘location_id’ linking to the ‘ret_loc’

  • ret_loc (GeoDataFrame) – A new GeoDataFrame containing locations that a person visited multiple times.

Examples

>>> spts.as_staypoints.generate_locations(method='dbscan', epsilon=100, num_samples=1)
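With agg_level='user', clustering runs separately on each user's staypoints, so two users never share a location even when they visit the same place. The pure-Python sketch below illustrates this with a minimal DBSCAN under stated assumptions: staypoints are (user_id, x, y) tuples in a projected coordinate system measured in meters, so that Euclidean distances and epsilon are both in meters, and num_samples is treated like DBSCAN's minimum neighborhood size counting the point itself (as scikit-learn's min_samples does), which is why num_samples=1 assigns every staypoint to some location. dbscan and cluster_locations_per_user are hypothetical helpers, not trackintel's implementation.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # projected (x, y) in meters (assumption)
UNVISITED, NOISE = -2, -1


def dbscan(points: List[Point], epsilon: float, num_samples: int) -> List[int]:
    """Minimal DBSCAN: return a cluster label per point, NOISE (-1) for outliers."""
    def neighbors(i: int) -> List[int]:
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= epsilon]

    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < num_samples:  # not a core point; may become a border point later
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:  # reachable from a core point: border point
                labels[j] = cluster
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= num_samples:  # j is a core point: expand through it
                queue.extend(j_nbrs)
    return labels


def cluster_locations_per_user(staypoints: List[Tuple[str, float, float]],
                               epsilon: float = 100, num_samples: int = 1) -> List[int]:
    """Cluster (user_id, x, y) staypoints per user; return a location id per staypoint.

    Location ids are unique across users; noise staypoints get -1.
    """
    by_user: Dict[str, List[int]] = defaultdict(list)
    for idx, (user, _, _) in enumerate(staypoints):
        by_user[user].append(idx)
    location_ids = [NOISE] * len(staypoints)
    offset = 0
    for idxs in by_user.values():
        labels = dbscan([staypoints[k][1:] for k in idxs], epsilon, num_samples)
        for k, lab in zip(idxs, labels):
            if lab != NOISE:
                location_ids[k] = offset + lab
        offset += max(labels, default=NOISE) + 1
    return location_ids


sps = [("a", 0, 0), ("a", 30, 40), ("a", 1000, 0), ("b", 0, 0), ("b", 5000, 5000)]
print(cluster_locations_per_user(sps))  # [0, 0, 1, 2, 3]
```

Users "a" and "b" both have a staypoint at (0, 0), yet they receive different location ids because each user is clustered independently.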

Triplegs

Triplegs denote routes taken between two consecutive staypoints. Usually, they are traveled with a single mode of transport. Depending on the tracking data, they can be rather noisy, which is why we often want to smooth them.

The extraction of trips from staypoints and triplegs in generate_trips follows this algorithm:

[Figure: trip extraction algorithm (tripalgorithm.png)]