Geography Utils

To make handling mobility and geodata easier, trackintel features several geographic utility functions as well as distance functions for points and trajectories.

trackintel.geogr.distances.calculate_distance_matrix(X, Y=None, dist_metric='haversine', n_jobs=0, **kwds)

Calculate a distance matrix based on a specific distance metric.

If only X is given, the pairwise distances between all elements in X are calculated. If X and Y are given, the distances between all combinations of X and Y are calculated. Distances between elements of X and X, and distances between elements of Y and Y, are not calculated.

Parameters
  • X (GeoDataFrame) – GeoPandas DataFrame in trackintel staypoints or triplegs format.

  • Y (GeoDataFrame) – [optional] GeoPandas DataFrame in trackintel staypoints or triplegs format.

  • dist_metric (str, {'haversine', 'euclidean', 'dtw', 'frechet'}, default 'haversine') – The distance metric used to calculate the matrix. If only X is given, this function wraps the pairwise_distances function from scikit-learn; if X and Y are given, it wraps scipy.spatial.distance.cdist. Therefore, the following metrics are also accepted: via scikit-learn: [‘cityblock’, ‘cosine’, ‘euclidean’, ‘l1’, ‘l2’, ‘manhattan’]; via scipy.spatial.distance: [‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘correlation’, ‘dice’, ‘hamming’, ‘jaccard’, ‘kulsinski’, ‘mahalanobis’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’]. Triplegs can only be used in combination with [‘dtw’, ‘frechet’].

  • n_jobs (int) – Number of cores to use. ‘dtw’, ‘frechet’ and all distance metrics from pairwise_distances (only available if only X is given) are parallelized.

  • **kwds – Optional keywords passed to the distance functions.

Returns

Distance matrix of shape (len(X), len(X)) if only X is given, or of shape (len(X), len(Y)) if X and Y are given.

Return type

numpy array
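To illustrate the shape semantics described above (X only versus X and Y) for the default 'haversine' metric, here is a minimal NumPy sketch that operates on plain (longitude, latitude) coordinate arrays rather than on trackintel GeoDataFrames. The function name haversine_matrix is hypothetical; this is an illustration under those assumptions, not trackintel's implementation.

```python
import numpy as np

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, same default as haversine_dist


def haversine_matrix(X, Y=None, r=EARTH_RADIUS_M):
    """Hypothetical helper: pairwise great-circle distances in meters.

    X has shape (n, 2) and optional Y has shape (m, 2); each row holds
    (longitude, latitude) in degrees (WGS84).
    """
    X = np.radians(np.asarray(X, dtype=float))
    # Without Y, compare X against itself, giving a square (n, n) matrix.
    Y = X if Y is None else np.radians(np.asarray(Y, dtype=float))
    lon_1, lat_1 = X[:, 0][:, None], X[:, 1][:, None]  # column vectors, shape (n, 1)
    lon_2, lat_2 = Y[:, 0][None, :], Y[:, 1][None, :]  # row vectors, shape (1, m)
    # Broadcasting the haversine formula over both axes yields an (n, m) matrix.
    a = (np.sin((lat_2 - lat_1) / 2) ** 2
         + np.cos(lat_1) * np.cos(lat_2) * np.sin((lon_2 - lon_1) / 2) ** 2)
    # Clip guards against tiny floating-point overshoots above 1.
    return 2 * r * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


points = [(8.5, 47.3), (8.7, 47.2)]
print(haversine_matrix(points).shape)                 # (2, 2)
print(haversine_matrix(points, [(8.5, 47.3)]).shape)  # (2, 1)
```

With only X the result is square with zeros on the diagonal; with X and Y it has one row per element of X and one column per element of Y.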

trackintel.geogr.distances.meters_to_decimal_degrees(meters, latitude)

Converts meters to decimal degrees (approximately).

Parameters
  • meters (float) – The meters to convert to degrees.

  • latitude (float) – The latitude at which the conversion happens. Because the conversion depends (approximately) on the latitude, it needs to be specified. Use 0 for the equator.

Returns

An approximation of a distance (given in meters) in degrees.

Return type

float
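The dependence on latitude can be illustrated with a minimal sketch that assumes a spherical Earth, on which one degree of longitude spans about 111,320 m × cos(latitude). The function meters_to_degrees_sketch and its constant are hypothetical; trackintel's own meters_to_decimal_degrees may use a different approximation.

```python
import math

# Assumption: approximate length of one degree of longitude at the equator, in meters.
METERS_PER_DEGREE_AT_EQUATOR = 111_320.0


def meters_to_degrees_sketch(meters: float, latitude: float) -> float:
    """Hypothetical helper: express a distance in meters as degrees of longitude."""
    # One degree of longitude shrinks with the cosine of the latitude.
    meters_per_degree = METERS_PER_DEGREE_AT_EQUATOR * math.cos(math.radians(latitude))
    return meters / meters_per_degree


print(meters_to_degrees_sketch(111_320, 0))   # about 1 degree at the equator
print(meters_to_degrees_sketch(111_320, 60))  # about 2 degrees at 60° latitude
```

The same distance in meters corresponds to more degrees the further one moves from the equator, which is why the latitude argument is required.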

Point distances

trackintel.geogr.point_distances.haversine_dist(lon_1, lat_1, lon_2, lat_2, r=6371000)

Computes the great-circle (haversine) distance between two coordinates in WGS84. This is the serialized version of the haversine distance: it accepts scalars or 1-D NumPy arrays, which are processed element-wise.

Parameters
  • lon_1 (float or numpy.array of shape (-1,)) – The longitude of the first point.

  • lat_1 (float or numpy.array of shape (-1,)) – The latitude of the first point.

  • lon_2 (float or numpy.array of shape (-1,)) – The longitude of the second point.

  • lat_2 (float or numpy.array of shape (-1,)) – The latitude of the second point.

  • r (float) – Radius of the reference sphere for the calculation. The average Earth radius is 6’371’000 m.

Returns

An approximation of the distance, in meters, between two points given in WGS84.

Return type

float

Examples

>>> from trackintel.geogr.point_distances import haversine_dist
>>> haversine_dist(8.5, 47.3, 8.7, 47.2)
18749.056277719905

References

https://en.wikipedia.org/wiki/Haversine_formula
https://stackoverflow.com/questions/19413259/efficient-way-to-calculate-distance-matrix-given-latitude-and-longitude-data-in
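For reference, the standard haversine formula fits in a few lines of NumPy. This is a minimal sketch, not trackintel's haversine_dist (which may use an algebraically equivalent form); the name haversine_sketch is hypothetical. Like the documented function, it accepts scalars or 1-D arrays and works element-wise.

```python
import numpy as np


def haversine_sketch(lon_1, lat_1, lon_2, lat_2, r=6371000.0):
    """Hypothetical helper: great-circle distance in meters (inputs in degrees)."""
    lon_1, lat_1, lon_2, lat_2 = map(np.radians, (lon_1, lat_1, lon_2, lat_2))
    # Haversine formula: a = sin²(Δφ/2) + cos φ1 · cos φ2 · sin²(Δλ/2)
    a = (np.sin((lat_2 - lat_1) / 2) ** 2
         + np.cos(lat_1) * np.cos(lat_2) * np.sin((lon_2 - lon_1) / 2) ** 2)
    # Central angle is 2·arcsin(√a); clip guards against rounding slightly above 1.
    return 2 * r * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


print(haversine_sketch(8.5, 47.3, 8.7, 47.2))  # about 18749 m
```

Passing NumPy arrays of equal length instead of floats returns one distance per coordinate pair.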

Trajectory distances

trackintel.geogr.trajectory_distances.dtw(exp_data, num_data, metric='euclidean', **kwargs)

Compute the Dynamic Time Warping (DTW) distance. This computes a generic DTW distance and follows the algorithm from [1]. It can use all distance metrics that are available in scipy.spatial.distance.cdist. Adapted from https://github.com/cjekel/similarity_measures/blob/master/similaritymeasures/similaritymeasures.py

Parameters
  • exp_data (array_like) – Curve from your experimental data. exp_data is of (M, N) shape, where M is the number of data points, and N is the number of dimensions.

  • num_data (array_like) – Curve from your numerical data. num_data is of (P, N) shape, where P is the number of data points, and N is the number of dimensions.

  • metric (str or callable, optional) – The distance metric to use. Default: ‘euclidean’. Refer to the documentation for scipy.spatial.distance.cdist. Some examples: ‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’, ‘jaccard’, ‘kulsinski’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘wminkowski’, ‘yule’. The availability of individual metrics depends on the installed SciPy version.

  • **kwargs (dict, optional) – Extra arguments to metric; refer to each metric's documentation in scipy.spatial.distance. Some examples:

    p : scalar
    The p-norm to apply for Minkowski, weighted and unweighted. Default: 2.

    w : ndarray
    The weight vector for metrics that support weights (e.g., Minkowski).

    V : ndarray
    The variance vector for standardized Euclidean. Default: var(vstack([XA, XB]), axis=0, ddof=1)

    VI : ndarray
    The inverse of the covariance matrix for Mahalanobis. Default: inv(cov(vstack([XA, XB].T))).T

    out : ndarray
    The output array. If not None, the distance matrix Y is stored in this array.

Returns

  • r (float) – DTW distance.

  • d (ndarray (2-D)) – Cumulative distance matrix.

Notes

The DTW distance is d[-1, -1]. This has O(M·P) computational cost. The latest scipy.spatial.distance.cdist information can be found in the SciPy documentation (https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html). Your x locations of data points should be exp_data[:, 0], and the y locations of the data points should be exp_data[:, 1]; the same applies to num_data. The euclidean distance is used by default; other metrics can be selected via metric and **kwargs. DTW is a non-metric distance, which means it does not satisfy the triangle inequality (https://en.wikipedia.org/wiki/Triangle_inequality).
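The recurrence behind the cumulative matrix d can be sketched as follows: each cell adds the local cdist cost to the cheapest of its three predecessors, and r is read from d[-1, -1]. This is a minimal illustration assuming the classic unconstrained DTW recurrence (no warping window); dtw_sketch is a hypothetical name, not trackintel's dtw, whose details may differ.

```python
import numpy as np
from scipy.spatial.distance import cdist


def dtw_sketch(exp_data, num_data, metric="euclidean", **kwargs):
    """Hypothetical helper: return (r, d), the DTW distance and cumulative matrix."""
    # Local cost between every point of exp_data (M rows) and num_data (P rows).
    c = cdist(exp_data, num_data, metric=metric, **kwargs)
    m, p = c.shape
    d = np.empty_like(c)
    d[0, 0] = c[0, 0]
    # Cells in the first column and first row can only be reached along one edge.
    for i in range(1, m):
        d[i, 0] = d[i - 1, 0] + c[i, 0]
    for j in range(1, p):
        d[0, j] = d[0, j - 1] + c[0, j]
    # Interior cells: local cost plus the cheapest predecessor, O(M·P) overall.
    for i in range(1, m):
        for j in range(1, p):
            d[i, j] = c[i, j] + min(d[i - 1, j], d[i, j - 1], d[i - 1, j - 1])
    return d[-1, -1], d


exp_data = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
num_data = np.array([[0.0, 0.0], [2.0, 0.0]])
r, d = dtw_sketch(exp_data, num_data)
print(r, d.shape)  # 1.0 (3, 2)
```

Because the predecessor choice only minimizes accumulated cost, the result need not satisfy the triangle inequality, consistent with the note above.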

References

[1] Senin, P., 2008. Dynamic time warping algorithm review. Information and Computer Science Department, University of Hawaii at Manoa, Honolulu, USA, 855, pp. 1-23. http://seninp.github.io/assets/pubs/senin_dtw_litreview_2008.pdf

Examples

>>> import numpy as np
>>> from trackintel.geogr.trajectory_distances import dtw
>>> # Generate random experimental data
>>> x = np.random.random(100)
>>> y = np.random.random(100)
>>> exp_data = np.zeros((100, 2))
>>> exp_data[:, 0] = x
>>> exp_data[:, 1] = y
>>> # Generate random numerical data
>>> x = np.random.random(100)
>>> y = np.random.random(100)
>>> num_data = np.zeros((100, 2))
>>> num_data[:, 0] = x
>>> num_data[:, 1] = y
>>> r, d = dtw(exp_data, num_data)

The euclidean distance is used by default. You can use metric and **kwargs to specify different types of distance metrics. The following example uses the city block or Manhattan distance between points.

>>> r, d = dtw(exp_data, num_data, metric='cityblock')

trackintel.geogr.trajectory_distances.frechet_dist(exp_data, num_data, p=2)

Compute the discrete Frechet distance between two N-D curves according to [1]. The Frechet distance has been described as the walking dog problem. From Wikipedia: “In mathematics, the Frechet distance is a measure of similarity between curves that takes into account the location and ordering of the points along the curves. It is named after Maurice Frechet.” (https://en.wikipedia.org/wiki/Fr%C3%A9chet_distance) Adapted from https://github.com/cjekel/similarity_measures/blob/master/similaritymeasures/similaritymeasures.py

Parameters
  • exp_data (array_like) – Curve from your experimental data. exp_data is of (M, N) shape, where M is the number of data points, and N is the number of dimensions.

  • num_data (array_like) – Curve from your numerical data. num_data is of (P, N) shape, where P is the number of data points, and N is the number of dimensions.

  • p (float, 1 <= p <= infinity) – Which Minkowski p-norm to use. Default is p=2 (Euclidean). The Manhattan distance is p=1.

Returns

df – discrete Frechet distance

Return type

float

References

[1] Thomas Eiter and Heikki Mannila. Computing discrete Frechet distance. Technical report, 1994. http://www.kr.tuwien.ac.at/staff/eiter/et-archive/cdtr9464.pdf http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.90.937&rep=rep1&type=pdf

Notes

Your x locations of data points should be exp_data[:, 0], and the y locations of the data points should be exp_data[:, 1]; the same applies to num_data. Python limits the number of recursive calls a single function can make, so for a large dataset you may need to increase this limit. See the following resources: https://docs.python.org/3/library/sys.html#sys.setrecursionlimit and https://stackoverflow.com/questions/3323001/what-is-the-maximum-recursion-depth-in-python-and-how-to-increase-it. Thanks to MaxBareiss (https://gist.github.com/MaxBareiss/ba2f9441d9455b56fbc9). This sets a global variable named pnorm, where pnorm = p.

Examples

>>> import numpy as np
>>> from trackintel.geogr.trajectory_distances import frechet_dist
>>> # Generate random experimental data
>>> x = np.random.random(100)
>>> y = np.random.random(100)
>>> exp_data = np.zeros((100, 2))
>>> exp_data[:, 0] = x
>>> exp_data[:, 1] = y
>>> # Generate random numerical data
>>> x = np.random.random(100)
>>> y = np.random.random(100)
>>> num_data = np.zeros((100, 2))
>>> num_data[:, 0] = x
>>> num_data[:, 1] = y
>>> df = frechet_dist(exp_data, num_data)
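Because the recursive formulation can hit Python's recursion limit on long curves, the same discrete Frechet recurrence from [1] can also be filled in iteratively, row by row. This is a minimal sketch assuming a Minkowski p-norm as the point distance; frechet_sketch is a hypothetical name and not trackintel's frechet_dist.

```python
import numpy as np


def frechet_sketch(exp_data, num_data, p=2):
    """Hypothetical helper: discrete Frechet distance via iterative dynamic programming."""
    P = np.asarray(exp_data, dtype=float)
    Q = np.asarray(num_data, dtype=float)
    # Minkowski p-norm distance between every point of P and every point of Q.
    dist = np.linalg.norm(P[:, None, :] - Q[None, :, :], ord=p, axis=-1)
    m, n = dist.shape
    ca = np.empty((m, n))
    for i in range(m):
        for j in range(n):
            if i == 0 and j == 0:
                prev = dist[0, 0]
            elif i == 0:
                prev = ca[0, j - 1]
            elif j == 0:
                prev = ca[i - 1, 0]
            else:
                prev = min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1])
            # The coupling cost is the longest "leash" needed along the best path so far.
            ca[i, j] = max(prev, dist[i, j])
    return ca[-1, -1]


exp_data = [[0, 0], [1, 0], [2, 0]]
num_data = [[0, 1], [1, 1], [2, 1]]
print(frechet_sketch(exp_data, num_data))  # 1.0
```

The table has M·P cells, so memory and time grow with both curve lengths, but no recursion is involved.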