Geography Utils¶
To make handling mobility and geodata easier, trackintel features several geographic utility functions and distance functions for points and trajectories.
trackintel.geogr.distances.calculate_distance_matrix(X, Y=None, dist_metric='haversine', n_jobs=0, **kwds)¶
Calculate a distance matrix based on a specific distance metric.
If only X is given, the pairwise distances between all elements in X are calculated. If X and Y are given, the distances between all combinations of X and Y are calculated. Distances between the elements of X and X, and between the elements of Y and Y, are not calculated.
- Parameters
X (GeoDataFrame) – GeoPandas DataFrame in trackintel staypoints or triplegs format.
Y (GeoDataFrame, optional) – GeoPandas DataFrame in trackintel staypoints or triplegs format.
dist_metric (str, {'haversine', 'euclidean', 'dtw', 'frechet'}, default 'haversine') – The distance metric used to calculate the matrix. This function wraps the pairwise_distances function from scikit-learn if only X is given, and the scipy.spatial.distance.cdist function if both X and Y are given. Therefore the following metrics are also accepted: via scikit-learn: ['cityblock', 'cosine', 'euclidean', 'l1', 'l2', 'manhattan']; via scipy.spatial.distance: ['braycurtis', 'canberra', 'chebyshev', 'correlation', 'dice', 'hamming', 'jaccard', 'kulsinski', 'mahalanobis', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule']. Triplegs can only be used in combination with ['dtw', 'frechet'].
n_jobs (int) – Number of cores to use. 'dtw', 'frechet' and all distance metrics from pairwise_distances (only available if only X is given) are parallelized.
**kwds – Optional keywords passed to the distance functions.
- Returns
Matrix of shape (len(X), len(X)) or of shape (len(X), len(Y)).
- Return type
numpy array
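To illustrate what the default 'haversine' case computes, the following minimal numpy sketch builds the matrix of all pairwise great-circle distances for a set of WGS84 points. It only mimics the shape semantics described above (a (len(X), len(X)) matrix when only one input is given); it is not trackintel's implementation.

```python
import numpy as np

def haversine_matrix(lat, lon, r=6371000):
    """All pairwise great-circle distances (in meters) between points
    given as arrays of latitude/longitude in decimal degrees."""
    lat, lon = np.radians(lat), np.radians(lon)
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat[:, None]) * np.cos(lat[None, :]) * np.sin(dlon / 2) ** 2)
    return 2 * r * np.arcsin(np.sqrt(a))  # shape (len(lat), len(lat))

lat = np.array([47.3, 47.2, 47.4])
lon = np.array([8.5, 8.7, 8.6])
D = haversine_matrix(lat, lon)
# D is symmetric with a zero diagonal, analogous to the matrix returned
# when only X is passed.
```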
trackintel.geogr.distances.meters_to_decimal_degrees(meters, latitude)¶
Converts meters to decimal degrees (approximately).
- Parameters
meters (float) – The distance in meters.
latitude (float) – The latitude at which the conversion takes place, since the length of a degree of longitude depends on the latitude.
- Returns
An approximation of a distance (given in meters) in degrees.
- Return type
float
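As a sketch of the underlying idea: one degree of latitude spans roughly 111,320 m, and a degree of longitude shrinks with the cosine of the latitude. The constant and formula below are a common approximation, not necessarily trackintel's exact implementation.

```python
import math

def meters_to_degrees_sketch(meters, latitude):
    """Approximate a distance in meters as decimal degrees at the given
    latitude (common approximation: 1 degree ~ 111,320 m at the equator,
    scaled by cos(latitude))."""
    return meters / (111320 * math.cos(math.radians(latitude)))

meters_to_degrees_sketch(111320, 0.0)  # ~1.0 degree at the equator
```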
Point distances¶
trackintel.geogr.point_distances.haversine_dist(lon_1, lat_1, lon_2, lat_2, r=6371000)¶
Computes the great circle or haversine distance between two coordinates in WGS84. Serialized version of the haversine distance.
- Parameters
lon_1 (float or numpy.array of shape (-1,)) – The longitude of the first point.
lat_1 (float or numpy.array of shape (-1,)) – The latitude of the first point.
lon_2 (float or numpy.array of shape (-1,)) – The longitude of the second point.
lat_2 (float or numpy.array of shape (-1,)) – The latitude of the second point.
r (float) – Radius of the reference sphere for the calculation. The average Earth radius is 6,371,000 m.
- Returns
An approximation of the distance between two points in WGS84, given in meters.
- Return type
float or numpy.array
Examples
>>> haversine_dist(8.5, 47.3, 8.7, 47.2)
18749.056277719905
References
https://en.wikipedia.org/wiki/Haversine_formula
https://stackoverflow.com/questions/19413259/efficient-way-to-calculate-distance-matrix-given-latitude-and-longitude-data-in
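The haversine formula referenced above is short enough to sketch directly in numpy. This illustrative version mirrors the documented signature, accepts scalars or arrays, and reproduces the example value; it is not necessarily trackintel's exact code.

```python
import numpy as np

def haversine_sketch(lon_1, lat_1, lon_2, lat_2, r=6371000):
    """Great-circle distance in meters via the haversine formula."""
    lon_1, lat_1, lon_2, lat_2 = map(np.radians, (lon_1, lat_1, lon_2, lat_2))
    a = (np.sin((lat_2 - lat_1) / 2) ** 2
         + np.cos(lat_1) * np.cos(lat_2) * np.sin((lon_2 - lon_1) / 2) ** 2)
    return 2 * r * np.arcsin(np.sqrt(a))

haversine_sketch(8.5, 47.3, 8.7, 47.2)  # ~18749.06 m
```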
Trajectory distances¶
trackintel.geogr.trajectory_distances.dtw(exp_data, num_data, metric='euclidean', **kwargs)¶
Compute the Dynamic Time Warping distance.
This computes a generic Dynamic Time Warping (DTW) distance and follows the algorithm from [1]. This can use all distance metrics that are available in scipy.spatial.distance.cdist.
Source: https://github.com/cjekel/similarity_measures/blob/master/similaritymeasures/similaritymeasures.py
- Parameters
exp_data (array_like) – Curve from your experimental data. exp_data is of (M, N) shape, where M is the number of data points and N is the number of dimensions.
num_data (array_like) – Curve from your numerical data. num_data is of (P, N) shape, where P is the number of data points and N is the number of dimensions.
metric (str or callable, optional) – The distance metric to use. Default='euclidean'. Refer to the documentation for scipy.spatial.distance.cdist. Some examples: 'braycurtis', 'canberra', 'chebyshev', 'cityblock', 'correlation', 'cosine', 'dice', 'euclidean', 'hamming', 'jaccard', 'kulsinski', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'wminkowski', 'yule'.
**kwargs (dict, optional) – Extra arguments to metric: refer to each metric documentation in scipy.spatial.distance. Some examples:
p : scalar – The p-norm to apply for Minkowski, weighted and unweighted. Default: 2.
w : ndarray – The weight vector for metrics that support weights (e.g., Minkowski).
V : ndarray – The variance vector for standardized Euclidean. Default: var(vstack([XA, XB]), axis=0, ddof=1)
VI : ndarray – The inverse of the covariance matrix for Mahalanobis. Default: inv(cov(vstack([XA, XB].T))).T
out : ndarray – The output array. If not None, the distance matrix Y is stored in this array.
- Returns
r (float) – DTW distance.
d (ndarray (2-D)) – Cumulative distance matrix.
Notes
The DTW distance is d[-1, -1]. This has O(M × P) computational cost. The latest scipy.spatial.distance.cdist information can be found at https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html
Your x locations of data points should be exp_data[:, 0], and the y locations of the data points should be exp_data[:, 1]. The same applies to num_data.
DTW is a non-metric distance, which means DTW does not satisfy the triangle inequality. https://en.wikipedia.org/wiki/Triangle_inequality
References
[1] Senin, P., 2008. Dynamic time warping algorithm review. Information and Computer Science Department, University of Hawaii at Manoa, Honolulu, USA, 855, pp. 1-23. http://seninp.github.io/assets/pubs/senin_dtw_litreview_2008.pdf
Examples
>>> # Generate random experimental data
>>> x = np.random.random(100)
>>> y = np.random.random(100)
>>> exp_data = np.zeros((100, 2))
>>> exp_data[:, 0] = x
>>> exp_data[:, 1] = y
>>> # Generate random numerical data
>>> x = np.random.random(100)
>>> y = np.random.random(100)
>>> num_data = np.zeros((100, 2))
>>> num_data[:, 0] = x
>>> num_data[:, 1] = y
>>> r, d = dtw(exp_data, num_data)

The euclidean distance is used by default. You can use metric and **kwargs to specify different types of distance metrics. The following example uses the city block or Manhattan distance between points.

>>> r, d = dtw(exp_data, num_data, metric='cityblock')
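The cumulative-cost recursion behind DTW can be sketched in a few lines of numpy. This illustrative version hard-codes euclidean point-to-point costs (the real function defers to scipy.spatial.distance.cdist for other metrics) and returns the same (r, d) pair described above.

```python
import numpy as np

def dtw_sketch(exp_data, num_data):
    """Minimal DTW with euclidean point-to-point costs.
    Returns (dtw_distance, cumulative_cost_matrix)."""
    exp_data = np.asarray(exp_data, dtype=float)
    num_data = np.asarray(num_data, dtype=float)
    # (M, P) matrix of euclidean distances between all point pairs
    c = np.sqrt(((exp_data[:, None, :] - num_data[None, :, :]) ** 2).sum(-1))
    M, P = c.shape
    d = np.zeros((M, P))
    d[0, 0] = c[0, 0]
    for i in range(1, M):               # first column: only vertical moves
        d[i, 0] = d[i - 1, 0] + c[i, 0]
    for j in range(1, P):               # first row: only horizontal moves
        d[0, j] = d[0, j - 1] + c[0, j]
    for i in range(1, M):
        for j in range(1, P):
            d[i, j] = c[i, j] + min(d[i - 1, j], d[i, j - 1], d[i - 1, j - 1])
    return d[-1, -1], d                 # DTW distance is d[-1, -1]

# Identical curves have DTW distance 0
curve = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
r, d = dtw_sketch(curve, curve)  # r == 0.0
```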
trackintel.geogr.trajectory_distances.frechet_dist(exp_data, num_data, p=2)¶
Compute the discrete Frechet distance.
Compute the Discrete Frechet Distance between two N-D curves according to [1]. The Frechet distance has been defined as the walking dog problem. From Wikipedia: "In mathematics, the Frechet distance is a measure of similarity between curves that takes into account the location and ordering of the points along the curves. It is named after Maurice Frechet." https://en.wikipedia.org/wiki/Fr%C3%A9chet_distance
Source: https://github.com/cjekel/similarity_measures/blob/master/similaritymeasures/similaritymeasures.py
- Parameters
exp_data (array_like) – Curve from your experimental data. exp_data is of (M, N) shape, where M is the number of data points and N is the number of dimensions.
num_data (array_like) – Curve from your numerical data. num_data is of (P, N) shape, where P is the number of data points and N is the number of dimensions.
p (float, 1 <= p <= infinity) – Which Minkowski p-norm to use. Default is p=2 (Euclidean). The Manhattan distance is p=1.
- Returns
df – Discrete Frechet distance.
- Return type
float
References
[1] Thomas Eiter and Heikki Mannila. Computing discrete Frechet distance. Technical report, 1994. http://www.kr.tuwien.ac.at/staff/eiter/et-archive/cdtr9464.pdf http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.90.937&rep=rep1&type=pdf
Notes
Your x locations of data points should be exp_data[:, 0], and the y locations of the data points should be exp_data[:, 1]. The same applies to num_data.
Python has a default limit to the number of recursive calls a single function can make. If you have a large dataset, you may need to increase this limit. Check out the following resources:
https://docs.python.org/3/library/sys.html#sys.setrecursionlimit
https://stackoverflow.com/questions/3323001/what-is-the-maximum-recursion-depth-in-python-and-how-to-increase-it
Thanks to MaxBareiss: https://gist.github.com/MaxBareiss/ba2f9441d9455b56fbc9
This sets a global variable named pnorm, where pnorm = p.
Examples
>>> # Generate random experimental data
>>> x = np.random.random(100)
>>> y = np.random.random(100)
>>> exp_data = np.zeros((100, 2))
>>> exp_data[:, 0] = x
>>> exp_data[:, 1] = y
>>> # Generate random numerical data
>>> x = np.random.random(100)
>>> y = np.random.random(100)
>>> num_data = np.zeros((100, 2))
>>> num_data[:, 0] = x
>>> num_data[:, 1] = y
>>> df = frechet_dist(exp_data, num_data)
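The Eiter-and-Mannila recurrence cited above can also be written iteratively, which sidesteps the recursion-limit caveat mentioned in the Notes. This sketch hard-codes the default p=2 (euclidean) case; the referenced implementation is recursive and supports arbitrary Minkowski p-norms.

```python
import numpy as np

def frechet_sketch(exp_data, num_data):
    """Discrete Frechet distance with euclidean (p=2) point distances,
    computed iteratively via the coupling-measure recurrence of
    Eiter & Mannila (1994)."""
    P = np.asarray(exp_data, dtype=float)
    Q = np.asarray(num_data, dtype=float)
    # (M, N) matrix of euclidean distances between all point pairs
    dist = np.sqrt(((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1))
    M, N = dist.shape
    ca = np.zeros((M, N))
    ca[0, 0] = dist[0, 0]
    for i in range(1, M):
        ca[i, 0] = max(ca[i - 1, 0], dist[i, 0])
    for j in range(1, N):
        ca[0, j] = max(ca[0, j - 1], dist[0, j])
    for i in range(1, M):
        for j in range(1, N):
            ca[i, j] = max(min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1]),
                           dist[i, j])
    return ca[-1, -1]

# Two parallel horizontal segments one unit apart -> distance 1.0
frechet_sketch([[0.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [1.0, 1.0]])  # 1.0
```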