Bogy: February 2008

1. Distance between a point and a road segment

Let G(V,E) be the directed graph describing the road network. V is the set of vertexes or nodes and E is the set of edges or links. Let Q_i be the set of points given by the GPS data stream i = 1...T. Each GPS point consists of a pair of coordinates (x_i, y_i) and a time-stamp t_i. To measure the distance between a GPS point Q and an oriented link AB, we use the istance as introduced in [8]. Let Q` be the projection of Q on line AB. The distance is defined as follows:

Where de denotes the Euclidean distance, With this definition, d_Q,AB = d_Q,BA and a point is equally distant from the two opposite directed links of a given road segment.

For this reason, a small shift perpendicular to the link direction is introduced, according to the driving side of the studied area. Therefore, in the computation of the distance, an oriented link AB is replaced by A`B` where A` = A + λ1_┴and B` = B + λ1_┴ and 1_┴ ┴ AB, λ is chosen to reflect the average physical distance between the middle of the road and the middle of the driving lanes (in our application, ┴ = 3m). The computation of the distance between a point and a road segment is illustrated on Figure 1 where Q,Q1 and Q2 are equidistant to AB.

2. Score of a path

Let P {E₁,E₂, . . . ,E_p} denote the path composed of the p subse

quent links E₁,E₂, . . . ,E_p.

The absolute score F of path P in matching the sequence Q_i is defined as:

Where δ_ij = 1 if Q_i was matched by link E_j and δ_ij = 0 otherwise. The problem is to find the path P* that minimizes F. The relative score F_r is defined as F_r = F/T.

F_r will be used as a surrogate for the estimation of the error of the algorithm. Note that F_ris highly dependent on the resolution of the coded network. Each point is matched by only one link of the path, so that:

3. Initialization

The algorithm is initialized by finding the set S of the N nearest links from the first GPS point Q₁. Note that we could use in practice a few more GPS points so that the initialization is more robust and less dependent on potential initial error in the measurement. For each link in the set, we create a one-link path with the initial link.

The path P₁, P₂, . . . , P_N are ranked according to their scores: F₁ ≤ F₂ ≤ . . . ≤ Fn.

Technically, they are stored in a sorted set. Alphabetical rank is used to discriminate paths that are different but that have the same score. Searching for the nearest link can be accelerated using quad-tree storage for the links. Experiments done on a network with 400, 000 links showed performances of about 1 million of points per minute on a PC clocked at 2Ghz.

4. Breaking routes into paths

In most cases, a whole route will not be matched by a single continuous path because of interruptions in the GPS data stream. These interruptions can be due to tunnels, tree canopy, poor signal, etc. Therefore, a route will often be matched by a sequence of paths. We do not address here the problem of connecting these paths together.

The outer loop of the algorithm processes the GPS points sequentially. Assume Q_i is the next data point to be processed. If the GPS device is searching for the signal, the point is discarded. If de (Q_i-1,Q_i) is bigger than a given threshold (e.g. 300 meters) or if t_i-1−t_i is longer than a given period (e.g. 30 seconds), the matching is stopped for the current path and a fresh initialization is started with Q_i.

Obviously, this procedure could be improved to be less sensitive on outliers.

5. Following the network topology

Given a new point Q_i, the following procedure is applied for each of the path P in the set S.

Assume that Q_i is matched by the last link E_p, that is δ_ip = 1.
Update the score of path P using (1).
Insert P in a new sorted set V.
Check if the route has reached the next intersection (i.e. the destination of E_p).
If yes, create the new paths P_K^FS that correspond to the forward star of E_p.
Update the score of paths P_K^FS using (1).
Insert P_K^FS in the new sorted set V.

When the condition is met on step 4, new paths are created depending on the network topology. However, it is not obvious to determine if the vehicle has reached the end of the last link. We use the bird distance covered by the GPS points since the beginning of E_p to determine if the vehicle might have cruised the whole link.

Let Q_p0 the first point matched by E_p and L(E_p) the length of E_p. If the following condition holds,

Then we assume that the vehicle has reached the destination of E_p. A problem arises if a vehicle performs a u-turn in the middle of a long link. The parameter α = 0.5 is introduced for that reason.

Let FS (E_p) be the set of downstream links from E_p (forward star). If the condition(2) holds, a new path P_K^FS is created for each of the link in FS (E_p). These children paths differ from their ancestor since they have one more link: E^k_p+1Є FS(Ep) where E^k_p+1 is the last link of path P_K^FS. The children paths are initialized with the last point so that δ_ip+1 = 1. The path creation mechanism is represented on Figure 2 where a vehicle is driving from O to D. At the intersection A, three children paths are created (P₁, P₂, P₃). Clearly the first points close to A do not allow discriminating which direction has been taken. Indeed, the first point suggests AC was the chosen direction. Obviously, the direction was AD but the network description was not accurate enough to describe the curvature of the road section.

Initially, the score of P₃ is better than that of P₂. However, as more points are added, the score of P₂ will dominate P₃ and P₁ in the end.

6. Maintaining a set of candidate paths

Children paths are inserted into the sorted set V and ranked according to their score. Consequently, V has more elements than S. When Q_i is processed, S is emptied and only the N best elements of V are inserted into S. This avoids that the sorted set S grows indefinitely. In practice, the branching and the path creation can happen for a range of different Q_i as soon as the condition (2) is met. The range depends on α. Therefore, there are generally several instances of paths P₁, P₂, P₃ in the S, depending on the point Q_i when the branching occurred. Eventually, the less efficient paths are discarded. The sorted set S has to be large enough to keep track of unlikely trajectories that might turn to be correct ultimately as illustrated on the previous example. Intuitively, the number of candidate paths to maintain is a function of the resolution of the coded network.

When the last point Q_i is processed, the best path from S is kept as the result of the map-matching process. Similarly, when the route is broken and a new path has to be started; only the best path from S is kept as the best matching part for the previous GPS points. The flow-chart of the whole algorithm is presented on Figure 3. It depends on two parameters: α and the size of the set of candidates N.

References:

Efficient map-matching of large GPS data sets - Tests on a speed monitoring experiment in Zurich, F.Marchal_ J. Hackney K.W. Axhausen†, 26th July 2004.

Links:

http://www.strc.ch/pdf_2005/STRC05_D2_Marchal.pdf

Bogy

Thursday, February 7, 2008

Efficient map-matching of large GPS data sets

Blog Archive

Contributors