Let G(V,E) be the directed graph describing the road network. V is the set of vertexes or nodes and E is the set of edges or links. Let Qi be the set of points given by the GPS data stream i = 1...T. Each GPS point consists of a pair of coordinates (xi, yi) and a time-stamp ti. To measure the distance between a GPS point Q and an oriented link AB, we use the istance as introduced in [8]. Let Q` be the projection of Q on line AB. The distance is defined as follows:
Where de denotes the Euclidean distance, With this definition, dQ,AB = dQ,BA and a point is equally distant from the two opposite directed links of a given road segment.
For this reason, a small shift perpendicular to the link direction is introduced, according to the driving side of the studied area. Therefore, in the computation of the distance, an oriented link AB is replaced by A`B` where A` = A + λ1┴ and B` = B + λ1┴ and 1┴ ┴ AB, λ is chosen to reflect the average physical distance between the middle of the road and the middle of the driving lanes (in our application, ┴ = 3m). The computation of the distance between a point and a road segment is illustrated on Figure 1 where Q,Q1 and Q2 are equidistant to AB.
2. Score of a path
Let P {E1,E2, . . . ,Ep} denote the path composed of the p subse
quent links E1,E2, . . . ,Ep.
The absolute score F of path P in matching the sequence Qi is defined as:
Where δij = 1 if Qi was matched by link Ej and δij = 0 otherwise. The problem is to find the path P* that minimizes F. The relative score Fr is defined as Fr = F/T.
Fr will be used as a surrogate for the estimation of the error of the algorithm. Note that Fr is highly dependent on the resolution of the coded network. Each point is matched by only one link of the path, so that:
3. Initialization
The algorithm is initialized by finding the set S of the N nearest links from the first GPS point Q1. Note that we could use in practice a few more GPS points so that the initialization is more robust and less dependent on potential initial error in the measurement. For each link in the set, we create a one-link path with the initial link.
The path P1, P2, . . . , PN are ranked according to their scores: F1 ≤ F2 ≤ . . . ≤ Fn.
Technically, they are stored in a sorted set. Alphabetical rank is used to discriminate paths that are different but that have the same score. Searching for the nearest link can be accelerated using quad-tree storage for the links. Experiments done on a network with 400, 000 links showed performances of about 1 million of points per minute on a PC clocked at 2Ghz.
4. Breaking routes into paths
In most cases, a whole route will not be matched by a single continuous path because of interruptions in the GPS data stream. These interruptions can be due to tunnels, tree canopy, poor signal, etc. Therefore, a route will often be matched by a sequence of paths. We do not address here the problem of connecting these paths together.
The outer loop of the algorithm processes the GPS points sequentially. Assume Qi is the next data point to be processed. If the GPS device is searching for the signal, the point is discarded. If de (Qi-1,Qi) is bigger than a given threshold (e.g. 300 meters) or if ti-1−ti is longer than a given period (e.g. 30 seconds), the matching is stopped for the current path and a fresh initialization is started with Qi.
Obviously, this procedure could be improved to be less sensitive on outliers.
5. Following the network topology
Given a new point Qi, the following procedure is applied for each of the path P in the set S.
- Assume that Qi is matched by the last link Ep, that is δip = 1.
- Update the score of path P using (1).
- Insert P in a new sorted set V.
- Check if the route has reached the next intersection (i.e. the destination of Ep).
- If yes, create the new paths PKFS that correspond to the forward star of Ep.
- Update the score of paths PKFS using (1).
- Insert PKFS in the new sorted set V.
When the condition is met on step 4, new paths are created depending on the network topology. However, it is not obvious to determine if the vehicle has reached the end of the last link. We use the bird distance covered by the GPS points since the beginning of Ep to determine if the vehicle might have cruised the whole link.
Let Qp0 the first point matched by Ep and L(Ep) the length of Ep. If the following condition holds,
Then we assume that the vehicle has reached the destination of Ep. A problem arises if a vehicle performs a u-turn in the middle of a long link. The parameter α = 0.5 is introduced for that reason.
Let FS (Ep) be the set of downstream links from Ep (forward star). If the condition(2) holds, a new path PKFS is created for each of the link in FS (Ep). These children paths differ from their ancestor since they have one more link: Ekp+1 Є FS(Ep) where Ekp+1 is the last link of path PKFS. The children paths are initialized with the last point so that δip+1 = 1. The path creation mechanism is represented on Figure 2 where a vehicle is driving from O to D. At the intersection A, three children paths are created (P1, P2, P3). Clearly the first points close to A do not allow discriminating which direction has been taken. Indeed, the first point suggests AC was the chosen direction. Obviously, the direction was AD but the network description was not accurate enough to describe the curvature of the road section.
Initially, the score of P3 is better than that of P2. However, as more points are added, the score of P2 will dominate P3 and P1 in the end.
6. Maintaining a set of candidate paths
Children paths are inserted into the sorted set V and ranked according to their score. Consequently, V has more elements than S. When Qi is processed, S is emptied and only the N best elements of V are inserted into S. This avoids that the sorted set S grows indefinitely. In practice, the branching and the path creation can happen for a range of different Qi as soon as the condition (2) is met. The range depends on α. Therefore, there are generally several instances of paths P1, P2, P3 in the S, depending on the point Qi when the branching occurred. Eventually, the less efficient paths are discarded. The sorted set S has to be large enough to keep track of unlikely trajectories that might turn to be correct ultimately as illustrated on the previous example. Intuitively, the number of candidate paths to maintain is a function of the resolution of the coded network.
When the last point Qi is processed, the best path from S is kept as the result of the map-matching process. Similarly, when the route is broken and a new path has to be started; only the best path from S is kept as the best matching part for the previous GPS points. The flow-chart of the whole algorithm is presented on Figure 3. It depends on two parameters: α and the size of the set of candidates N.
References:
Efficient map-matching of large GPS data sets - Tests on a speed monitoring experiment in
Links:
http://www.strc.ch/pdf_2005/STRC05_D2_Marchal.pdf