MCMLSD: A Probabilistic Algorithm and Evaluation Framework for Line Segment Detection

James H. Elder, Member, IEEE, Emilio J. Almazàn, Yiming Qian, Student Member, IEEE, and Ron Tal

arXiv:2001.01788v1 [cs.CV] 6 Jan 2020

Abstract—Traditional approaches to line segment detection typically involve perceptual grouping in the image domain and/or global accumulation in the Hough domain. Here we propose a probabilistic algorithm that merges the advantages of both approaches. In a first stage lines are detected using a global probabilistic Hough approach. In the second stage each detected line is analyzed in the image domain to localize the line segments that generated the peak in the Hough map. By limiting search to a line, the distribution of segments over the sequence of points on the line can be modeled as a Markov chain, and a probabilistically optimal labelling can be computed exactly using a standard dynamic programming algorithm, in linear time. The Markov assumption also leads to an intuitive ranking method that uses the local marginal posterior probabilities to estimate the expected number of correctly labelled points on a segment. To assess the resulting Markov Chain Marginal Line Segment Detector (MCMLSD) we develop and apply a novel quantitative evaluation methodology that controls for under- and over-segmentation. Evaluation on the YorkUrbanDB and Wireframe datasets shows that the proposed MCMLSD method outperforms prior traditional approaches, as well as more recent deep learning methods.

Index Terms—Line Segment Detection, Hough Transform, Probabilistic Models, Perceptual Organization, Evaluation Methodologies

1 INTRODUCTION

Much of our visual world can be approximated as piecewise planar, particularly in built environments. The boundaries and creases of these piecewise planar surfaces project to the image as line segments, and as a consequence the accurate detection of line segments continues to be one of the most important low-level problems in the field of computer vision. Line segments are important features for many tasks, including feature matching across views [1], vanishing point detection [2] and 3D reconstruction [3], [4], [5].

Two frameworks have been popular for line segment detection: perceptual grouping and global Hough analysis.

1.1 The perceptual grouping approach

In the perceptual grouping framework, a set of heuristics typically based upon geometric grouping cues (e.g., proximity, good continuation) is used to group roughly collinear local features (e.g., edges or vectors tangent to isophotes) into extended line segments, which are evaluated according to some quality of fit measure. An early example is the hierarchical heuristic framework developed by Boldt and colleagues [6]. More recent multi-stage grouping efforts include the SSWMS approach of Nieto et al. [7], which involves an iterative selection of image points with strongly oriented gradient structure, followed by an iterative growing process, the approach of Lu et al. [8], which involves both linking and splitting, and the biologically inspired approach of Liu et al. [9], which employs 'simple cell' filters to detect local oriented structure, 'complex cell' mechanisms that locally integrate these responses and 'hyper-complex' mechanisms to detect endpoints.

An alternative to this multi-stage grouping approach is to analyze the covariance matrix of image locations in a set of connected edges and label a set as a line segment if the smallest eigenvalue falls below a threshold [10], [11]. While beautifully simple, these methods are not robust to gaps or intersections in the edge map.

Another issue in this perceptual grouping framework is that some threshold on the quality of fit measure must be applied in order to discriminate 'true' line segments from false conjunctions that might arise by chance. This issue was addressed in the LSD framework introduced by von Gioi et al. [12], based on earlier work by Desolneux et al. [13]. In this framework the so-called a-contrario approach is used to explicitly compute the probability that inferred line segments might have occurred by chance, given a maximum entropy model of the edge map. (This is related to the minimum reliable scale null hypothesis testing framework for edge detection developed by Elder & Zucker [14].) While this approach does not eliminate the need for a threshold, it transfers the threshold to a quantity (e.g., expected number of false positives per image) that is much easier to set rationally. A much faster version of this method, dubbed EDLines, was later introduced by Akinlar & Topal [15].

Recent work in this area has focused on trying to discriminate salient or important line segments from less important 'background' segments. Kim et al. [16] used a combination of luminance and geometric features to select the most significant edges, reporting superior performance to LSD on two test images. Brown et al. [17] used a measure of divergence between colour statistics on either side of a hypothesized line segment to favour salient segments. The method outperformed LSD and Hough methods using quantitative measures of repeatability and registration accuracy on image pairs (see Section 4 below).

1.2 The Hough approach

A drawback of the perceptual grouping approach is that local decisions are made before potentially relevant global information can be brought to bear. The Hough approach avoids this problem by accumulating edges over the entire image into a histogram of potential line positions and orientations. Accuracy can be improved by modeling uncertainty in local edges and propagating that uncertainty to the Hough map [18].

While the Hough approach to line detection has the advantage of integrating information globally, identifying the endpoints that define the extent of the line segment in the image is not necessarily straightforward. A number of methods scan the detected lines in the image space looking for a maximal chain of connected or nearly-connected edges [19], [20]. Others have attempted to identify the endpoints of each line segment by analyzing the exact shape of a characteristic 'butterfly' pattern around the associated peak in the Hough map [21], [22], [23], [24], [25]. One major limitation of this approach is that only one segment can be found per line, whereas in built environments it is quite common to find multiple co-linear segments.

2 OUR APPROACH

The advantage of the Hough approach is that it can integrate all evidence for line hypotheses prior to inference. The perceptual grouping approach, on the other hand, allows endpoints to be detected more directly, and permits the identification of multiple segments per line.

Our two-stage method, an early version of which was published at CVPR 2017 [26], combines the advantages of these two approaches. In the first stage we employ the probabilistic Hough method of Tal & Elder [18] to identify globally optimal lines. In the second stage we search each of these lines in the image for the segment(s) that gave rise to it.

The key observation that recommends this approach is that narrowing the search for segments from the 2D image to 1D lines allows the problem to be modeled as the labelling of hidden states in a linear Markov chain model. The problem of determining the maximum probability (MAP) assignment of segments can then be shown to have an optimal substructure property that leads to an exact dynamic programming solution in linear time.

The benefits of this approach are several:

1) Each of the lines identified by a peak in the Hough map results from careful accumulation of the global evidence for the line, and thus will more accurately identify the position and orientation (ρ, θ) parameters of the line segments than will a few local edges.
2) The lines identified by the probabilistic Hough method have a natural order according to their significance in the Hough map, allowing the line segment search to be limited to the most significant lines.
3) In urban scenes, co-linear line segments are common, arising from architectural repetition seen in cladding, windows, etc. Unlike many Hough methods, our approach allows multiple segments to be recovered for each line.
4) Limiting search to a line allows the problem of determining maximum probability segments to be solved exactly, using dynamic programming, in linear time.

3 THE DEEP LEARNING APPROACH

Most recently, two deep neural network (DNN) algorithms for line segment detection have been reported. The Wireframe algorithm [27] is based upon a stacked hourglass network [28] that takes as input a 320 × 320 pixel RGB image and produces as output a 320 × 320 pixel map encoding the estimated locations and lengths of the line segments in the image. In particular, if a pixel is judged to lie on a segment, the pixel value indicates the estimated length of that segment, while a pixel not lying on any segment should have value 0. The network is trained to minimize an L2 loss and the scalar output is thresholded to filter out shorter or lower-confidence segments.

Note that the Wireframe network delivers a raster map - essentially an edge map where edges are constrained to lie on straight lines - rather than a vectorized description of the locations, lengths and orientations of the line segments in the image. To obtain the latter, a parallel network is trained to detect junctions in the image, and then a somewhat complex process is followed to segment the edge map into line segments between junctions.

The Attraction Field algorithm [29] also employs a deep network, but in a rather different way. The key insight is that it is easier for a deep network to map the input image to a dense pixel grid of values than to a sparse boundary map. Thus to adapt the problem of line segment detection to deep networks, each sparse ground truth line segment map is used to generate a dense ground truth 'attraction field' map (AFM) that represents the vector displacement to the nearest line segment point at every pixel in the image. A network is then trained to estimate this dense AFM. At inference, the estimated dense AFM is reduced to a sparse line segment map using a 'squeeze module' that accumulates votes for line segment pixels by summing discretized displacement vectors over all pixels in the AFM.

The authors experiment with two network architectures: a U-Net [30] and a modified U-Net, referred to by the authors as a-trous, that uses the ASPP module of DeepLab v3+ [31] and the skip connections of ResNet [32]. An L1 loss on the AFM is employed.

As for Wireframe, the AFM approach produces a raster map of edge pixels that must then somehow be grouped into line segments. In the case of AFM, a heuristic, iterative, greedy region-growing approach similar to that used in LSD [12] is employed.

Both the Wireframe and Attraction Field algorithms are trained on the Wireframe training dataset.

Both Wireframe and AFM are claimed to outperform all prior non-DNN approaches to line segment detection, including our MCMLSD approach [26]. However, we argue in this paper that the evaluation performed in these two DNN papers is limited, and that a more careful analysis reveals that for the problem of line segment detection (not edge detection), MCMLSD and indeed some older non-DNN algorithms substantially outperform both of these DNN approaches on a number of metrics. Given that these networks also involve tens of millions of free parameters, we argue that more explainable methods such as MCMLSD may be preferable for many applications.

4 PRIOR EVALUATION METHODOLOGY

Due in part to a lack of high quality labelled ground truth, most traditional line segment detection methods were evaluated only qualitatively on real imagery [6], [11], [12], [15], [22]. More recently, quantitative evaluations have been conducted based on datasets consisting of pairs of images related by a known homography [17]. This is a promising method, but it does suffer from two potential drawbacks. First, it is restricted to an analysis of co-planar line segments. Second, the evaluation presupposes that the goal of line segment detection is the association of these segments across images for the purposes of homography or disparity estimation. However there are many other possible applications - single view reconstruction, for example.

While task-specific evaluation methodologies may be appropriate in some cases, it would be useful to have an evaluation method that is more general. In this work we present a new methodology for quantitative evaluation of line segment detectors on real images that does not assume a specific task, using images from the YorkUrbanDB [33] (www.elderlab.yorku.ca/YorkUrbanDB/) and Wireframe (https://github.com/huangkuns/wireframe) datasets.

Fig. 1: Orthogonal projections (thin black lines) of all pixels within two pixels of a detected line (thick black line) define an ordinal sampling of the line i ∈ [1 . . . N]. Pixels within this band occupied by edges (shown red on grey) with orientations similar to the line support the assignment of the ON state for the associated segment variable xi at sampled line locations.

5 ALGORITHM

5.1 Line Detection

One problem with traditional Houghing methods is that noise in the observations tends to cause each line to generate multiple peaks in the Hough map.
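As context for the probabilistic refinement described next, a plain (ρ, θ) Hough accumulation with oriented-edge voting can be sketched as follows. This is a minimal sketch, not the method of Tal & Elder [18]: each edge casts a single hard vote for the one line consistent with its own orientation, and the uncertainty propagation that smooths the Hough map is omitted. The function name and edge-tuple format are illustrative assumptions; the bin resolutions echo the defaults quoted later in the paper (∆ρ = 0.2 pixels, ∆θ = 0.1 deg).

```python
import math

def hough_accumulate(edges, rho_res=0.2, theta_res=0.1, rho_max=800.0):
    """Accumulate oriented edges into a sparse (rho, theta) Hough histogram.

    edges: iterable of (x, y, tangent_deg, weight) tuples. Each edge votes
    only for the line whose orientation matches its own tangent, rather
    than for every line through (x, y).
    """
    acc = {}
    for x, y, tangent_deg, w in edges:
        # The line normal is perpendicular to the edge tangent direction.
        theta = (tangent_deg + 90.0) % 180.0
        t = math.radians(theta)
        rho = x * math.cos(t) + y * math.sin(t)  # signed normal distance
        key = (int(round((rho + rho_max) / rho_res)),
               int(round(theta / theta_res)))
        acc[key] = acc.get(key, 0.0) + w
    return acc
```

Peaks in the accumulator correspond to candidate lines; a sequential extraction step would then visit them in descending order of mass, subtracting supporting edges as it goes.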
To address this issue we employ a probabilistic Hough transform method [18] (code available from elderlab.yorku.ca/resources). The method uses edges detected by the multi-scale Elder & Zucker edge detector [14], models uncertainty in the location and orientation of the detected edges and propagates this uncertainty to the Hough map. This propagation of uncertainty produces a smooth Hough map that is roughly resolution invariant and greatly reduces the multiple response problem. The problem is mitigated further by a sequential line extraction step in which each peak in the Hough map is visited in descending order of significance, and edges contributing to the peak are subtracted from the Hough map when it is visited.

5.2 Line Segment Detection

Each selected peak in the Hough map identifies a line that extends from one of the image borders to another. In general, this line is only partially occupied by line segments in the image. The goal is now to find these segments, based on the location and orientation of nearby edges.

Prior work [33] suggests that most edges generated by a line and detected by the Elder & Zucker edge detector lie within one pixel of the line. To ensure we capture all edges related to a line we extend our search to all pixels within two pixels of the line (Fig. 1). The orthogonal projections of these pixel locations onto the line then define an ordinal sampling i ∈ [1, . . . , N] of the line. We let xi represent the binary hidden segment state (ON or OFF) indicating whether a visible segment is present at position i on the line, di the distance from the line to the associated pixel and yi the associated image observation at that pixel. Each observation yi consists of 1-2 features:

1) A binary variable ei indicating whether an edge exists at this pixel.
2) The angular deviation θi of the edge from the line, if the edge exists.

These features provide information about the probable state of the line at the associated position:

p(yi | xi) ∝ p(ei = 1 | xi, di) p(θi | xi, ei = 1) for edge pixels.
p(yi | xi) ∝ p(ei = 0 | xi, di) for non-edge pixels.

(Note that we have assumed that the angular deviation θi is independent of the distance di of the pixel from the line.)

We learned these distributions from the 640×480 pixel images and hand-labelled ground truth lines of the YorkUrbanDB training dataset [33]. Figs. 2(a-b) show the likelihoods p(ei = 1 | xi = ON, di) and p(ei = 1 | xi = OFF, di) as functions of di for ON and OFF states respectively. We represent these distributions as histograms. (The likelihoods for non-edge observations p(ei = 0 | xi = ON, di) and p(ei = 0 | xi = OFF, di) are the complements of the edge likelihoods.) Figs. 2(c-d) show the probability p(θi | xi, ei = 1) as a function of the angular deviation θi for ON (xi = 1) and OFF (xi = 0) states, respectively. For the ON state we approximate the heavy-tailed distribution as a mixture of a uniform and a Gaussian distribution (shown in red). For the OFF state we employ a histogram representation.

Fig. 2: Likelihoods for line segment extraction, learned from the YorkUrbanDB training dataset [33]. (a-b) Likelihood p(ei | xi, di) for distance di of observations from the line for (a) ON (xi = 1) and (b) OFF (xi = 0) states. (c-d) Probability p(θi | xi, ei) for the angular deviation θi of observed edges from the line for (c) ON (xi = 1) and (d) OFF (xi = 0) states.

Given these observations, we wish to determine the sequence of hidden states x1, ..., xN that maximizes

p(x1, . . . , xN | y1, . . . , yN) ∝ p(y1, . . . , yN | x1, . . . , xN) p(x1, . . . , xN)   (1)

We assume that, when conditioned on the hidden states xi, the observations yi are mutually independent and independent of all xj, j ≠ i. We further assume that the hidden states are first order Markov, so that Eqn. 1 becomes

p(x1, . . . , xN | y1, . . . , yN) ∝ p(y1 | x1) p(x1) ∏_{i=2}^{N} p(yi | xi) p(xi | xi−1)   (2)

The Markov assumption implies an exponential distribution of segment lengths; for the YorkUrbanDB training dataset we have verified that this distribution is indeed very close to exponential for segments down to ∼15 pixels in length. (For smaller segments the density falls off, possibly due to difficulties in hand-labelling shorter segments.)

Table 1 shows values for the priors p(x1) and p(xi | xi−1), estimated from the 51 640×480 pixel images of the YorkUrbanDB labeled training dataset [33]. (Note that since the probabilities for ON and OFF states sum to 1 there are only 3 free parameters.) We make the approximation that p(xi | xi−1) is independent of the variation in spacing between points on the line; since the average segment in the YorkUrbanDB generates more than 500 point samples, errors due to this approximation tend to average out.

TABLE 1: Prior marginal probabilities p(xi) and conditional transition probabilities p(xi | xi−1) for the hidden segment state xi, derived from the YorkUrbanDB training dataset.

Parameter      Mean     Std. Err.
p(OFF)         0.75     0.0079
p(ON)          0.25     0.0079
p(OFF|OFF)     0.9986   0.0001
p(ON|OFF)      0.0014   0.0001
p(ON|ON)       0.9949   0.0004
p(OFF|ON)      0.0051   0.0004

The standard errors for these parameter estimates are relatively small, and we have verified that variation within this range has negligible effect on results. While these parameters are specific to the YorkUrbanDB dataset and may therefore be sub-optimal for other kinds of imagery, they can be generalized to other image resolutions. Assuming that the number of segments per line and their relative length are functions of the scene and not the sensor, p(OFF) and p(ON) will be resolution-invariant and the probability of state changes will vary inversely with resolution. For example, doubling the resolution to 1280×960 pixels will halve the probability of transition from OFF to ON or ON to OFF.

The factoring of the global probability of the line segment configuration along the line confers an optimal substructure property that allows a dynamic programming solution to the problem of finding the maximum a posteriori configuration. In particular, let the cost function Ci(j) represent the minimum negative log probability of all sequences {x1, . . . , xi} ending in state xi = j. Then the maximum probability sequence of states over the whole line is the sequence that minimizes min_j CN(j).

Defining the cost of transitioning from state j at location i − 1 to state k at location i as

ci(j, k) = − log ( p(yi | xi = k) p(xi = k | xi−1 = j) ),  i = 2, . . . , N   (3)

we then have that

C1(k) = − log ( p(y1 | x1 = k) p(x1 = k) )   (4)

Ci(k) = min_j ( Ci−1(j) + ci(j, k) ),  i = 2, . . . , N   (5)

Thus the cost function Ci(k) can be computed sequentially from i = 1 to i = N in O(N) time (Fig. 3). In order to recover the maximum probability configuration, an auxiliary data structure containing

ŝi(k) = arg min_j ( Ci−1(j) + ci(j, k) )   (6)

is maintained, allowing the maximum probability configuration to be unwound from xN back to x1.

Fig. 3: The sequence of segment state variables xi are assumed to form a Markov chain. To compute the MAP solution we build a trellis table from the first line position i = 1 to the last line position i = N that identifies the minimum cost (negative log probability) to reach either possible state (ON or OFF) at each position i. The selected MAP path is shown in red, and the resulting ON/OFF states are indicated by the solid/dashed line above the trellis.

Once a line segment is detected all associated edges (i.e., edges within two pixels of the segment) are removed from the image. This serves to reduce the incidence of multiple detections for the same segment.

6 RANKING

Having extracted MAP segments for each line in the image, we would like to rank their significance. This will allow downstream applications to select only the number of segments needed to support their application, and can serve to eliminate low-ranked noise segments. Our Markov chain model allows us to approach the ranking problem from a probabilistic perspective. In particular, we evaluate the following four probabilistic methods for ranking a segment of length M extending from position i to position i + M:

Ranking Method 1. Posterior probability of line segment.

p(x_{i...i+M} = ON | y_{i...i+M})

This ranking criterion will maximize the expected number of segments with no false alarms.

Ranking Method 2. Posterior probability of line segment multiplied by length.

p(x_{i...i+M} = ON | y_{i...i+M}) × M

This criterion will maximize the expected total length of segments with no false alarms.

Ranking Method 3. Posterior odds for fully ON vs fully OFF configurations.

p(x_{i...i+M} = ON | y_{i...i+M}) / p(x_{i...i+M} = OFF | y_{i...i+M})

Ranking Method 4. Sum of marginal posterior probabilities for ON states. The forward-backward algorithm is used to compute the posterior probability at each location.

∑_{j=i}^{i+M} p(xj = ON | y_{i:i+M})

This measure reflects the expected number of ON samples on the segment, and thus will maximize the expected number of correctly labelled locations within the segment.

7 EVALUATION METHODOLOGY

It is important to evaluate line segment detection algorithms on real, complex images. Prior evaluations have generally been qualitative (i.e., visual). Recent efforts to quantify the evaluation require pairs of images related by a known homography, and are perhaps thus best suited for matching tasks [17]. Here we propose an alternative quantitative evaluation methodology that does not assume the existence of image pairs or known homographies and thus could be applicable for a broader range of tasks.

Our proposed evaluation method does require an image dataset in which important segments have been labelled. Here we employ two.

7.1 Recall as a Function of the Number of Segments

We can compute a measure of recall as the number of ground truth point samples matched to detector samples, divided by the total number of ground truth point samples. This measure of recall is problematic if we allow matches without regard to the segments on which the points lie, as it does not penalize under-segmentation (joining multiple short segments into a single long segment) or over-segmentation (breaking up a long segment into multiple short segments).

However, constraining matches to lie on 1:1 associated segments solves both of these problems. In the case of under-segmentation, only one of the shorter ground truth segments is matched, leading to a high penalty. In the case of over-segmentation, only one of the detector segments is matched, again generating a high penalty.

Without additional constraints, using recall by itself is still problematic, as it is biased toward detectors that report a larger number of segments, thereby maximizing the probability of detecting ground truth points. We address this by comparing recall as a function of the same number k of segments reported.

7.2 Recall as a Function of Total Segment Length

There is still a potential bias in this recall-vs-k measure. Neglecting co-linear ground truth segments, the method can be biased toward detectors that report segments of maximal length (i.e., global lines) as this minimizes the risk of missing ground truth points. To address this potential bias, our second performance measure reports recall as a function of the sum L of the lengths of detected segments. This severely penalizes detectors that report over-long segments.
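The two core computations described in Sections 5.2 and 6 - the MAP labelling of Eqns (3)-(6) and the marginal-sum ranking of Method 4 - can be sketched as follows. This is a minimal sketch, not the released implementation: the priors follow the spirit of Table 1, the likelihood-dict input format and function names are illustrative assumptions, and the forward-backward pass conditions on the whole observation sequence for simplicity.

```python
import math

STATES = ("OFF", "ON")
# Assumed priors, in the spirit of Table 1 (YorkUrbanDB estimates).
P_INIT = {"OFF": 0.75, "ON": 0.25}
P_TRANS = {("OFF", "OFF"): 0.9986, ("OFF", "ON"): 0.0014,
           ("ON", "ON"): 0.9949, ("ON", "OFF"): 0.0051}

def map_states(lik):
    """MAP ON/OFF labelling of the chain by dynamic programming.

    lik: list of dicts giving the likelihood p(y_i | x_i = s) at each
    sampled line position i. Implements C_1(k) = -log(p(y_1|k) p(x_1=k))
    and C_i(k) = min_j [C_{i-1}(j) + c_i(j, k)], keeping back-pointers
    s_i(k) so the minimum-cost path can be unwound from x_N back to x_1.
    """
    cost = {s: -math.log(lik[0][s] * P_INIT[s]) for s in STATES}
    back = []
    for obs in lik[1:]:
        ptr = {k: min(STATES, key=lambda j: cost[j] - math.log(P_TRANS[(j, k)]))
               for k in STATES}
        cost = {k: cost[ptr[k]] - math.log(P_TRANS[(ptr[k], k)] * obs[k])
                for k in STATES}
        back.append(ptr)
    path = [min(STATES, key=cost.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

def marginal_sum(lik, lo, hi):
    """Ranking Method 4: expected number of ON samples on [lo, hi), i.e.
    the sum of posterior marginals p(x_j = ON | y), computed with an
    unnormalized forward-backward pass."""
    n = len(lik)
    fwd = [{s: lik[0][s] * P_INIT[s] for s in STATES}]
    for i in range(1, n):
        fwd.append({k: lik[i][k] * sum(fwd[-1][j] * P_TRANS[(j, k)]
                                       for j in STATES) for k in STATES})
    bwd = [{s: 1.0 for s in STATES}]
    for i in range(n - 1, 0, -1):
        bwd.append({j: sum(P_TRANS[(j, k)] * lik[i][k] * bwd[-1][k]
                           for k in STATES) for j in STATES})
    bwd.reverse()
    total = 0.0
    for j in range(lo, hi):
        post = {s: fwd[j][s] * bwd[j][s] for s in STATES}
        total += post["ON"] / (post["ON"] + post["OFF"])
    return total
```

With the strong persistence priors above, a run of ON-favouring observations must be long enough to pay the two transition costs before the MAP path switches state, which is exactly the behaviour that suppresses isolated noisy edges.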
The YorkUrbanDB dataset [33], which consists of 102 images measure reports recall as a function of the sum L of the lengths of urban scenes, randomly divided into training and test subsets of detected segments. This severely penalizes detectors that report of 51 images each. In each image, major line segments that over-long segments. conform to one of the three so-called Manhattan directions [34] (i.e., vertical or horizontal and conforming to the main directions 7.3 Precision-Recall of orthogonal walls, streets, etc.) have been identified and labelled Our third and final performance measure is conventional precision- by hand. This database has been used widely to train and evaluate recall. We can take as a measure of precision the number of ground algorithms for vanishing point detection [35], line detection [36] truth point samples matched to detector samples, divided by the and Manhattan frame estimation [18], [33]. We also evaluate on total number of detector point samples. Again, by enforcing a 1:1 the more recent and much larger Wireframe dataset [27], which matching at the segment level, both under-segmentation and over- consists of 5,462 images (5,000 for training, 462 for test) of man- segmentation are penalized. made scenes. To facilitate future comparisons, the code that performs these We assume that the line segment detector under evaluation evaluations as well as the code for the MCMLSD algorithm is returns a list of line segments in ranked order. We sample each available online at elderlab.yorku.ca/resources. ground truth and detector segment uniformly with a sample spacing of one pixel and use these point samples to evaluate the detector as a function of the number k of top-ranked segments 7.4 Limitations of Precision-Based Measures of Perfor- selected, varying k from 10 to 500. 
mance For each value of k we first identify potential point matches as Since the YorkUrbanDB dataset does not provide a complete those (ground√truth, detector) point pairs lying within a threshold labelling of all segments in an image, detection of a segment that distance of 2 2 pixels of each other. This threshold was selected is not in the dataset does not necessarily represent an error. For to associate any pair of lines that could potentially appear in the this reason, the absolute precision values reported here should image with less than a one-pixel intervening gap. We then sort be interpreted with caution. Nevertheless, since the segments these candidate matches by Euclidean distance and accept matches labelled in the YorkUrbanDB dataset are highly-visible Manhattan in greedy fashion starting with the smallest distance, matching features projecting from prominent structures in the scene, it is each point at most once, and thus arriving at a near-optimal bi- reasonable to expect a superior detector to rank these highly, partite match. Having associated ground truth and detector points, and therefore attain higher relative precision values compared to we employ the Hungarian algorithm [37] to identify the optimal inferior detectors. bipartite match between ground truth and detector segments that The creators of the Wireframe dataset attempted to label ‘all maximizes the total number of points matched. the line segments associated with the scene structures’. Unlike Now that we have a 1:1 association between ground truth the YorkUrbanDB dataset, these are not restricted to Manhattan and detector segments, it remains to evaluate the quality of this lines, and so one expects the dataset to contain a more complete association. We propose three evaluative measures. labelling, potentially allowing for higher precision. However, the 6 authors also avoided labelling line segments in what they con- 1 Method 1 sidered ‘texture’. 
This includes straight line segments projecting from regular tiling and cladding patterns on horizontal and vertical surfaces, which are very common in the built environment. Since these can be quite useful in establishing surface orientation for both human and computer vision systems, for many applications one would want a line segment detector to detect them, yet such detectors will tend to generate lower precision on the Wireframe dataset.

Given the limitations of precision measures for these two datasets, we feel it is important to consider multiple measures of performance when evaluating and comparing algorithms, and so we report performance using all three measures in what follows.

8 RESULTS

8.1 Ranking

Our first goal is to evaluate the four candidate ranking methods discussed in Section 6. Using the default Hough resolution recommended by Tal & Elder [18] (∆ρ = 0.2 pixels, ∆θ = 0.1 deg), we find that our MCMLSD algorithm generates an average of 414 lines and 488 line segments for each 640×480-pixel image of the YorkUrbanDB training dataset. Note that not all lines generate a segment and some generate several segments.

Fig. 4 shows the 10 top-ranked segments produced by each of our four ranking methods on an example image. We find that the multiplicative nature of the first criterion favours short high-confidence segments. This problem can be addressed by multiplying by segment length (Method 2), forming a contrast between purely ON and purely OFF configurations (Method 3), or summing the ON point marginals (Method 4) to estimate the number of correctly labelled points.

Fig. 4: 10 top-ranked segments for the four ranking methods ((a) Method 1, (b) Method 2, (c) Method 3, (d) Method 4) on an example image.

Figure 5 shows the recall for each of these ranking methods as a function of the number of segments returned, on the YorkUrbanDB training dataset. The bias toward shorter segments leads to poor recall for Method 1. Methods 2-4 yield much better results, and in the sequel we adopt Method 4 as our ranking method of choice, given its superior performance and intuitive probabilistic interpretation. We call the resulting algorithm the Markov Chain Marginal Line Segment Detector (MCMLSD), to capture the importance of both the Markov chain model of the line and the probabilistic ranking that maximizes the expected number of correctly labelled points on the segment.

Fig. 5: Performance of the four ranking methods described in Section 6, as measured by recall vs number of segments returned, on the YorkUrbanDB training dataset.

8.2 Hough Resolution

Having selected the ranking method, we fine-tune the Hough resolution parameters (∆ρ, ∆θ) on the YorkUrbanDB training data, computing recall for the top 100 lines over a range of parameter values and then using kernel regression, with bandwidths selected by leave-one-out cross-validation, to generate a smooth objective surface (Figure 6). The optimal parameter values using this approach were found to be ∆ρ = 0.4 pixels and ∆θ = 0.46 deg. We adopt these parameter values for all subsequent experiments on both the YorkUrbanDB and Wireframe datasets.

8.3 Algorithms Evaluated

We compare the proposed MCMLSD method against five leading methods:1

1) The slice sampling weighted mean shift (SSWMS) method of Nieto et al. [7]
2) The widely-used line segment detector (LSD) method of Grompone von Gioi et al. [12]
3) The linelet-based method (Linelet) of Cho et al. [38]
4) The deep-learning Wireframe Parser method of Huang et al. [27]
5) The deep-learning Attraction Field method of Xue et al. [29], with a-trous architecture

SSWMS. We obtained the code for the SSWMS method from sourceforge.net/projects/lswms. (The authors renamed the method LSWMS there.) There are two parameters; we used the author-recommended default values for both:

• orientation threshold ∆θ = 22.5 deg
• mean shift bandwidth = 3 pixels

The SSWMS algorithm is designed to output segments in descending order of salience; we therefore use this order to rank the segments.

LSD. We obtained the code for LSD from www.ipol.im/pub/art/2012/gjmr-lsd/. We rank segments using the criterion recommended by the authors and employed in later work [17], namely in increasing order of the number of expected false alarms, which is one of the outputs of the LSD detector.

Linelet. We obtained the code for the Linelet [38] algorithm from https://github.com/NamgyuCho/Linelet-code-and-YorkUrban-LineSegment-DB.

1. In the MCMLSD conference paper [26], we also compared against the Progressive Probabilistic Hough Transform (PPHT) method of Matas et al. [20]. We have removed this comparison as the PPHT algorithm has not proven competitive with more recent methods and there is considerable diversity in the literature on how exactly it is implemented and parameterized. We add here comparison to the Wireframe Parser [27] and Attraction Field [29] methods, which were published after the MCMLSD conference paper.
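The contrast between the four ranking criteria can be illustrated in code. The sketch below is our own stand-in for the definitions in Section 6 (not reproduced in this excerpt): it assumes each candidate segment is summarized by the marginal posterior probability that each of its points is ON, and the function name and exact functional forms are illustrative.

```python
import math

def ranking_scores(marginals):
    """Illustrative scores for the four ranking criteria, given the
    marginal posterior probability p_i that each point on a candidate
    segment is ON.  These forms sketch the ideas in the text; the
    exact definitions are given in Section 6."""
    n = len(marginals)
    # Method 1: product of ON marginals -- multiplicative, so it
    # favours short, high-confidence segments.
    m1 = math.prod(marginals)
    # Method 2: compensate for the length bias by weighting with length.
    m2 = n * m1
    # Method 3: contrast between the all-ON and all-OFF configurations.
    m3 = m1 - math.prod(1.0 - p for p in marginals)
    # Method 4: sum of ON marginals, i.e. the expected number of
    # correctly labelled ON points (the measure adopted for MCMLSD).
    m4 = sum(marginals)
    return m1, m2, m3, m4

short_strong = [0.99, 0.99]   # short, high-confidence segment
long_good = [0.9] * 20        # long, moderately confident segment
```

On this toy pair, Method 1 ranks the short high-confidence segment above the long one, while Method 4 reverses the order, mirroring the bias described in the text.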
Fig. 6: Performance and run-time analysis of Hough map resolution. (a) Mean recall over number of segments k = 1, ..., 500 returned. (b) Mean precision over number of segments k = 1, ..., 30 returned. (c) Corresponding run time of the MCMLSD algorithm per image (sec).

We rank segments using the criterion recommended by the authors.

Wireframe Parser. Xue et al. [29] provide the segments generated by the Wireframe Parser for both the YorkUrbanDB and Wireframe test sets; we use these to compute performance. Since the authors do not specify a ranking method, we rank the segments in descending order of segment length.

Attraction Field. As for the Wireframe Parser, Xue et al. [29] provide the segments generated by the Attraction Field method for both the YorkUrbanDB and Wireframe test sets, so we use these to compute performance. We rank the segments using the criterion recommended by the authors.

8.4 Qualitative Results

Fig. 7 shows the top-ranked 90 segments returned by each algorithm on four example images from the YorkUrbanDB test dataset. To our eyes, the Attraction Field and MCMLSD results look strongest, but in complementary ways. While the Attraction Field method appears more adept at picking out short segments (e.g., the windows in the first example), MCMLSD is more successful at recovering the longer segments (e.g., the lines on the ground plane in the first three examples). Interestingly, in the second example the Attraction Field method succeeds in detecting some of the line segments projecting from the tiling pattern on the floor, despite being trained (on the Wireframe dataset) to ignore these.

8.5 Quantitative Results

8.5.1 YorkUrbanDB Test Set

Fig. 8(a) shows the mean length of ranked line segments returned by each of the six algorithms, compared to the ground truth line segment lengths. Although the algorithms generally rank longer segments higher, all ultimately return segments that are on average shorter than the ground truth segments. Consistent with the qualitative observations above, MCMLSD tends to return longer segments than the other approaches.

Fig. 9 provides a quantitative comparison of all six algorithms on the YorkUrbanDB test set. MCMLSD achieves a maximum recall of 0.8, roughly 45% better than the LSD and Linelet methods. Interestingly, MCMLSD outperforms the more recent deep learning algorithms by an even larger margin: maximum recall for MCMLSD is roughly 140% higher than for the Wireframe Parser algorithm and 90% higher than for the Attraction Field algorithm.

Analysis of each of the three performance measures yields additional insights. Fig. 9(a) shows that if a constraint is placed on the number of segments returned, e.g., to limit complexity for downstream analysis, MCMLSD consistently achieves higher recall. While the deep Attraction Field algorithm is competitive with the traditional LSD and Linelet algorithms for very tight constraints (fewer than 100 segments), it falls behind as this constraint is relaxed.

The story is a little different if the constraint is placed on the total segment length rather than the total number of segments (Fig. 9(b)). Here we see that while MCMLSD vastly outperforms the other methods for more relaxed constraints (more than 10^4 pixels), for tighter constraints the Attraction Field, LSD and Linelet algorithms become slightly superior. This can be accounted for by the tendency of MCMLSD to extract longer segments than the Attraction Field, LSD and Linelet algorithms.

Finally, Fig. 9(c) shows that the algorithm of choice very much depends upon the relative value of precision and recall for a particular application. If recall greater than 0.4 is required, MCMLSD is clearly preferred. However, if precision greater than 0.45 is required, then recall must be sacrificed and the Attraction Field or Linelet algorithms are preferred. It should be remembered, however, that since the YorkUrbanDB ground truth is incomplete, the lower precision may be due to detection of useful segments that just do not happen to be labelled in the ground truth.

8.5.2 Wireframe Test Set

Fig. 10 shows the same evaluation on the Wireframe test set. The maximum recall achieved by MCMLSD is 0.75, almost as high as for YorkUrbanDB, even without fine-tuning of parameters or distributions, indicating good generalization ability. The performance advantage is smaller than for YorkUrbanDB, but MCMLSD is still 26% better than its closest competitors. In terms of maximum recall, the Attraction Field algorithm is now competitive with the LSD and Linelet algorithms. MCMLSD still leads the pack when the number of line segments is constrained (Fig. 10(a)). As for the YorkUrbanDB dataset, when total line segment length is constrained, MCMLSD dominates in the high-recall regime. However, for the Wireframe dataset, the Attraction Field algorithm now dominates in the low-recall regime. Fig.
10(c) tells a similar story for precision-recall: MCMLSD dominates in the high-recall regime, but the Attraction Field method is superior in the low-recall (high-precision) regime. The improved performance of the Attraction Field method on the Wireframe dataset relative to the YorkUrbanDB dataset is not surprising, as it was trained on the Wireframe training partition.

Fig. 7: Top 90 segments returned by the six algorithms under evaluation (SSWMS, LSD, Linelet, Wireframe, Attraction Field, MCMLSD), together with hand-labelled ground truth, for four example test images drawn from the YorkUrbanDB dataset.

Fig. 8: Mean length of ranked line segments returned by each algorithm for (a) YorkUrbanDB and (b) Wireframe test sets, as a function of the number of segments returned. Ground truth line segments are ranked from longest to shortest.

Fig. 9: Performance of the six algorithms under evaluation on the YorkUrbanDB dataset. (a) Recall as a function of number of segments returned. (b) Recall as a function of the total length of segments returned. (c) Precision-Recall.

Fig. 10: Performance of the proposed MCMLSD methods compared with the state of the art on the Wireframe dataset [27]. (a) Recall as a function of number of segments returned. (b) Recall as a function of the total length of segments returned. (c) Precision-Recall.

8.5.3 Ranking Revisited

What accounts for the superiority of MCMLSD in the high-recall regime, and the superiority of the Attraction Field algorithm (and, for YorkUrbanDB, the LSD and Linelet algorithms) in the low-recall regime? One possible factor is the quality of the line segments they return. But another possible factor is the ranking employed by each method. To dissociate these two factors, we employed an oracle to rank the segments returned by each algorithm for the YorkUrbanDB dataset according to ground truth precision. Specifically, after 1:1 association with ground truth segments, the algorithm segments are ranked in terms of the proportion of their points that have a 1:1 ground truth match. Ties are resolved by length, with longer segments ranked first.

The results are illuminating (Fig. 11). While MCMLSD necessarily still achieves the highest recall, and still dominates when the number of line segments is constrained, the precision advantage of the Attraction Field method in the low-recall regime has evaporated. This indicates that the advantage of the Attraction Field algorithm in the low-recall regime derives not from superior line segments but from superior ranking. This in turn suggests that the performance of other methods such as MCMLSD in the low-recall regime might be improved by adopting a revised ranking strategy.

Fig. 11: Performance on the YorkUrbanDB dataset when an oracle is used to rank the segments by their precision. (a) Recall as a function of number of segments returned. (b) Recall as a function of the total length of segments returned. (c) Precision-Recall.

One limitation of the ranking strategy adopted in our original CVPR paper [26] is that it considers only the location and orientation of edges detected by the Elder & Zucker multiscale edge detector [14], which employs a signal detection approach based only on the local luminance contrast. This ignores local colour and texture cues that can signal the relative importance of these edges.

To incorporate this appearance information, we employ the structured forests edge detector of Dollár and Zitnick [39] (code obtained from https://github.com/pdollar/edges), which was trained on the BSDS 500 dataset to use the local pattern of colours and textures to identify edges delineating "distinguished things", as judged by human observers [40]. Our hypothesis is that the output of the structured forests edge detector will thus carry appearance cues complementary to our probabilistic ranking measure, which is based solely on the geometry of the edges.

To test this hypothesis, we construct a logistic regressor that takes both of these cues as input to predict the precision of each segment, train the regressor on the YorkUrbanDB training set, and then use it to rank segments in the test set. Since we are interested in improving the precision of the detector, we employ a modified version of Ranking Method 4 (Section 6), using the mean of the marginal probabilities along the segment instead of the sum.
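This re-ranking idea can be sketched as follows. The helper names and the toy training loop are our own illustration, not the paper's MATLAB implementation; we assume each segment is summarized by two scalar cues (the mean ON marginal and the mean structured-forests response) and that its ground-truth precision serves as the regression target.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, iters=2000):
    """Fit weights w and bias b by gradient descent on the logistic
    loss.  Each row of X holds the two cues for one training segment
    (mean ON marginal, mean appearance response); y is that segment's
    ground-truth precision in [0, 1]."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(iters):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the logistic loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def rank_segments(segments, w, b):
    """Sort test segments (each a cue vector) by predicted precision."""
    score = lambda x: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return sorted(segments, key=score, reverse=True)
```

In use, the regressor is fit once on training segments and then applied to order the test-set segments by predicted precision, exactly as a ranking measure rather than a hard classifier.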
To form the appearance cue we average the scalar responses of the structured forests edge detector at the locations of the Elder & Zucker edges within a 2-pixel distance of the line segment.

Fig. 12 shows results of the MCMLSD algorithm using this revised ranking strategy (dubbed MCMLSD2), alongside the original MCMLSD algorithm and the five competitors. We see that with this revised ranking strategy, MCMLSD2 loses some recall performance when the number of line segments is constrained (Fig. 12(a)), but is still vastly superior to the other methods. At the same time, the precision of MCMLSD2 matches that of the Attraction Field, LSD and Linelet algorithms in the low-recall regime, and is far superior in the high-recall regime (Fig. 12(c)).

Fig. 12: Results including MCMLSD2, which uses the structured forests edge detector [39] to incorporate local appearance cues when ranking segments. (a) Recall as a function of number of segments returned. (b) Recall as a function of the total length of segments returned. (c) Precision-Recall.

8.6 Reconciling with Recent Evaluations

The results above may at first seem puzzling, since they appear to contradict the evaluations reported in recent papers that claim superiority of the deep Wireframe Parser and Attraction Field methods [27], [29]. This contradiction is due to differences in how algorithms were evaluated. We laid out our evaluation methods in Section 7 of this paper and in our original CVPR paper [26]. There are two key deviations between our evaluation approach and the approach employed in the Wireframe and Attraction Field papers that account for this contradiction.

8.6.1 Distance Threshold

In our evaluations, we employ a distance threshold of 2√2 pixels, to associate any pair of lines that could potentially appear in the image with less than a one-pixel intervening gap. This seems like a reasonable threshold, able to account for small localization errors in edge detection due to pixel discretization. However, in the deep network papers a threshold of 1% of the image diagonal was employed, which for YorkUrbanDB results in a threshold of 8 pixels, 2.8 times the threshold we employed. This looser threshold is convenient for the deep networks, which by necessity use sub-sampled images and struggle to localize segments with precision. Fig. 13 shows an example.

Fig. 13: Crop of an example YorkUrbanDB test result for the (a) Attraction Field and (b) MCMLSD algorithms. Observe the alignment errors of some of the segments returned by the Attraction Field algorithm.

To assess the importance of this threshold, we re-evaluated all algorithms using the looser threshold of 8 pixels. Fig. 14 shows the results for the YorkUrbanDB dataset. We see that as we loosen the threshold, performance rises for all algorithms, but the performance of the deep algorithms (Attraction Field and Wireframe Parser) rises disproportionately, confirming the poorer localization performance of these methods.

8.6.2 Enforcing 1:1 Matches

While the looser distance threshold clearly helps the deep algorithms, Fig. 14 makes clear that this alone cannot fully account for the claim that deep networks uniformly perform better than MCMLSD and earlier algorithms such as LSD and Linelet.
Fig. 14: Evaluation on the YorkUrbanDB dataset using the looser distance threshold of 8 pixels employed in the Wireframe [27] and Attraction Field [29] papers. (a) Recall as a function of number of segments returned. (b) Recall as a function of the total length of segments returned. (c) Precision-Recall.

Here we address a more serious issue that gets to the heart of what we mean by line segment detection. In both the Wireframe and Attraction Field papers, a very simple method is employed to match algorithm segments and ground truth: points on detected segments that lie within 8 pixels of a ground truth segment are identified as hits. Normalizing by the total length of the detected segments and the ground truth segments forms the precision and recall measures, respectively. In Section 7 of this paper and in our original CVPR paper [26], we were careful to articulate the problems with this simplistic approach.

First, when matching points on detected segments with points on ground truth segments, it is important that these matches be 1:1. In other words, the same ground truth point should not be used to generate hits for multiple points on detected segments. Similarly, the same point on a detected segment should not be matched to multiple ground truth points. Importantly, this constraint penalizes algorithms that generate multiple detections for a single ground truth segment, or that confuse two neighbouring ground truth segments as a single segment. Note that not enforcing this constraint leads to pathological results. For example, an algorithm that generates dense, 16-pixel wide regions of filled pixels centred on the ground truth segments will be evaluated to have perfect precision and perfect recall.

However, enforcing 1:1 matches between points is not enough. The problem of line segment detection is not to detect isolated edges but to recover the continuous line segments present in an image. The output line segments can be coded in various ways, e.g., by the 2D locations of their endpoints. Critically, the output is more than an edge map: each line segment is a higher-level organization of edge points into a more global representation. This means that to fairly evaluate a line segment detector, the 1:1 constraint must be applied at the segment level, not at the pixel level. As articulated in Section 7 of this paper and in our original CVPR paper [26], this is critical in order to penalize under- and over-segmentation. Again, not imposing this constraint will lead to pathological results. For example, an algorithm that returns a scatter of tiny line segments that are all only one pixel long but lie within the distance threshold of ground truth and account for all ground truth points will generate perfect precision and recall scores.

To assess the importance of this segment-level 1:1 matching constraint, we re-evaluated all algorithms without this constraint, i.e., using the simple matching method employed in the Wireframe Parser [27] and Attraction Field [29] papers, and also using the looser distance threshold employed in these papers. As shown in Figs. 15 and 16, despite this relaxation in the evaluation criteria, MCMLSD still outperforms the deep learning algorithms in terms of maximum recall and recall as a function of the number of line segments returned. However, the authors of these deep learning papers did not report these measures of performance but only the precision-recall curves shown in Figs. 15(c) and 16(c). Here we see that removing the 1:1 matching constraint particularly advantages the deep Wireframe and Attraction Field algorithms, leading to clear superiority of the Attraction Field method in the low-recall regime, although the MCMLSD and Linelet methods still achieve much higher recall. But again, we remind the reader of the limitations of precision measures for these incomplete datasets (Section 7).

Fig. 15: Performance of MCMLSD compared with the state of the art on the YorkUrbanDB dataset using pixel-level evaluation. (a) Recall as a function of number of segments returned. (b) Recall as a function of the total length of segments returned. (c) Precision-Recall.

Fig. 16: Performance of MCMLSD compared with the state of the art on the Wireframe dataset using pixel-level evaluation. (a) Recall as a function of number of segments returned. (b) Recall as a function of the total length of segments returned. (c) Precision-Recall.

8.7 Summary of Quantitative Results

To summarize, the relative performance of line segment detection algorithms very much depends on how performance is measured. Recent deep learning papers have loosened distance thresholds and not enforced 1:1 matching constraints, and under these conditions they achieve higher precision, although still inferior recall. It is possible that there are some applications for which this measure of performance is appropriate. For example, one may attempt to use only a pixel-level Hausdorff distance to register two images or to register an image to a CAD model. However, for most downstream applications, e.g., single-view 3D reconstruction [41], [42], an organization of points into accurate line segments is desirable, and to evaluate this one must impose 1:1 matching constraints.

Fig. 17 (copied from Fig. 12 for convenience) summarizes performance relevant to these requirements, specifically for a distance threshold of 2√2 pixels to ensure accuracy, and a 1:1 matching constraint imposed at the segment level. Here we see that by any of the three measures of performance, one of the two versions of MCMLSD is recommended. If the number of output lines is to be restricted and recall is the priority, the original MCMLSD algorithm vastly outperforms other methods. However, if precision-recall performance is the priority, then MCMLSD2 is recommended, as it matches or surpasses the performance of all other methods in the low-recall regime while vastly outperforming them in the high-recall regime.

Fig. 17: Summary of results. (a) Recall as a function of number of segments returned. (b) Recall as a function of the total length of segments returned. (c) Precision-Recall.

9 IMAGE RESOLUTION

One limitation of current deep learning methods is that the computational load for learning and inference may become untenable for higher image resolutions. In contrast, the MCMLSD algorithm adapts well to different image resolutions without fine-tuning, as long as the transition probabilities are scaled appropriately (Section 5.2). For example, doubling the resolution requires that the transition probabilities from OFF to ON and from ON to OFF be halved.

Fig. 18 shows the top 90 segments returned for an example image from the YorkUrbanDB dataset at normal (640×480 pixel) and high (1280×960 pixel) resolutions. Note that the algorithm is able to take advantage of the higher resolution to deliver more complete and accurate segments.

Fig. 18: Top 90 segments for MCMLSD on an example image at (a) 640×480 and (b) 1280×960 pixel resolutions.

10 RUN TIME

The dynamic programming solution for line segment detection runs in O(N) = O(√n) time, where N is the number of point samples on the line and n is the number of pixels in the image. Given a set of m detected lines, the total time complexity of line segment extraction is O(m√n).

Table 2 shows the average run time for the six algorithms tested here on the 640×480 pixel images of the YorkUrbanDB training dataset. The SSWMS, LSD, Linelet and MCMLSD algorithms were tested using a 3.4 GHz Intel Core i7 with 8GB RAM. The deep network Wireframe and Attraction Field algorithms were tested using an NVIDIA Titan X GPU with a Xeon E5-2620 2.10 GHz CPU.

The MATLAB implementation of our MCMLSD algorithm has an average run time of 2.81 sec per image. Aside from the Linelet algorithm, which is very slow, the other algorithms are optimized and implemented in C++, returning results within a few hundred milliseconds.
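The linear-time dynamic programming pass underlying these run times can be sketched as a two-state Viterbi recursion over the N point samples of a detected line. The transition and emission values below are illustrative placeholders, not the probabilities derived in Section 5, and the function name is our own.

```python
import math

def viterbi_on_off(emissions, p_stay=0.9):
    """Maximum-probability ON/OFF labelling of the N point samples on a
    line, computed in O(N) time.  Each entry of `emissions` is a pair
    (p(obs | OFF), p(obs | ON)); `p_stay` is an illustrative
    self-transition probability, uniform over both states here."""
    states = (0, 1)  # 0 = OFF, 1 = ON
    log = lambda p: -math.inf if p == 0 else math.log(p)
    trans = [[log(p_stay), log(1 - p_stay)],
             [log(1 - p_stay), log(p_stay)]]
    # Uniform prior over the initial state.
    score = [log(0.5) + log(emissions[0][s]) for s in states]
    back = []
    for e in emissions[1:]:
        prev = score[:]
        ptr = []
        for s in states:
            best = max(states, key=lambda r: prev[r] + trans[r][s])
            ptr.append(best)
            score[s] = prev[best] + trans[best][s] + log(e[s])
        back.append(ptr)
    # Trace back the maximum-probability state sequence.
    s = max(states, key=lambda r: score[r])
    path = [s]
    for ptr in reversed(back):
        s = ptr[s]
        path.append(s)
    return path[::-1]
```

Each sample is visited once with a constant amount of work, which is the source of the O(N) = O(√n) behaviour per line; the marginal posteriors used for ranking come from an analogous linear-time forward-backward pass.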
About 63% of our run time is taken by the probabilistic Hough method for line extraction [18], which we believe could be sped up considerably with more efficient coding practices and implementation in C or C++. There are also many opportunities for mapping to parallel hardware, as edge detection is dominated by convolutions, and in the dynamic programming line segment detection stage lines separated by more than 4 pixels are processed independently.

TABLE 2: Average number of segments returned and run time per image for the six systems evaluated.

Algorithm | # Segments | Run Time (sec)
SSWMS | 391 | 0.30
LSD | 537 | 0.27
Linelet | 967 | 34.5
Wireframe | 228 | 0.446
Attraction Field | 303 | 0.152
MCMLSD | 488 | 2.81

11 CONCLUSIONS

We have developed and evaluated a novel method for line segment detection called MCMLSD that combines the advantages of global probabilistic Hough methods for line detection with spatial analysis in the image domain to identify segments. The key insight is that limiting segment search to Hough-detected lines leads naturally to a Markov chain formulation that allows maximum probability solutions to be computed exactly in linear time. Our method also has the advantage that it can detect multiple segments lying on the same line, a common scenario for images of the built environment. This formulation leads to a natural probabilistic measure for ranking segments based upon the sum over point marginals, which maximizes the expected number of correctly labelled points on detected lines.

A second contribution is our new methodology for evaluating line segment detectors on an incompletely labelled dataset. By constraining matches between ground truth and detector output to be 1:1 at the segment level, we show that under- and over-segmentation are penalized appropriately. Using this new evaluation methodology we find that MCMLSD outperforms the state of the art by a substantial margin. The code for MCMLSD and our evaluation method is available at www.elderlab.yorku.ca/resources.

ACKNOWLEDGEMENTS

This research was supported by an NSERC Discovery grant and by the NSERC CREATE Training Program in Data Analytics & Visualization.

REFERENCES

[1] C. Schmid and A. Zisserman, "Automatic line matching across views," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, 1997, pp. 666-671.
[2] J. Košecká and W. Zhang, "Video compass," in Proc. European Conference on Computer Vision (ECCV), 2002, pp. 476-490.
[3] P. Parodi and G. Piccioli, "3D shape reconstruction by using vanishing points," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 2, pp. 211-217, 1996.
[4] L. Zhang and R. Koch, "Structure and motion from line correspondences: representation, projection, initialization and sparse bundle adjustment," Journal of Visual Communication and Image Representation, vol. 25, no. 5, pp. 904-915, 2014.
[5] M. Hofer, M. Maurer, and H. Bischof, "Efficient 3D scene abstraction using line segments," Computer Vision and Image Understanding, vol. 157, pp. 167-178, 2017.
[6] M. Boldt, R. Weiss, and E. Riseman, "Token-based extraction of straight lines," IEEE Transactions on Systems, Man and Cybernetics, vol. 19, no. 6, pp. 1581-1594, 1989.
[7] M. Nieto, C. Cuevas, L. Salgado, and N. García, "Line segment detection using weighted mean shift procedures on a 2D slice sampling strategy," Pattern Analysis and Applications, vol. 14, no. 2, pp. 149-163, 2011.
[8] X. Lu, J. Yao, K. Li, and L. Li, "CannyLines: A parameter-free line segment detector," in IEEE International Conference on Image Processing, 2015, pp. 507-511.
[9] X. Liu, Z. Cao, N. Gu, S. Nahavandi, C. Zhou, and M. Tan, "Intelligent line segment perception with cortex-like mechanisms," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 45, no. 12, pp. 1522-1534, 2015.
[10] D. Guru, B. Shekar, and P. Nagabhushan, "A simple and robust line detection algorithm based on small eigenvalue analysis," Pattern Recognition Letters, vol. 25, no. 1, pp. 1-13, 2004.
[11] D. Liu, Y. Wang, Z. Tang, and X. Lu, "A robust and fast line segment detector based on top-down smaller eigenvalue analysis," in Fifth International Conference on Graphics and Image Processing. International Society for Optics and Photonics, 2014.
[12] R. G. von Gioi, J. Jakubowicz, J.-M. Morel, and G. Randall, "LSD: A fast line segment detector with a false detection control," IEEE Transactions on Pattern Analysis & Machine Intelligence, no. 4, pp. 722-732, 2008.
[13] A. Desolneux, L. Moisan, and J.-M. Morel, "Meaningful alignments," International Journal of Computer Vision, vol. 40, no. 1, pp. 7-23, 2000.
[14] J. H. Elder and S. W. Zucker, "Local scale control for edge detection and blur estimation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 7, pp. 699-716, 1998.
[15] C. Akinlar and C. Topal, "EDLines: A real-time line segment detector with a false detection control," Pattern Recognition Letters, vol. 32, no. 13, pp. 1633-1642, 2011.
[16] J. Kim and S. Lee, "Extracting major lines by recruiting zero-threshold Canny edge links along Sobel highlights," IEEE Signal Processing Letters, vol. 22, no. 10, pp. 1689-1692, 2015.
[17] M. Brown, D. Windridge, and J. Guillemaut, "A generalisable framework for saliency-based line segment detection," Pattern Recognition, vol. 48, pp. 3993-4011, 2015.
[18] R. Tal and J. H. Elder, "An accurate method for line detection and Manhattan frame estimation," in Asian Conf. Comput. Vis. Workshops, 2013, pp. 580-593.
[19] N. Guil, J. Villalba, and E. L. Zapata, "A fast Hough transform for segment detection," IEEE Transactions on Image Processing, vol. 4, no. 11, pp. 1541-1548, 1995.
[20] J. Matas, C. Galambos, and J. Kittler, "Robust detection of lines using the progressive probabilistic Hough transform," Computer Vision and Image Understanding, vol. 78, no. 1, pp. 119-137, 2000.
[21] V. Kamat-Sadekar and S. Ganesan, "Complete description of multiple line segments using the Hough transform," Image and Vision Computing, vol. 16, no. 9, pp. 597-613, 1998.
[22] Y. Furukawa and Y. Shinagawa, "Accurate and robust line segment extraction by analyzing distribution around peaks in Hough space," Computer Vision and Image Understanding, vol. 92, pp. 1-25, 2003.
[23] Z. Xu, B.-S. Shin, and R. Klette, "Accurate and robust line segment extraction using minimum entropy with Hough transform," IEEE Transactions on Image Processing, vol. 24, no. 3, pp. 813-822, 2015.
[24] Z. Xu, B.-S. Shin, and R. Klette, "Closed form line-segment extraction using the Hough transform," Pattern Recognition, vol. 48, no. 12, pp. 4012-4023, 2015.
[25] Z. Xu, B. Shin, and R. Klette, "A statistical method for line segment detection," Computer Vision and Image Understanding, vol. 138, pp. 61-73, 2015.
[26] E. J. Almazàn, R. Tal, Y. Qian, and J. H. Elder, "A dynamic programming approach to line segment detection," in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2031-2039.
[27] K. Huang, Y. Wang, Z. Zhou, T. Ding, S. Gao, and Y. Ma, "Learning to parse wireframes in images of man-made environments," in IEEE Conference on Computer Vision and Pattern Recognition, June 2018.
[28] A. Newell, K. Yang, and J. Deng, "Stacked hourglass networks for human pose estimation," in European Conference on Computer Vision, 2016, pp. 483-499.
[29] N. Xue, S. Bai, F. Wang, G.-S. Xia, T. Wu, and L. Zhang, "Learning attraction field representation for robust line segment detection," in IEEE Conference on Computer Vision and Pattern Recognition, 2019.
[30] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Medical Image Computing and Computer Assisted Intervention (MICCAI), ser. LNCS, vol. 9351. Springer, 2015, pp. 234-241.
[31] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, "Encoder-decoder with atrous separable convolution for semantic image segmentation," in European Conference on Computer Vision, V. Ferrari, M. Hebert, C. Sminchisescu, and Y. Weiss, Eds., 2018, pp. 833-851.
[32] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778.
[33] P. Denis, J. H. Elder, and F.
[35] J.-P. Tardif, "Non-iterative approach for fast and accurate vanishing point detection," in IEEE International Conference on Computer Vision, 2009, pp. 1250-1257.
[36] O. Barinova, V. Lempitsky, and P. Kholi, "On detection of multiple object instances using Hough transforms," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 9, pp. 1773-1784, 2012.
[37] H. Kuhn, "The Hungarian method for the assignment problem," Naval Research Logistics Quarterly, vol. 2, pp. 83-97, 1955.
[38] N. Cho, A. Yuille, and S. Lee, "A novel linelet-based representation for line segment detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 5, pp. 1195-1208, May 2018.
[39] P. Dollár and C. L. Zitnick, "Fast edge detection using structured forests," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 8, pp. 1558-1570, 2014.
[40] D. Martin, C. Fowlkes, and J. Malik, "Learning to detect natural image boundaries using local brightness, color and texture cues," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 5, pp. 530-549, May 2004.
[41] S. Ramalingam and M. Brand, "Lifting 3D Manhattan lines from a single image," in International Conference on Computer Vision, 2013, pp. 497-504.
[42] Y. Qian, S. Ramalingham, and J. Elder, "LS3D: Single-view Gestalt 3D surface reconstruction from Manhattan line segments," in Proc. Asian Conference on Computer Vision (ACCV), 2018, pp. 399-416.

James H. Elder received the BASc degree in electrical engineering from the University of British Columbia in 1987 and the PhD degree in electrical engineering from McGill University in 1995. From 1995 to 1996, he was a senior research associate at the NEC Research Institute in Princeton, New Jersey. He joined the faculty of York University, Canada, in 1996, where he is presently Professor and York Research Chair in Human and Computer Vision, jointly appointed to the Department of Electrical Engineering and Computer Science and the Department of Psychology. His research seeks to improve machine vision systems through a better understanding of visual processing in biological systems, with a current focus on natural scene statistics, perceptual organization, contour processing, shape perception, single-view 3D reconstruction, attentive vision systems and machine vision systems for dynamic 3D urban awareness.

Emilio J. Almazàn received the BASc degree in computer science and the MSc in computer vision from Rey Juan Carlos University in 2005 and 2011 respectively, and the PhD in computer vision from Kingston University in 2014. He was a postdoctoral fellow at the Center for Vision Research, York University, in 2015; since then he has been working as a computer vision researcher in the R&D department at Nielsen. In 2019 he was appointed lead data scientist of the same group. His research interests include image categorization, object detection and understanding and interpretation.
Estrada, “Efficient edge-based methods for tation of convolutional neural networks. Recent work has focused on estimating Manhattan frames in urban imagery,” in Proceedings of the combining natural language processing and vision. European Conference on Computer Vision, 2008, pp. 197–210. 3, 4, 5 [34] J. M. Coughlan and A. L. Yuille, “Manhattan world: Orientation and outlier detection by Bayesian inference,” Neural Computation, vol. 15, no. 5, pp. 1063–1088, 2003. 5 16 Yiming Qian received the BASc and MASc de- gree in electrical engineering from Ryerson Uni- versity in 2012 and 2014. He is currently a PhD student in Computer Science at York University. From 2012 to 2015, he was an R&D engineer at Siemens Energy Management. His research interests include single-view 3D reconstruction, image processing, signal processing, manufac- turing optimization and data mining. Ron Tal received the MASc degree in computer engineering from York University in 2011. Fol- lowing his graduate work, Ron worked at Uber, where he was a founding member of Michelan- gelo - Uber’s comprehensive machine learning service. Currently he is a senior software engi- neer in the Machine Learning Platform team at Coinbase Inc.