Automatic Video Segmentation Employing Object/Camera Modeling Techniques

DISSERTATION submitted to obtain the degree of doctor at the Technische Universiteit Eindhoven, on the authority of the Rector Magnificus, prof.dr.ir. C.J. van Duijn, to be defended in public before a committee appointed by the Doctorate Board (College voor Promoties) on Thursday 15 December 2005 at 16:00 by Dirk Sven Farin, born in Tübingen, Germany
This dissertation has been approved by the promotors: prof.dr.ir. P.H.N. de With and Prof. Dr.-Ing. W.W.J. Effelsberg

CIP-DATA LIBRARY TECHNISCHE UNIVERSITEIT EINDHOVEN
Farin, Dirk S.
Automatic video segmentation employing object/camera modeling / by Dirk Sven Farin. - Eindhoven : Technische Universiteit Eindhoven, 2005. 2 volumes. Dissertation. - ISBN 90-386-2381-X
NUR 959
Keywords (Dutch, translated): video technology / digital image processing / image coding / image recognition.
Subject headings: image segmentation / motion estimation / computer vision / object detection.
Keywords (German, translated): video segmentation / motion estimation / object recognition.

© Copyright 2005 Dirk Farin
All rights are reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission from the copyright owner.
Contents

1 Introduction
  1.1 Motivation
    1.1.1 Video editing and scene composition
    1.1.2 Object-oriented video coding
    1.1.3 Automatic video analysis
    1.1.4 3-D analysis and reconstruction
  1.2 The video-object segmentation problem
  1.3 Object-oriented video coding in MPEG-4
  1.4 Automatic video segmentation system (Thesis Part I)
    1.4.1 Design goals
    1.4.2 Segmentation-algorithm overview
    1.4.3 Framework of the segmentation algorithm
  1.5 Extensions to the segmentation system
    1.5.1 Segmentation using object models (Thesis Part II)
    1.5.2 From camera motion to 3-D models (Thesis Part III)
  1.6 Contributions of the author

Part I: An Automatic Video Segmentation System

2 Projective Geometry
  2.1 Introduction
  2.2 Projective spaces
    2.2.1 Homogeneous coordinates
    2.2.2 Lines in the projective plane
  2.3 Geometric transformations in 2-D
    2.3.1 Projective transformation
    2.3.2 Affine motion
    2.3.3 Projective motion
  2.4 Geometric transformations in 3-D
    2.4.1 Affine motion in 3-D
    2.4.2 Rotation in 3-D
    2.4.3 Perspective projection
  2.5 Image acquisition
    2.5.1 Intrinsic camera parameters
    2.5.2 Extrinsic camera parameters
    2.5.3 Camera motion in a static environment
    2.5.4 Inter-image transformation
  2.6 Summary and notational conventions

3 Feature-based Motion I: Point-Correspondences
  3.1 Introduction
    3.1.1 Basics of feature-based motion estimation
    3.1.2 From feature-points to motion parameters
  3.2 Interest-point detectors
    3.2.1 Moravec interest-point detector
    3.2.2 Shi-Tomasi detector
    3.2.3 Harris corner detector
    3.2.4 SUSAN corner detector
    3.2.5 Evaluation
  3.3 Computing feature-correspondences
    3.3.1 Fast greedy algorithm
    3.3.2 Evaluation
  3.4 Summary

4 Feature-Based Motion II: Parameter Estimation
  4.1 Introduction
  4.2 Computing motion model parameters
    4.2.1 One-dimensional affine motion
    4.2.2 Two-dimensional affine motion
    4.2.3 One-dimensional projective motion
    4.2.4 Two-dimensional projective motion
    4.2.5 Non-linear least-squares estimation
  4.3 Robust estimation algorithms
    4.3.1 Breakdown of least-squares fit on data with outliers
    4.3.2 Robust estimation using RANSAC
    4.3.3 Robustness of the RANSAC algorithm
  4.4 Summary

5 Background Reconstruction
  5.1 Introduction
  5.2 Frame alignment
    5.2.1 Motion models for sprite generation
    5.2.2 Geometry of background image generation
    5.2.3 Long-term motion estimation
  5.3 Background estimation
    5.3.1 Introduction and previous work
    5.3.2 The SimMat background-estimation algorithm
    5.3.3 Results
  5.4 Summary of the background reconstruction module

6 Multi-Sprite Backgrounds
  6.1 Introduction
  6.2 Limitations of the single-sprite approach
  6.3 Detecting degenerated transforms
  6.4 Examples of single-sprite inefficiencies
    6.4.1 Example case: camera zoom-out
    6.4.2 Example case: horizontal camera pan
    6.4.3 Example case: camera zoom-in
  6.5 Sprite cost definitions
    6.5.1 Bitstream length
    6.5.2 Coded sprite area
    6.5.3 Sprite buffer size
    6.5.4 Adding a resolution preservation constraint and limiting sprite buffer requirements
  6.6 Multi-sprite partitioning algorithm
    6.6.1 Cost matrix calculation and reference frame placement
    6.6.2 Optimal sequence partitioning
  6.7 Experiments and results
  6.8 Integration into the segmentation system
  6.9 Online calculation of constrained sprites
  6.10 Coding multi-sprites in MPEG-4 streams
  6.11 Conclusions

7 Background Subtraction
  7.1 Introduction
  7.2 Pixel-based classification
    7.2.1 Distance metrics
    7.2.2 Influence of the color-space
    7.2.3 Classes of errors
    7.2.4 Evaluation method
    7.2.5 Results
  7.3 Multi-pixel based significance tests
    7.3.1 Classification using a χ² test
    7.3.2 Extension to color images
    7.3.3 Fast implementation
    7.3.4 Evaluation
  7.4 Classification using Markov random fields
    7.4.1 MRF model for segmentation masks
    7.4.2 Obtaining a MAP estimate
    7.4.3 Extension to color images
    7.4.4 Optimization algorithm
    7.4.5 Evaluation
  7.5 Sources of errors and robustness improvements
    7.5.1 Map of misregistration risk
    7.5.2 Map of interpolation errors
    7.5.3 Integrating risk maps into the segmentation process
  7.6 Postprocessing the object mask
    7.6.1 Filling holes in the object
    7.6.2 Heuristics for removing clutter in the mask
  7.7 Overview of the segmentation process

8 Results and Applications
  8.1 Algorithm modules
  8.2 Variants of the segmentation system
    8.2.1 Surveillance with a static camera
    8.2.2 Surveillance with a moving camera
    8.2.3 Offline video analysis
    8.2.4 Online video segmentation and transmission
  8.3 Implementation
  8.4 Segmentation results
  8.5 Applications of the segmentation system
    8.5.1 MPEG-4 video coding
    8.5.2 Video editing
    8.5.3 Pseudo 3-D video generation
    8.5.4 Video-object recognition
  8.6 Extensions
    8.6.1 MPEG-4 coding with sprite-mode detection
    8.6.2 Camera auto-calibration
    8.6.3 Absolute coordinate transfer
    8.6.4 Object models

Part II: Segmentation Using Object Models

9 Object Detection based on Graph-Models I: Cartoons
  9.1 Introduction
  9.2 Principle of region-based graph matching
  9.3 Model editor
  9.4 Automatic color segmentation
  9.5 Feature extraction and matching criteria
    9.5.1 Color
    9.5.2 Size
    9.5.3 Distance
    9.5.4 Shape
    9.5.5 Orientation
    9.5.6 Node and edge costs
    9.5.7 Generalization of costs for 1:N-matching
  9.6 Matching algorithm
    9.6.1 Candidate-region selection
    9.6.2 Matching algorithm
    9.6.3 1:N-matching
  9.7 Results
  9.8 Conclusions

10 Object Detection based on Graph Models II: Natural
  10.1 Introduction
  10.2 Segmentation system architecture
  10.3 Step 1: motion detection
  10.4 Step 2: model matching
    10.4.1 Model editor
    10.4.2 Model detection
  10.5 Step 3: spatial segmentation
    10.5.1 Spatial segmentation algorithm
    10.5.2 Merging criterion
  10.6 Experiments and results
  10.7 Conclusions
  10.8 Appendix: notes on ellipse processing

11 Manual Segmentation and Signature Tracking
  11.1 Introduction
  11.2 From Intelligent to Corridor Scissors
    11.2.1 Intelligent Scissors algorithm
    11.2.2 Problems of the Intelligent Scissors tool
    11.2.3 The Corridor Scissors tool
    11.2.4 Experiments and results with Corridor Scissors
  11.3 Shortest circular paths
    11.3.1 Definition of circular paths
    11.3.2 Computation of shortest circular paths
    11.3.3 Computational complexity
  11.4 Signature tracking
    11.4.1 A first tracking algorithm
    11.4.2 Signature tracking algorithm
    11.4.3 Circular-path search with object signatures
    11.4.4 Tracking results
  11.5 Summary and conclusions
  11.6 Discussion on the signature tracking technique
  11.7 Appendix: step-by-step examples

Part III: From Camera Motion to 3-D Models

12 Estimation of Physical Camera Parameters
  12.1 Introduction
    12.1.1 Geometry of background image generation
    12.1.2 Global motion estimation
  12.2 Previous work
    12.2.1 Estimation of focal length
  12.3 Linear camera calibration
    12.3.1 Calibration using the image of the absolute conic
    12.3.2 Integration of multi-sprite motion estimation
  12.4 Non-linear camera calibration
    12.4.1 Parameterization
    12.4.2 Generalizing to multi-sprites
    12.4.3 Optimization algorithm
    12.4.4 Recovering rotation angles
  12.5 Experimental results
  12.6 Conclusions

13 Camera Calibration for the Analysis of Sport Videos
  13.1 Introduction and previous work
  13.2 Calibration-algorithm principle
  13.3 Overview of the calibration system
  13.4 Court-line pixel detection
    13.4.1 Filter 1: luminance threshold
    13.4.2 Filter 2: non-flat regions
    13.4.3 Filter 3: linear structure
  13.5 Line-parameter estimation
    13.5.1 Line detection with the Hough transform
    13.5.2 Line detection with RANSAC
    13.5.3 Line-segment boundary detection
  13.6 Court-model fitting
    13.6.1 Fast fitting method
    13.6.2 Robust fitting method
    13.6.3 Fast calibration-parameter rejection test
  13.7 Model tracking
  13.8 Experiments
  13.9 Conclusions

14 Panoramic Video and Floor Plan Reconstruction
  14.1 Introduction
    14.1.1 From background sprites to panoramic images
    14.1.2 Visualization of panoramic images
    14.1.3 Floor plan reconstruction
    14.1.4 Chapter outline
  14.2 Capturing panoramic images and video
    14.2.1 Panoramic image generation
    14.2.2 Cameras for recording panoramic videos
  14.3 Visualization of panoramic videos
  14.4 Reconstruction of rectangular rooms
    14.4.1 The circular arc of possible camera locations
    14.4.2 Searching for the camera position
    14.4.3 Creating a virtual room visualization
  14.5 Reconstruction of floor plans
    14.5.1 Previous work
    14.5.2 Reconstruction algorithm concept
    14.5.3 Modeling the floor plan geometry
    14.5.4 Estimating the floor plan parameters
    14.5.5 Improving the convergence behaviour
    14.5.6 Initialization of the floor plan layout
    14.5.7 Obtaining wall textures from the panoramic images
  14.6 Experimental Results
  14.7 Conclusions

15 Conclusions
  15.1 Discussion on the individual chapters
    15.1.1 Chapter 3 and 4: camera-motion estimation
    15.1.2 Chapter 5: background estimation
    15.1.3 Chapter 6: multi-sprites
    15.1.4 Chapter 7: background subtraction
    15.1.5 Chapter 9 and 10: graph-based object models
    15.1.6 Chapter 11: Corridor Scissors and circular paths
    15.1.7 Chapter 12: physical camera-parameter extraction
    15.1.8 Chapter 13: camera calibration for sport videos
    15.1.9 Chapter 14: floor plans from panoramic images
  15.2 Explicit vs. implicit models
  15.3 Future of segmentation

Part IV: Appendices

A Video-Summarization with Scene Preknowledge
  A.1 Introduction
  A.2 Summarization algorithm
    A.2.1 Feature extraction
    A.2.2 Determining segment boundaries
    A.2.3 Clustering
    A.2.4 Integration of domain-knowledge
  A.3 Evaluation
  A.4 Conclusions

B Efficient Computation of Homographies From Four Correspondences

C Robust Motion Estimation with LTS and LMedS

D Additional Test Sequences

E Color Segmentation Using Region Merging
  E.1 Introduction
    E.1.1 The region-merging algorithm
  E.2 Merging criteria
    E.2.1 Mean luminance difference
    E.2.2 Ward's criterion
    E.2.3 Mean/Ward mixture
    E.2.4 Linear-luminance model
    E.2.5 Border criterion
  E.3 Criteria properties
    E.3.1 General behaviour
    E.3.2 Comparison
  E.4 Multi-stage merging
    E.4.1 Applying a watershed presegmentation
  E.5 Results and conclusions

F Shape-Based Analysis of Object Behaviour
  F.1 Classification of object shapes
  F.2 Simple model for object behaviour
  F.3 Behaviour analysis

References
Summary
Samenvatting (summary in Dutch)
Zusammenfassung (summary in German)
Acknowledgments
Biography