Applied Computer Vision
David Vernon
Carnegie Mellon University Africa in Rwanda
vernon@cmu.edu
Course Description
This course provides students with a solid foundation in the key elements of computer vision, emphasizing the practical application of the underlying theory. It focusses mainly on the techniques required to build robot vision applications but the algorithms can also be applied in other domains such as industrial inspection and video surveillance. A key focus of the course is on effective implementation of solutions to practical computer vision problems in a variety of environments using both bespoke software authored by the students and standard computer vision libraries.
The course covers optics, sensors, image formation, image acquisition, and image representation before proceeding to the essentials of image processing and image filtering. This provides the basis for a treatment of image segmentation, including edge detection, region growing, boundary detection, and colour-based segmentation, as well as more sophisticated techniques such as snakes and graph cuts.
Building on this, the course then proceeds to deal with object detection and recognition in 2D, addressing interest point operators, gradient orientation histograms, the SIFT descriptor, colour histogram intersection and back-projection, the Hough transform, template matching, and Bayesian classification.
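As an illustration of one of the techniques named above, colour histogram intersection can be sketched in a few lines of Python. This is a minimal sketch and is not taken from the course materials: it assumes a single 8-bit channel (for example hue) rather than a full colour histogram, and the function names `histogram` and `histogram_intersection` are hypothetical.

```python
def histogram(values, bins, max_value=256):
    """Count how many values fall into each of `bins` equal-width bins.

    Assumes integer values in the range 0 .. max_value - 1.
    """
    counts = [0] * bins
    for v in values:
        counts[v * bins // max_value] += 1
    return counts


def histogram_intersection(image_hist, model_hist):
    """Return the intersection of two histograms, normalised by the model.

    A score of 1.0 means every model entry is matched in the image;
    a score of 0.0 means the histograms do not overlap at all.
    """
    overlap = sum(min(i, m) for i, m in zip(image_hist, model_hist))
    return overlap / sum(model_hist)


# Hypothetical pixel values for a model object and an image region.
model = histogram([0, 10, 200, 250], bins=4)
region = histogram([5, 100, 220, 230], bins=4)
score = histogram_intersection(region, model)
```

In practice the same comparison would normally be made on two- or three-dimensional colour histograms computed with a library such as OpenCV; the one-dimensional version here only shows the arithmetic.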
Video image processing focusses on the detection and tracking of moving objects using a variety of techniques, including several types of background subtraction, optical flow, and the Kalman filter.
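One simple form of background subtraction maintains a running-average background model and flags pixels that differ from it by more than a threshold. The sketch below is illustrative only and is not the course's implementation: it assumes greyscale frames stored as nested lists, and the names `foreground_mask` and `update_background`, the learning rate `alpha`, and the `threshold` value are all assumptions.

```python
def foreground_mask(background, frame, threshold=25):
    """Mark a pixel as foreground when it differs from the background
    model by more than `threshold` grey levels."""
    return [[abs(f - b) > threshold for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(background, frame)]


def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the background model.

    `alpha` is the learning rate: larger values adapt faster to scene
    changes but absorb slow-moving objects into the background sooner.
    """
    return [[(1 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(background, frame)]


# Hypothetical 2 x 2 greyscale frames.
background = [[10, 10], [10, 10]]
frame = [[10, 200], [12, 10]]

mask = foreground_mask(background, frame)
background = update_background(background, frame, alpha=0.5)
```

Only the top-right pixel changes enough to be marked as moving; the small change in the bottom-left pixel is treated as noise and simply absorbed into the updated model.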
The problem of recovery of 3D information is then addressed, introducing homogeneous coordinates and transformations, the perspective transformation, camera model, inverse perspective transformation, stereo vision, and epipolar geometry, as well as other depth cues.
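The link between homogeneous coordinates and the perspective transformation can be sketched with a simple pinhole camera model. This is a minimal sketch under stated assumptions, not the course's own camera model: the camera sits at the origin looking along the Z axis, there is no lens distortion, and the focal length `f` (in pixels), the principal point `(cx, cy)`, and the function names are hypothetical.

```python
def camera_matrix(f, cx, cy):
    """Build a 3 x 4 projection matrix for a pinhole camera at the origin."""
    return [[f, 0, cx, 0],
            [0, f, cy, 0],
            [0, 0, 1, 0]]


def project(P, point3d):
    """Project a 3D point (X, Y, Z) to pixel coordinates (u, v).

    The point is lifted to homogeneous coordinates (X, Y, Z, 1),
    multiplied by P, and divided by the third component.
    """
    homogeneous = list(point3d) + [1.0]
    x, y, w = (sum(p * c for p, c in zip(row, homogeneous)) for row in P)
    if w == 0:
        raise ValueError("point lies in the plane of the camera centre")
    return (x / w, y / w)


P = camera_matrix(f=500, cx=320, cy=240)
u, v = project(P, (0.5, -0.25, 2.0))
```

The division by `w` is what discards depth: every 3D point along the same ray through the camera centre maps to the same pixel, which is why the inverse perspective transformation recovers only a ray unless a second view or another depth cue is available.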
The course finishes by addressing the important role played by machine learning in computer vision, focussing on the practical application of deep learning and convolutional neural networks, such as VGGNet and ResNet, to object classification, using Keras and TensorFlow with Python.
Learning Objectives
After completing this course, students should be able to:
- Apply their knowledge of image acquisition, image processing, and image analysis to extract useful information from visual images.
- Design, implement, and document appropriate, effective, and efficient software solutions for a variety of real-world computer vision problems.
- Exploit standard computer vision software libraries in the development of these solutions.
Course Content
Overview of human and computer vision.
OpenCV and software development tools for course work.
Optics, sensors, and image formation.
Image acquisition and image representation
- Sampling and quantization
- Shannon's sampling theorem
- Nyquist frequency and Nyquist sampling rate
- Aliasing
- Resolution
- Space-variant sampling
- Log-polar images
- Dynamic range, colour spaces (HSI, HLS, HSV)
Image processing
- Point & neighbourhood operations
- Image filtering
- Convolution
- Fourier transform
- Morphological operations
- Geometric operations
Segmentation
- Region-based approaches
- Binary thresholding
- Connected component analysis
- Edge detection
- Colour-based approaches and k-means clustering
- Graph cuts, normalized cuts, energy-based graph cuts, grab-cut
Image features
- Harris interest point operator
- Difference of Gaussian interest point operators
- SIFT feature descriptor
Object recognition
- Template matching
- Normalized cross-correlation
- Chamfer matching
- 2D shape features
- Statistical pattern recognition
- Hough transform for parametric curves: lines, circles, and ellipses
- Generalized Hough transform and extension to codeword features
- Colour histogram matching and back-projection
- Haar features, boosted classifiers, and face detection
- Histogram of Oriented Gradients (HOG) feature descriptor
Video image processing
- Moving object detection, motion detection issues, difference images, background models
- Object tracking - exhaustive search, mean shift, optical flow (dense and feature-based)
- Object tracking - Kalman filter
3D vision
- Homogeneous coordinates and transformations
- Perspective transformation
- Camera model and inverse perspective transformation
- Stereopsis, stereo correspondence, epipolar geometry
- Depth cues
- Structured light
Computer vision and deep learning
- Supervised vs. unsupervised
- Classification, regression, and clustering
- Parametric models
- Shallow vs. deep learning
- Support vector machines (SVM)
- Overfitting
- Neural networks
- Multi-layer perceptrons
- Deep learning
- Convolutional neural networks (CNN)
- Example CNNs: ShallowNet, MiniVGGNet, VGG16, VGG19, ResNet, Inception V3, Xception
- Dropout
- Transfer learning
- Deconvolution CNN
- Non-classification applications
- Generative adversarial networks
- CNN resources
Lecture Notes
Lecture 1. Overview of human and computer vision; software tools 
Lecture 2. Optics, sensors, and image formation
Lecture 3. Image acquisition and image representation 
Lecture 4. Image processing: point & neighbourhood operations, image filtering, convolution, Fourier transform 
Lecture 5. Image processing: morphological operations 
Lecture 6. Image processing: geometric operations 
Lecture 7. Segmentation: simple region-based approaches, binary thresholding, connected component analysis 
Lecture 8. Segmentation: boundary-based approaches, edge detection; boundary detection; snakes 
Lecture 9. Segmentation: region-based approaches, simple colour segmentation; k-means clustering 
Lecture 10. Segmentation: region-based approaches, graph cuts, normalized cuts, energy-based graph cuts, grab-cut 
Lecture 11. Image features: Harris and Difference of Gaussian interest point operators 
Lecture 12. Image features: Scale-invariant interest point operators 
Lecture 13. Image features: SIFT feature descriptor 
Lecture 14. Object recognition: template matching; normalized cross-correlation; chamfer matching 
Lecture 15. Object recognition: 2D shape features; statistical pattern recognition 
Lecture 16. Object recognition: Hough transform for parametric curves: lines, circles, and ellipses 
Lecture 17. Object recognition: generalized Hough transform; extension to code-word features 
Lecture 18. Object recognition: colour histogram matching and back-projection 
Lecture 19. Object recognition: Haar features, boosted classifiers, and face detection 
Lecture 20. Object recognition: Histogram of Oriented Gradients (HOG) feature descriptor, people detection 
Lecture 21. Video image processing: moving object detection, motion detection issues, difference images, background models 
Lecture 22. Video image processing: object tracking - exhaustive search, mean shift, optical flow (dense and feature-based)
Lecture 23. Video image processing: object tracking - Kalman filter  
Lecture 24. 3D vision: homogeneous transformations, camera model and inverse perspective transformation 
Lecture 25. 3D vision: stereopsis, stereo correspondence, epipolar geometry, depth cues, structured light 
Lecture 26. Computer vision and machine learning: shallow learning vs. deep learning 
Lecture 27. Computer vision and deep learning: optimization methods and regularization, neural network basics, multi-layer perceptrons 
Lecture 28. Computer vision and deep learning: convolutional neural networks (CNNs): VGG16, VGG19, ResNet, Inception V3, Xception  
Course Textbook
Szeliski, R. Computer Vision: Algorithms and Applications, Springer, 2010.
Recommended Reading
Dawson-Howe, K. A Practical Introduction to Computer Vision with OpenCV, Wiley, 2014.
Trucco, E. and Verri, A. Introductory Techniques for 3-D Computer Vision, Prentice-Hall, 1998.
Vernon, D. Machine Vision: Automated Visual Inspection and Robot Vision, Prentice-Hall, 1991.
Rosebrock, A. Deep Learning for Computer Vision with Python, PyImageSearch, 2017.
Software Development Environment
Click here for a step-by-step guide to downloading, installing, and using the software required to run examples and complete the assignments.
Acknowledgments
The syllabus and content of this course derive from several sources. These include the following.
- Course VO 4.0 376.054 Machine Vision and Cognitive Robotics given by Markus Vincze, Michael Zillich, and Daniel Wolf at Technische Universität Wien.
- Course 4BA10 Computer Vision given by Kenneth Dawson-Howe at Trinity College Dublin.
- Course 4BA10 Computer Vision given by David Vernon at Trinity College Dublin.
- Course on Computer Vision given by Francesca Odone, University of Genova, at VVV2017.
- Tutorials on Machine Learning with Computer Vision given by Toby Breckon, Durham University, at BMVA Summer Schools in 2016 and 2017.
- Deep Learning for Computer Vision with Python, A. Rosebrock, PyImageSearch, 2017.