David Vernon - www.vernon.eu

Applied Computer Vision

David Vernon
Carnegie Mellon University Africa in Rwanda
vernoncmu.edu

This course provides students with a solid foundation in the key elements of computer vision, emphasizing the practical application of the underlying theory. It focusses mainly on the techniques required to build robot vision applications, but the algorithms can also be applied in other domains such as industrial inspection and video surveillance. A key focus of the course is the effective implementation of solutions to practical computer vision problems in a variety of environments, using both bespoke software authored by the students and standard computer vision libraries.

The course covers optics, sensors, image formation, image acquisition, and image representation before proceeding to the essentials of image processing and image filtering. This provides the basis for a treatment of image segmentation, including edge detection, region growing, boundary detection, and colour-based segmentation, as well as more sophisticated techniques such as snakes and graph cuts.

Building on this, the course then proceeds to deal with object detection and recognition in 2D, addressing interest point operators, gradient orientation histograms, the SIFT descriptor, colour histogram intersection and back-projection, the Hough transform, template matching, and Bayesian classification.

Video image processing focusses on the detection and tracking of moving objects using a variety of techniques, including several types of background subtraction, optical flow, and the Kalman filter.

The problem of recovery of 3D information is then addressed, introducing homogeneous coordinates and transformations, the perspective transformation, camera model, inverse perspective transformation, stereo vision, and epipolar geometry, as well as other depth cues.

The course finishes by addressing the important role played by machine learning in computer vision, focussing on the practical application of deep learning and convolutional neural networks, such as VGGNet and ResNet, to object classification, using Keras and TensorFlow with Python.

Learning Objectives

After completing this course, students should be able to:

Apply their knowledge of image acquisition, image processing, and image analysis to extract useful information from visual images.

Design, implement, and document appropriate, effective, and efficient software solutions for a variety of real-world computer vision problems.

Exploit standard computer vision software libraries in the development of these solutions.

Course Content

Overview of human and computer vision.

OpenCV and software development tools for course work.

Optics, sensors, and image formation.

Image acquisition and image representation

Sampling and quantization

Shannon's sampling theorem

Nyquist frequency and Nyquist sampling rate

Aliasing

Resolution

Space-variant sampling

Log-polar images

Dynamic range, colour spaces (HSI, HLS, HSV)
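
As a small illustration of the sampling topics listed above, the following sketch shows aliasing numerically. It assumes a 10 Hz sampling rate (so a Nyquist frequency of 5 Hz) and a 7 Hz sinusoid; the function name sample_sine and all numeric values are chosen for illustration only and are not taken from the course material.

```python
import math

def sample_sine(freq_hz, rate_hz, n_samples):
    """Sample sin(2*pi*f*t) at t = k / rate_hz for k = 0 .. n_samples - 1."""
    return [math.sin(2 * math.pi * freq_hz * k / rate_hz) for k in range(n_samples)]

# Illustrative values (assumptions, not from the course notes).
rate = 10.0                           # sampling rate in Hz; Nyquist frequency is 5 Hz
high = sample_sine(7.0, rate, 20)     # 7 Hz lies above the Nyquist frequency
alias = sample_sine(-3.0, rate, 20)   # 7 Hz - 10 Hz = -3 Hz, the aliased frequency

# The two sample sequences agree up to floating-point rounding error,
# so the 7 Hz signal cannot be told apart from a 3 Hz one after sampling.
max_diff = max(abs(a - b) for a, b in zip(high, alias))
print(max_diff)
```

Sampling at more than twice the highest frequency present in the signal (here, above 14 Hz) would remove the ambiguity.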

Image processing

Point & neighbourhood operations
Image filtering
Convolution
Fourier transform
Morphological operations
Geometric operations
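
To make the convolution topic above concrete, here is a minimal pure-Python sketch of 2D convolution over the "valid" region of an image (no border padding). The function name convolve2d and the 3x3 box filter are illustrative assumptions; in practice a library routine such as those in OpenCV would normally be used instead.

```python
def convolve2d(image, kernel):
    """Convolve a 2D list `image` with a 2D list `kernel` (valid region only)."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    # True convolution flips the kernel in both directions before sliding it.
    flipped = [row[::-1] for row in kernel[::-1]]
    out = []
    for r in range(ih - kh + 1):
        out_row = []
        for c in range(iw - kw + 1):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * flipped[i][j]
            out_row.append(acc)
        out.append(out_row)
    return out

# Illustrative data: a 3x3 image smoothed by a 3x3 box (mean) filter.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
box = [[1.0 / 9.0] * 3 for _ in range(3)]
smoothed = convolve2d(image, box)   # a single output pixel: the image mean
print(smoothed)
```

For a symmetric kernel such as the box filter the flip makes no difference, which is why convolution and correlation are often used interchangeably for smoothing.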

Segmentation

Region-based approaches
Binary thresholding
Connected component analysis
Edge detection
Colour-based approaches and k-means clustering
Graph cuts, normalized cuts, energy-based graph cuts, grab-cut
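
The first two segmentation topics above, binary thresholding and connected component analysis, can be sketched in a few lines. This is a minimal illustration assuming 4-connectivity and a fixed threshold of 128; the function names and the tiny test image are hypothetical and not part of the course software.

```python
from collections import deque

def threshold(image, t):
    """Return a binary image: 1 where the pixel value exceeds t, else 0."""
    return [[1 if p > t else 0 for p in row] for row in image]

def label_components(binary):
    """Label 4-connected foreground regions; returns (labels, region_count)."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if binary[r][c] == 1 and labels[r][c] == 0:
                count += 1                      # start a new region
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:                    # breadth-first flood fill
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] == 1 and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count

# Illustrative grey-level image with two bright regions.
image = [[200, 200, 10, 10],
         [200, 10, 10, 180],
         [10, 10, 10, 180]]
labels, count = label_components(threshold(image, 128))
print(count)
print(labels)
```

Using 8-connectivity instead would only require adding the four diagonal neighbours to the inner loop.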

Image features

Harris interest point operator
Difference of Gaussian interest point operators
SIFT feature descriptor

Object recognition

Template matching
Normalized cross-correlation
Chamfer matching
2D shape features
Statistical pattern recognition
Hough transform for parametric curves: lines, circles, and ellipses
Generalized Hough transform and extension to codeword features
Colour histogram matching and back-projection
Haar features, boosted classifiers, and face detection
Histogram of Oriented Gradients (HOG) feature descriptor

Video image processing

Moving object detection, motion detection issues, difference images, background models
Object tracking - exhaustive search, mean shift, optical flow (dense and feature-based)
Object tracking - Kalman filter
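
As a rough illustration of the Kalman filter topic above, the sketch below tracks a single scalar quantity under a constant-position model. This is a deliberate simplification: a tracker for moving objects would normally use a vector state with position and velocity. The function name kalman_1d and the default noise values are assumptions made for this example.

```python
def kalman_1d(measurements, x0=0.0, p0=1.0, q=0.0, r=1.0):
    """Scalar Kalman filter with a constant-position motion model.

    x0, p0: initial state estimate and its variance (illustrative defaults)
    q, r:   process and measurement noise variances (illustrative defaults)
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is assumed unchanged; uncertainty grows by q.
        p = p + q
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates

# Ten identical measurements of 5.0: the estimate moves steadily towards 5.0.
estimates = kalman_1d([5.0] * 10)
print(estimates)
```

With these defaults the gain shrinks at every step, so later measurements shift the estimate less than earlier ones.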

3D vision

Homogeneous coordinates and transformations
Perspective transformation
Camera model and inverse perspective transformation
Stereopsis, stereo correspondence, epipolar geometry
Depth cues
Structured light
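
The role of homogeneous coordinates in the perspective transformation can be shown with a minimal pinhole camera sketch. It assumes a camera at the world origin looking along the Z axis, with a focal length of 500 pixels and a principal point at (320, 240); these values and the function name project are illustrative assumptions only.

```python
def project(P, point3d):
    """Project a 3D point to pixel coordinates using a 3x4 camera matrix P."""
    X = list(point3d) + [1.0]                   # homogeneous 3D point
    x = [sum(P[i][j] * X[j] for j in range(4)) for i in range(3)]
    return (x[0] / x[2], x[1] / x[2])           # divide by the third component

# Illustrative intrinsics (assumptions): focal length and principal point.
f, cx, cy = 500.0, 320.0, 240.0
P = [[f,   0.0, cx,  0.0],
     [0.0, f,   cy,  0.0],
     [0.0, 0.0, 1.0, 0.0]]

# A point 2 units in front of the camera, slightly right of and above the axis.
u, v = project(P, (0.1, -0.2, 2.0))
print(u, v)
```

The division by the third homogeneous component is what makes distant objects appear smaller: doubling Z halves the offset from the principal point.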

Computer vision and deep learning

Supervised vs. unsupervised
Classification, regression, and clustering
Parametric models
Shallow vs. deep learning
Support vector machines (SVM)
Overfitting
Neural networks
Multi-layer perceptrons
Deep learning
Convolutional neural networks (CNN)
Example CNNs: ShallowNet, MiniVGGNet, VGG16, VGG19, ResNet, Inception V3, Xception
Dropout
Transfer learning
Deconvolution CNN
Non-classification applications
Generative adversarial networks
CNN resources

Lecture Notes

Lecture 1. Overview of human and computer vision; software tools
Lecture 2. Optics, sensors, and image formation
Lecture 3. Image acquisition and image representation
Lecture 4. Image processing: point & neighbourhood operations, image filtering, convolution, Fourier transform
Lecture 5. Image processing: morphological operations
Lecture 6. Image processing: geometric operations
Lecture 7. Segmentation: simple region-based approaches, binary thresholding, connected component analysis
Lecture 8. Segmentation: boundary-based approaches, edge detection; boundary detection; snakes
Lecture 9. Segmentation: region-based approaches, simple colour segmentation; k-means clustering
Lecture 10. Segmentation: region-based approaches, graph cuts, normalized cuts, energy-based graph cuts, grab-cut
Lecture 11. Image features: Harris and Difference of Gaussian interest point operators
Lecture 12. Image features: Scale-invariant interest point operators
Lecture 13. Image features: SIFT feature descriptor
Lecture 14. Object recognition: template matching; normalized cross-correlation; chamfer matching
Lecture 15. Object recognition: 2D shape features; statistical pattern recognition
Lecture 16. Object recognition: Hough transform for parametric curves: lines, circles, and ellipses
Lecture 17. Object recognition: generalized Hough transform; extension to codeword features
Lecture 18. Object recognition: colour histogram matching and back-projection
Lecture 19. Object recognition: Haar features, boosted classifiers, and face detection
Lecture 20. Object recognition: Histogram of Oriented Gradients (HOG) feature descriptor, people detection
Lecture 21. Video image processing: moving object detection, motion detection issues, difference images, background models
Lecture 22. Video image processing: object tracking - exhaustive search, mean shift, optical flow (dense and feature-based)
Lecture 23. Video image processing: object tracking - Kalman filter
Lecture 24. 3D vision: homogeneous transformations, camera model and inverse perspective transformation
Lecture 25. 3D vision: stereopsis, stereo correspondence, epipolar geometry, depth cues, structured light
Lecture 26. Computer vision and machine learning: shallow learning vs. deep learning
Lecture 27. Computer vision and deep learning: optimization methods and regularization, neural network basics, multi-layer perceptrons
Lecture 28. Computer vision and deep learning: convolutional neural networks (CNNs): VGG16, VGG19, ResNet, Inception V3, Xception

Course Textbook

Szeliski, R. Computer Vision: Algorithms and Applications, Springer, 2010.

Recommended Reading

Dawson-Howe, K. A Practical Introduction to Computer Vision with OpenCV, Wiley, 2014.

Trucco, E. and Verri, A. Introductory Techniques for 3-D Computer Vision, Prentice-Hall, 1998.

Vernon, D. Machine Vision: Automated Visual Inspection and Robot Vision, Prentice-Hall, 1991.

Rosebrock, A. Deep Learning for Computer Vision with Python, PyImageSearch, 2017.

Software Development Environment

A step-by-step guide is provided for downloading, installing, and using the software required to run the examples and complete the assignments.

Acknowledgments

The syllabus and content of this course derive from several sources. These include the following.

Course VO 4.0 376.054 Machine Vision and Cognitive Robotics given by Markus Vincze, Michael Zillich, and Daniel Wolf at Technische Universität Wien.

Course 4BA10 Computer Vision given by Kenneth Dawson-Howe at Trinity College Dublin.

Course 4BA10 Computer Vision given by David Vernon at Trinity College Dublin.

Course on Computer Vision given by Francesca Odone, University of Genova, at VVV2017.

Tutorials on Machine Learning with Computer Vision given by Toby Breckon, Durham University, at BMVA Summer Schools in 2016 and 2017.

Deep Learning for Computer Vision with Python, A. Rosebrock, PyImageSearch, 2017.
