This specialization presents the first comprehensive treatment of the foundations of computer vision. It focuses on the mathematical and physical underpinnings of vision and has been designed for learners, practitioners and researchers who have little or no knowledge of computer vision. The program consists of five courses. Any learner who completes this specialization has the potential to build a successful career in computer vision, a thriving field that is expected to increase in importance in the coming decades.
All five courses are offered by Columbia University:

Course 1: Camera and Imaging
Course 2: Features and Boundaries
Course 3: 3D Reconstruction - Single Viewpoint
Course 4: 3D Reconstruction - Multiple Viewpoints
Course 5: Visual Perception

Each course is described in detail below.
Course 5: Visual Perception

The ultimate goal of a computer vision system is to generate a detailed symbolic description of each image it is shown. This course focuses on the all-important problem of perception.
We first describe the problem of tracking objects in complex scenes. We look at two key challenges in this context. The first is the separation of an image into object and background using a technique called change detection. The second is the tracking of one or more objects in a video. Next, we examine the problem of segmenting an image into meaningful regions. In particular, we take a bottom-up approach where pixels with similar attributes are grouped together to obtain a region.
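The separation of object from background by change detection can be sketched in a few lines: a pixel is labeled as belonging to an object when it differs from a background model by more than a threshold. The helper name `change_mask` and the threshold value are illustrative choices, not from the course.

```python
import numpy as np

def change_mask(frame, background, tau=30):
    """Change detection: mark pixels whose intensity differs from the
    background model by more than a threshold tau (0-255 intensities)."""
    diff = np.abs(frame.astype(np.int32) - background.astype(np.int32))
    return diff > tau

# Toy example: a flat gray background and a frame containing a
# bright 2x2 "object".
background = np.full((4, 4), 50, dtype=np.uint8)
frame = background.copy()
frame[1:3, 1:3] = 200
mask = change_mask(frame, background)
print(mask.sum())  # 4 changed pixels
```

In practice the background model is estimated over time (e.g., a running average of frames), but the per-pixel test has this same form.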
Finally, we tackle the problem of object recognition. We describe two approaches to the problem. The first directly recognizes an object and its pose using the appearance of the object. This method is based on the concept of dimension reduction, which is achieved using principal component analysis. The second approach is to use a neural network to solve the recognition problem as one of learning a mapping from the input (image) to the output (object class, object identity, activity, etc.). We describe how a neural network is constructed and how it is trained using the backpropagation algorithm.
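The appearance-based approach can be sketched as follows: flattened images are projected onto a low-dimensional subspace found by principal component analysis, and a query is recognized by finding its nearest neighbor among the stored low-dimensional codes. The data below is synthetic and the subspace dimension is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "appearance" vectors: 20 flattened images of dimension 100,
# generated to lie near a 3-dimensional subspace (synthetic data).
basis = rng.normal(size=(3, 100))
coeffs = rng.normal(size=(20, 3))
images = coeffs @ basis + 0.01 * rng.normal(size=(20, 100))

# PCA: subtract the mean image, then keep the top-k principal
# directions (right singular vectors of the centered data).
mean = images.mean(axis=0)
_, _, Vt = np.linalg.svd(images - mean, full_matrices=False)
k = 3
projected = (images - mean) @ Vt[:k].T   # 20 x 3 low-dimensional codes

# Recognition: project a noisy query and find its nearest stored code.
query = images[7] + 0.01 * rng.normal(size=100)
q = (query - mean) @ Vt[:k].T
match = int(np.argmin(np.linalg.norm(projected - q, axis=1)))
print(match)  # 7
```

The point of the dimension reduction is that the nearest-neighbor search runs in the k-dimensional subspace rather than on full images.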
Course 4: 3D Reconstruction - Multiple Viewpoints

This course focuses on the recovery of the 3D structure of a scene from images taken from different viewpoints. We start by building a comprehensive geometric model of a camera and then develop a method for finding (calibrating) the internal and external parameters of the camera model. Then, we show how two such calibrated cameras, whose relative positions and orientations are known, can be used to recover the 3D structure of the scene. This is what we refer to as simple binocular stereo. Next, we tackle the problem of uncalibrated stereo, where the relative positions and orientations of the two cameras are unknown. Interestingly, just from the two images taken by the cameras, we can first determine the relative positions and orientations of the cameras and then use this information to estimate the 3D structure of the scene.
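At the heart of simple binocular stereo is a one-line relation: for two parallel calibrated cameras with focal length f and baseline b, a point whose projections differ by disparity d lies at depth Z = f·b / d. The numbers below are illustrative, not from the course.

```python
# Simple binocular stereo: depth from disparity for a rectified
# camera pair. Illustrative values only.
f = 700.0   # focal length, in pixels
b = 0.1     # baseline between the two cameras, in meters
d = 14.0    # disparity of a matched point, in pixels

Z = f * b / d
print(Z)  # 5.0 (meters)
```

Note the inverse relation: nearby points produce large disparities, distant points small ones, which is why depth accuracy degrades with distance.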
Next, we focus on the problem of dynamic scenes. Given two images of a scene that includes moving objects, we show how the motion of each point in the image can be computed. This apparent motion of points in the image is called optical flow. Optical flow estimation allows us to track scene points over a video sequence. Next, we consider a video of a scene shot with a moving camera, where the motion of the camera is unknown. We present structure from motion, which takes as input tracked features in such a video and determines not only the 3D structure of the scene but also how the camera moves with respect to the scene. The methods we develop in the course are widely used in object modeling, 3D site modeling, robotics, autonomous navigation, virtual reality and augmented reality.
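A standard way to compute optical flow in a small window is the Lucas-Kanade method: each pixel contributes one brightness-constancy constraint Ix·u + Iy·v + It = 0, and the flow (u, v) is the least-squares solution over the window. The sketch below builds a synthetic window whose temporal differences come from a known flow, then recovers that flow; the gradient values are random illustrative data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Spatial gradients Ix, Iy at 25 pixels of a window, and temporal
# differences It generated from a known flow (u, v) via the
# brightness-constancy equation  Ix*u + Iy*v + It = 0.
Ix = rng.normal(size=25)
Iy = rng.normal(size=25)
u_true, v_true = 1.0, 0.5
It = -(Ix * u_true + Iy * v_true)

# Lucas-Kanade: least-squares solution of the stacked constraints.
A = np.stack([Ix, Iy], axis=1)           # 25 x 2 constraint matrix
uv = np.linalg.lstsq(A, -It, rcond=None)[0]
print(uv)  # approximately [1.0, 0.5]
```

The solution is well defined only when the window contains gradients in more than one direction, which is the aperture problem in matrix form.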
Course 1: Camera and Imaging

This course covers the fundamentals of imaging – the creation of an image that is ready for consumption or processing by a human or a machine. Imaging has a long history, spanning several centuries. But the advances made in the last three decades have revolutionized the camera and dramatically improved the robustness and accuracy of computer vision systems. We describe the fundamentals of imaging, as well as recent innovations in imaging that have had a profound impact on computer vision.
This course starts by examining how an image is formed using a lens camera. We explore the optical characteristics of a camera such as its magnification, F-number, depth of field and field of view. Next, we describe how solid-state image sensors (CCD and CMOS) record images, and the key properties of an image sensor such as its resolution, noise characteristics and dynamic range. We describe how image sensors can be used to sense color as well as capture images with high dynamic range. In certain structured environments, an image can be thresholded to produce a binary image from which various geometric properties of objects can be computed and used for recognizing and locating objects. Finally, we present the fundamentals of image processing – the development of computational tools to process a captured image to make it cleaner (denoising, deblurring, etc.) and easier for computer vision systems to analyze (linear and non-linear image filtering methods).
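The binary-image step can be illustrated directly: threshold a grayscale image, then compute geometric properties of the resulting blob from its moments, such as area (zeroth moment) and centroid (first moments). The image and threshold below are toy values.

```python
import numpy as np

# Threshold a toy grayscale image to a binary image, then compute
# area and centroid of the blob from its image moments.
image = np.zeros((6, 6))
image[2:4, 1:5] = 180            # a bright 2x4 rectangle
binary = image > 128             # thresholding

area = int(binary.sum())                 # zeroth moment
ys, xs = np.nonzero(binary)
centroid = (ys.mean(), xs.mean())        # first moments / area
print(area, centroid)  # 8 (2.5, 2.5)
```

Higher-order moments computed the same way yield orientation and elongation, which is what makes binary vision effective for locating parts in structured industrial settings.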
Course 2: Features and Boundaries

This course focuses on the detection of features and boundaries in images. Feature and boundary detection is a critical preprocessing step for a variety of vision tasks including object detection, object recognition and metrology – the measurement of the physical dimensions and other properties of objects. The course presents a variety of methods for detecting features and boundaries and shows how features extracted from an image can be used to solve important vision tasks.
We begin with the detection of simple but important features such as edges and corners. We show that such features can be reliably detected using operators that are based on the first and second derivatives of images. Next, we explore the concept of an “interest point” – a unique and hence useful local appearance in an image. We describe how interest points can be robustly detected using the SIFT detector. Using this detector, we describe an end-to-end solution to the problem of stitching overlapping images of a scene to obtain a wide-angle panorama. Finally, we describe the important problem of finding faces in images and show several applications of face detection.
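Derivative-based edge detection can be made concrete with the Sobel operator, a small kernel that approximates the horizontal image derivative and responds strongly at vertical intensity edges. The minimal `filter2d` helper below is an illustrative implementation, not library code.

```python
import numpy as np

# Sobel horizontal-derivative operator: approximates dI/dx and
# responds strongly at vertical intensity edges.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

def filter2d(img, kernel):
    """Correlate a kernel with an image, 'valid' region only."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(img[y:y+kh, x:x+kw] * kernel)
    return out

# A dark-to-bright vertical step edge.
img = np.zeros((5, 6))
img[:, 3:] = 1.0
gx = filter2d(img, sobel_x)
print(np.abs(gx).max())  # strongest response sits at the edge
```

Away from the edge the response is zero, so thresholding the gradient magnitude localizes the edge; second-derivative operators instead localize it at a zero crossing.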
Course 3: 3D Reconstruction - Single Viewpoint

This course focuses on the recovery of the 3D structure of a scene from its 2D images. In particular, we are interested in the 3D reconstruction of a rigid scene from images taken by a stationary camera (same viewpoint). This problem is interesting as we want the multiple images of the scene to capture complementary information despite the fact that the scene is rigid and the camera is fixed. To this end, we explore several ways of capturing images where each image provides additional information about the scene.
In order to estimate scene properties (depth, surface orientation, material properties, etc.) we first define several important radiometric concepts, such as light source intensity, surface illumination, surface brightness, image brightness and surface reflectance. Then, we tackle the challenging problem of shape from shading – recovering the shape of a surface from its shading in a single image. Next, we show that if multiple images of a scene of known reflectance are taken while changing the illumination direction, the surface normal at each scene point can be computed. This method, called photometric stereo, provides a dense surface normal map that can be integrated to obtain surface shape.
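For a Lambertian surface, photometric stereo reduces to linear algebra: each light direction gives one brightness measurement I = ρ (L · n), so three non-coplanar lights determine the scaled normal g = ρn by solving a 3×3 system. The light directions, albedo and normal below are illustrative values.

```python
import numpy as np

# Photometric stereo for one Lambertian scene point: brightness under
# light direction L_i is I_i = rho * (L_i . n), with albedo rho and
# unit surface normal n. Three non-coplanar lights suffice.
L = np.array([[0.0, 0.0, 1.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
L = L / np.linalg.norm(L, axis=1, keepdims=True)   # unit directions

n_true = np.array([0.0, 0.0, 1.0])   # surface facing the camera
rho = 0.8                            # albedo (hypothetical)
I = rho * (L @ n_true)               # the three measured brightnesses

g = np.linalg.solve(L, I)            # g = rho * n
rho_est = np.linalg.norm(g)          # albedo is the magnitude of g
n_est = g / rho_est                  # normal is its direction
print(rho_est, n_est)  # 0.8, [0. 0. 1.]
```

Applied at every pixel, this yields the dense normal map mentioned above, which is then integrated to recover the surface.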
Next, we discuss depth from defocus, which uses the limited depth of field of the camera to estimate scene structure. From a small number of images taken by changing the focus setting of the lens, a dense depth of the scene is recovered. Finally, we present a suite of techniques that use active illumination (the projection of light patterns onto the scene) to get precise 3D reconstructions of the scene. These active illumination methods are the workhorse of factory automation. They are used on manufacturing lines to assemble products and inspect their visual quality. They are also extensively used in other domains such as driverless cars, robotics, surveillance, medical imaging and special effects in movies.
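The geometry behind one common active-illumination setup, single-stripe light triangulation, can be written in one line: with the camera at the origin, a projector a baseline b away casting a stripe tilted by angle θ toward the camera axis, a scene point imaged at coordinate x (focal length f) lies at depth Z = b / (x/f + tan θ). All numbers below are illustrative; this is one simple variant, not the course's full treatment.

```python
import math

# Single-stripe active triangulation (illustrative values).
f = 500.0                   # focal length, in pixels
b = 0.2                     # camera-projector baseline, in meters
theta = math.radians(10.0)  # stripe tilt toward the camera axis
x = 60.0                    # observed stripe position, in pixels

Z = b / (x / f + math.tan(theta))
print(Z)  # depth of the illuminated point, in meters
```

Sweeping the stripe (or projecting coded patterns) turns this per-point computation into a dense range scan, which is what factory inspection systems rely on.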