Computer Vision: How Can Our Computers See and Make Sense Out of It?

Pranav Mendiratta · Published in The Startup · Sep 18, 2020 · 7 min read


Computer Vision

Definition: Computer vision is the field of computer science that enables computers to gain high-level understanding from digital images and videos by analyzing them with various machine learning algorithms. From an engineering point of view, its main objective is to automate tasks that the human visual system can do and to lessen human effort.

As a scientific discipline, computer vision is concerned with artificial systems that extract information from images. The image data can take many forms, such as views from multiple cameras, multi-dimensional data from a medical scanner, or video frames.

History:

Computer vision began in the late 1960s at universities that were pioneering the field of artificial intelligence. It was meant to mimic the human visual system, a stepping stone towards building robots with human-like intelligence. An early attempt came in 1966, when researchers tried attaching a camera to a computer and having it describe what it saw.

At that time, the goal was digital image processing that could recover the three-dimensional structure of a full scene in order to improve understanding. These studies from the 1970s formed the early basis of many modern computer vision algorithms, including line labelling, edge extraction from images, polyhedral and non-polyhedral modelling, motion estimation and optical flow.

The next decade saw further research and development in the theories and concepts of computer vision, such as scale-space, the inference of shape from various cues such as shading, focus and texture, and contour models known as snakes.

By the 1990s, some of the earlier research topics were getting more attention and producing better practical models for this technology. These included projective 3D reconstruction, which led to a better understanding of camera calibration, and various graph cut techniques used to solve image segmentation. This decade also marked a milestone in computer science history, when facial recognition came into existence using computer vision technologies.

At the end of the decade, significant leaps had been made in image morphing, panoramic image stitching and image-based rendering.

The transition from classical machine learning to advanced deep learning algorithms brought more complex optimization frameworks into practice and helped us make use of the real potential of this technology.

Technical terms used here:

1. Optical flow: The pattern of apparent motion of edges, surfaces and objects in a visual scene, caused by relative motion between an observer and the scene (see the sketch after this list).

Figure: Optical flow experienced by a rotating observer. The magnitude and direction of the optical flow at each point are represented by the length and direction of the arrow.

2. Optical flow sensor: A vision sensor with an image sensor chip connected to a processor; it measures the optical flow and gives an output based on it.

3. Motion estimation: The process of determining the 2D motion vectors that describe the shift from one image frame to an adjacent one, usually by studying the differences between the frames.

4. Polyhedron modelling: The physical construction of a polyhedron using various solid materials. Polyhedron is a Greek word meaning “many bases”. Polyhedron models can be used in computer vision to build 3D models that help in training machine learning algorithms.

5. Camera calibration: The estimation of the intrinsic and extrinsic parameters of a camera, which helps achieve high accuracy and correct for lens distortion.

6. 3D reconstruction from multiple images: The creation of three-dimensional models from sets of images.

7. Contour models called snakes: A framework in computer vision for delineating an object outline from a possibly noisy 2D image.
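
To make the optical flow idea concrete, here is a minimal sketch that estimates a dense motion field between two consecutive frames using OpenCV's Farneback algorithm. The file names are placeholders, and the parameter values are just common choices, not anything prescribed in this article:

import cv2

prev_frame = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
curr_frame = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

# flow[y, x] holds the (dx, dy) motion vector estimated at each pixel.
flow = cv2.calcOpticalFlowFarneback(
    prev_frame, curr_frame, None,
    pyr_scale=0.5, levels=3, winsize=15,
    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

# Convert each vector to a magnitude and direction, as in the figure above:
# each (magnitude, angle) pair corresponds to one arrow.
magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])
print("mean motion magnitude:", magnitude.mean())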

Related Fields:

1. Artificial Intelligence: Artificial intelligence is the field of computer science that deals with granting machines human-like capabilities of thinking and performing specific tasks.

2. Information Engineering: This field of computer science came into the limelight in the 21st century. It deals with the generation, distribution, analysis and use of data, information and knowledge in systems.

3. Neurobiology: Neurobiology, specifically the study of the human vision system, plays a vital role in the development of computer vision technology. It has helped us understand how a real vision system works using neurons, and to mimic it using the concepts of artificial intelligence and machine learning.

4. Solid State Physics: Computer vision systems depend on image sensors, which detect electromagnetic radiation, mostly in the infrared or visible range. These sensors are designed using concepts from solid-state and quantum physics. Optics, which governs how light forms an image, is another core physical concept for this technology.

5. Signal Processing: Images are multi-dimensional signals, so methods developed for one-variable signals often need to be extended to signals of multiple variables; image processing is therefore closely related to computer vision.

Working:

To understand the working of computer vision, we will take the example of this greyscale image buffer of Abraham Lincoln. Each pixel's brightness is represented by a single 8-bit number. Since each bit can take one of two values, 0 and 1, 8 bits give 2^8 = 256 possible values, a range from 0 (black) to 255 (white).

Figure: Pixel data diagram

At the hardware level, the pixel values are stored in a one-dimensional array, so we get a long list of unsigned characters as shown below:

{157, 153, 174, 168, 150, 152, 129, 151, 172, 161, 155, 156,
155, 182, 163, 74, 75, 62, 33, 17, 110, 210, 180, 154,
180, 180, 50, 14, 34, 6, 10, 33, 48, 106, 159, 181,
206, 109, 5, 124, 131, 111, 120, 204, 166, 15, 56, 180,
194, 68, 137, 251, 237, 239, 239, 228, 227, 87, 71, 201,
172, 105, 207, 233, 233, 214, 220, 239, 228, 98, 74, 206,
188, 88, 179, 209, 185, 215, 211, 158, 139, 75, 20, 169,
189, 97, 165, 84, 10, 168, 134, 11, 31, 62, 22, 148,
199, 168, 191, 193, 158, 227, 178, 143, 182, 106, 36, 190,
205, 174, 155, 252, 236, 231, 149, 178, 228, 43, 95, 234,
190, 216, 116, 149, 236, 187, 86, 150, 79, 38, 218, 241,
190, 224, 147, 108, 227, 210, 127, 102, 36, 101, 255, 224,
190, 214, 173, 66, 103, 143, 96, 50, 2, 109, 249, 215,
187, 196, 235, 75, 1, 81, 47, 0, 6, 217, 255, 211,
183, 202, 237, 145, 0, 0, 12, 108, 200, 138, 243, 236,
195, 206, 123, 207, 177, 121, 123, 200, 175, 13, 96, 218};

This way of storing data might appear to be two-dimensional, but inside a computer it is stored in a one-dimensional array, since computer memory is just a linear sequence of addresses. The example below shows how the pixels of a multi-dimensional photo map to a one-dimensional array in memory.

Figure: How pixels are stored in memory
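
As a small illustration, here is a sketch of the row-major indexing rule behind this mapping; the width of 12 matches the 12 x 16 Lincoln example, and the names are my own:

WIDTH = 12  # columns in the Lincoln example

def pixel_index(x, y):
    # Rows are laid out one after another in memory,
    # so pixel (x, y) lives at offset y * WIDTH + x.
    return y * WIDTH + x

# The flat list from above, truncated to its first two rows.
pixels = [157, 153, 174, 168, 150, 152, 129, 151, 172, 161, 155, 156,
          155, 182, 163, 74, 75, 62, 33, 17, 110, 210, 180, 154]

print(pixels[pixel_index(3, 1)])  # brightness at column 3, row 1 -> 74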

Coming back to the Abraham Lincoln example, adding color makes things more complicated. Computers read color in an image as a combination of three channels:

Red, Green and Blue (the RGB color model)

All three channels use the same 0–255 scale, so the computer has to store three color values for each pixel instead of one. To store the Lincoln photo above in color, we need 12 x 16 x 3 = 576 values.

Figure: RGB colors
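
A minimal sketch of this layout, using NumPy (an assumption on my part; any array library would do):

import numpy as np

# 16 rows, 12 columns, 3 channels (R, G, B), one byte per channel.
color_img = np.zeros((16, 12, 3), dtype=np.uint8)
color_img[0, 0] = (255, 0, 0)  # make the top-left pixel pure red

print(color_img.size)  # 16 * 12 * 3 = 576 values, as computed above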

We now have a basic understanding of how images are stored and how pixels work. A large number of pixel values must be stored and iterated over for every single image in the system, which is why advanced algorithms are required.

Computer vision used to require a lot of manual coding by developers, but the evolution of the computing industry has brought machine learning and deep learning algorithms into existence that have automated many of these tasks.

Evolution of Computer Vision:

At its core, computer vision mostly involves finding specific shapes and patterns in the image data provided to the computer.

Earlier, most of this work had to be done by the developers themselves: they had to take all the measurements, find the patterns and feed the data into the machine. The accuracy rate for identifying objects and patterns was low back then, because getting access to a considerable amount of data to train the machine was difficult.

Then came machine learning algorithms that automated much of the process. They provided a different approach to computer vision problems: smaller, programmed applications could identify specific patterns in images.

These were statistical learning algorithms such as logistic regression, linear regression, decision trees and Support Vector Machines (SVMs).
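
For instance, here is a minimal sketch of that classical approach: flatten the pixels of each image into a feature vector, as in the pixel-buffer discussion above, and train an SVM on it. Scikit-learn's built-in 8 x 8 digits dataset is used purely for illustration:

from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()  # small labelled 8x8 grayscale digit images

# Flatten each 8x8 image into a 64-value feature vector.
X = digits.images.reshape(len(digits.images), -1)

X_train, X_test, y_train, y_test = train_test_split(
    X, digits.target, test_size=0.25, random_state=0)

clf = svm.SVC(kernel="rbf", gamma=0.001)  # a Support Vector Machine classifier
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))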

The industry really started to boom with the introduction of deep learning, which has helped computers identify objects and patterns with accuracy rates that can exceed 99% on some benchmarks. Deep learning uses artificial neural networks to measure edges, vertices and other features, and to identify patterns and objects in images with high accuracy. These networks are fast and can analyze huge amounts of data in short periods of time, which helps train the machine so that its recognition accuracy keeps increasing. One such network that has led to great advancements in the field of computer vision is the Convolutional Neural Network (CNN).
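
To make this concrete, here is a minimal sketch of a small CNN in PyTorch; the framework choice, layer sizes and the 28 x 28 input are my assumptions, not anything specified in this article:

import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # learn 16 edge/texture filters
            nn.ReLU(),
            nn.MaxPool2d(2),                             # downsample 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = SmallCNN()
dummy = torch.zeros(1, 1, 28, 28)  # one 28x28 grayscale image
print(model(dummy).shape)          # torch.Size([1, 10]): one score per class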
