Computer Vision based Mouse using OpenCV

Pranav Mendiratta
Jan 17, 2021

Technology used:

· Jupyter Notebook, bundled with Anaconda3 2020.02 for Windows x86 64-bit, was used to write and run the code for this project, with Python 3.7.6 installed on the device.

· The code was written in Python, a high-level, interpreted programming language. The libraries/modules used are described below.

Assumptions for this project:

· The camera is assumed to be static.

· The camera does not apply any automatic adjustments such as auto-focus.

· The user is not moving in the frame. (They can sit anywhere near the laptop but shouldn’t appear in the detection frame.)

· There are no color constraints on the background, but it should be static. (No moving objects or changing colors in the background of the frame.)

Python Libraries/ Modules used:

· Opencv:

Opencv is an open-source computer vision library, originally developed by Intel, aimed at real-time computer vision. It is free to use under the open-source BSD license and is cross-platform.

Opencv was officially launched in 1999 by Intel Research as part of a series of projects that included 3D display walls and real-time ray tracing.

The first alpha version of Opencv was made public in 2000, followed by five beta releases between 2001 and 2005. Version 1.0 was released to the public in 2006, and a major release followed in 2009, introducing new features.

In 2012, development was taken over by OpenCV.org, a non-profit organization that continues to maintain the library.

The library was originally written in C++ but has bindings for several other languages, such as Python and Java.

Opencv functions such as imshow(), rectangle(), cvtColor() and objects such as VideoCapture were used:

  • cv2.imshow(windowName, image): displays an image in a window, automatically sized to fit the image.
  • cv2.rectangle(image, startPoint, endPoint, colour, thickness): draws a rectangle on an image.
  • cv2.VideoCapture: an Opencv object used to capture video frame by frame. It takes either a device index (to open a camera) or the name of a video file.
  • cv2.cvtColor(image_source, color_code): converts an image from one colour space to another. About 150 colour-conversion codes are available in Opencv. For example, cv2.COLOR_BGR2GRAY converts an image to grayscale.
  • cv2.threshold(source, threshold_Value, max_Value, thresholding_Technique): thresholding is done on grayscale images; pixels on one side of the threshold are set to 0 and the rest to the maximum value, or vice-versa. This is very useful when we want to identify and separate an object from its background.
  • cv2.THRESH_BINARY_INV: the thresholding technique used for this project. It is the opposite of the binary threshold: pixel values greater than the threshold are set to 0, and the rest to the maximum value (255 in this case).
  • cv2.boundingRect: gives the co-ordinates of a bounding box drawn around an object (contour) in the image.
  • cv2.waitKey([delay]): waits for a key event for the given delay in milliseconds (or indefinitely if the delay is 0). It must be called periodically so Opencv can process window events. It returns -1 if no key is pressed within the delay, otherwise the code of the pressed key.
  • cv2.destroyAllWindows(): destroys all the windows created by the program at the end; cv2.destroyWindow() destroys a specific one.
  • cv2.putText(image, text, org, font, font_Scale, color[, thickness[, line_Type[, bottom_Left_Origin]]]): draws a text string on the image. For this project it is used to show left-click or right-click messages in the image window.
  • cv2.flip(image, flip_code): flips the image, used here to reverse the camera’s lateral inversion.
  • cv2.circle(image, center_co-ordinates, radius, colour, thickness): draws circles in the image window; for this project it marks the fingertips and convexity defects.
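
Putting a few of these together, here is a minimal sketch of a capture-and-display loop, not the project's exact code; the window name and key handling are assumptions:

    import cv2

    cap = cv2.VideoCapture(0)              # device index 0 opens the default camera
    while True:
        ret, frame = cap.read()            # grab one frame; ret is False on failure
        if not ret:
            break
        frame = cv2.flip(frame, 1)         # horizontal flip to undo lateral inversion
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        cv2.imshow("preview", gray)
        if cv2.waitKey(10) != -1:          # any key press ends the loop
            break
    cap.release()
    cv2.destroyAllWindows()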

· Numpy:

Numpy is a Python library that provides support for large multi-dimensional arrays and matrices, along with a large collection of mathematical functions to operate on them.

In 2005, Travis Oliphant combined the Numarray and Numeric libraries to merge their support for both large and small arrays. These were initially part of the larger SciPy package, and Numpy 1.0 was released as a separate package in 2006.

In computer vision it can be used for advanced mathematical calculations, for example identifying a hand using the angles between fingers.

· Time:

This module is used for time-related tasks. Its time() function returns the time elapsed since the epoch, in seconds.

For computer vision it is used to record the time of a click gesture and perform the click operation at the right moment.
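
One plausible way to use time() here is to debounce clicks so a held gesture does not fire repeatedly; the names and the one-second gap below are assumptions, not the project's exact values:

    import time

    last_click = 0.0        # epoch seconds of the previous click
    CLICK_GAP = 1.0         # assumed minimum interval between clicks, in seconds

    def can_click():
        # allow a click only if enough time has passed since the last one
        global last_click
        now = time.time()
        if now - last_click >= CLICK_GAP:
            last_click = now
            return True
        return False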

· PyAutoGUI:

PyAutoGUI is a Python module that helps control the keyboard and mouse to automate interactions with other applications. It runs on both Python 2 and 3.

For this project it is mainly used to control mouse movement and mouse clicks. It also provides a fail-safe: when the program is moving the mouse around, it can be very difficult to close it, and this is where the fail-safe comes into play.

  • pyautogui.FAILSAFE: this check runs by default every time a PyAutoGUI function is called. If the mouse cursor is in the top-left corner (or, in some versions, any of the four corners) of the primary monitor, it raises a pyautogui.FailSafeException. PyAutoGUI also pauses for about a tenth of a second after each call, giving the user time to slam the cursor into a corner. The check can be disabled with pyautogui.FAILSAFE = False.
  • pyautogui.click(): performs click operations on the computer. You can also specify the right or left button to perform those specific operations.
  • pyautogui.moveRel(x, y): instead of using the top-left corner of the screen as the reference point, this function takes the current cursor position as the reference and moves relative to it.
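
A minimal sketch of how these calls fit together; the offsets and buttons below are illustrative, not the project's values:

    import pyautogui

    pyautogui.FAILSAFE = True          # default; slam the cursor into a corner to abort
    pyautogui.moveRel(40, 0)           # move 40 px right of the current cursor position
    pyautogui.click()                  # left click at the current position
    pyautogui.click(button='right')    # right click at the current position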

· Math:

Alongside the Numpy library, we also need the math library to perform operations such as square roots and inverse trigonometric functions on pixel co-ordinates, for example when computing the angles between fingers. It also helps in debugging and tuning the program.
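
For example, the angle at one point formed by two others can be computed with the cosine rule; this is a sketch of the kind of calculation used later for finger identification (the function name and the (x, y) tuple format are assumptions):

    import math

    def angle(start, far, end):
        # side lengths of the triangle formed by the three (x, y) points
        a = math.hypot(far[0] - start[0], far[1] - start[1])
        b = math.hypot(far[0] - end[0], far[1] - end[1])
        c = math.hypot(start[0] - end[0], start[1] - end[1])
        # cosine rule for the angle at `far`; clamp against floating-point drift
        cos_val = max(-1.0, min(1.0, (a * a + b * b - c * c) / (2 * a * b)))
        return math.degrees(math.acos(cos_val))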

Line Detection in Opencv:

Line detection in Opencv can be done in two ways; for this project we mainly use the second method.

1. Houghline Method:

A line can be represented as y = mx + c, where m is the slope, (x, y) are the co-ordinates of points on the line, and c is the constant (intercept). It can also be represented in the parametric form r = x cos θ + y sin θ, where r is the perpendicular distance from the origin to the line and θ is the angle that perpendicular makes with the horizontal axis, measured counter-clockwise in the coordinate system as Opencv sees it.

Figure: Line Of the form (r, θ)

· Firstly, we create a 2D array and initialize it with zeros. This is called an accumulator.

· Columns are denoted by θ and the rows are denoted by r.

· Size of the array solely depends on the accuracy required, if an accuracy of 1 degree is required then an array of 180 columns is required.

· The maximum possible r is the diagonal length of the image. Thus, at one-pixel accuracy, the number of rows equals the diagonal length of the image.

In simple words, for every edge pixel (x, y), different θ values are tried and the corresponding r values are computed. The accumulator cells indexed by each (r, θ) pair are incremented, so the cells with the highest counts give the (r, θ) parameters of the detected lines.

Figure: Image
Figure: Line detection
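
A minimal sketch of the Houghline method in Opencv; the file name, resolutions, and vote threshold are illustrative:

    import cv2
    import numpy as np

    img = cv2.imread("frame.png")                 # hypothetical input image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)              # the transform works on an edge map
    # 1-pixel r resolution, 1-degree theta resolution, 200-vote threshold
    lines = cv2.HoughLines(edges, 1, np.pi / 180, 200)
    for line in (lines if lines is not None else []):
        r, theta = line[0]
        a, b = np.cos(theta), np.sin(theta)
        x0, y0 = a * r, b * r                     # point on the line closest to the origin
        p1 = (int(x0 - 1000 * b), int(y0 + 1000 * a))
        p2 = (int(x0 + 1000 * b), int(y0 - 1000 * a))
        cv2.line(img, p1, p2, (0, 0, 255), 2)     # draw the detected line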

2. Canny Detection:

This is the method actually used for this project. Canny is a foundational algorithm in image processing and can be used for line, edge, as well as contour detection in an image. It uses Sobel gradients internally, and the Houghline method can then be applied to the resulting “cannied” images to extract information. It has 5 steps:

· It is important to convert the images to grayscale (done by the system itself) before any detection is done, because reducing every pixel to a single intensity value makes it easier to later separate the target object from its surroundings as white and black.

  • Noise reduction:

If we apply an edge detection algorithm to a high-resolution image without smoothing it, we might detect too many objects we are not interested in; on the other hand, if we blur the image too much, we might lose data.

Blurring is one way of reducing the noise in the image. Blurring comes in several types, such as average, bilateral, median, and Gaussian blur. For this project we use the Gaussian blur.

  • Gaussian blurring is nothing but applying a kernel (filter) to the image whose values are generated by a Gaussian function, which is why it takes a sigma value as a parameter.
Figure: Gaussian blur plotted

As you can see in the image, the kernel values are highest in the middle and fall off towards the sides, so each pixel is smoothed mainly by its nearest neighbours, which helps keep the target object, such as a hand, distinct.

Figure: After blur
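
In Opencv this is a single call; the kernel size below is an assumption:

    import cv2

    img = cv2.imread("frame.png")               # hypothetical input image
    # 5x5 Gaussian kernel; sigma 0 lets Opencv derive it from the kernel size
    blurred = cv2.GaussianBlur(img, (5, 5), 0)
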
  • Gradient Calculation:

This step detects the intensity and direction of edges in the given image with the help of edge detection filters. An edge corresponds to a change in pixel intensity. The simplest way to detect that is to apply filters (kernels) that highlight changes in both directions: vertical (y) and horizontal (x).

Figure: After gradient calculation

· Non Maximum Suppression:

This is done to thin out the edges after the gradient calculation. The algorithm goes through all the points of the gradient intensity matrix and keeps only the pixels with the maximum value along the edge direction.

Figure: non maximum suppression

In this image, an intensity matrix is being processed. We can understand this by looking at the single pixel value in the upper left corner that is being processed.

Figure: Upper left corner red box

As we can see in this image, the orange dotted line represents the edge direction, and the pixel being processed is compared against its neighbours along that direction. If it is the most intense, it is kept; otherwise it is discarded, thinning out the edge. The same is done for pixels with vertical and diagonal edge directions.

· Double Threshold:

  • The high threshold identifies the strong pixels in the image. (intensity higher than the high threshold)
  • The low threshold identifies the non-relevant pixels. (intensity lower than the low threshold)
  • Pixels with intensities between the high and low thresholds are labelled as weak.

· Edge Tracking by Hysteresis:

Based on the results of the double threshold, hysteresis processes the pixels again. It converts a pixel from weak to strong if and only if it has at least one strong pixel around it.

Figure: Hysteresis
Figure: Hand detection
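
All five steps are wrapped up in a single Opencv call; a minimal sketch, with illustrative thresholds and kernel size:

    import cv2

    img = cv2.imread("frame.png")                  # hypothetical input image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # Canny expects a single channel
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)    # noise reduction first
    # 50 and 150 are the low and high thresholds for the double threshold
    # and hysteresis stages described above
    edges = cv2.Canny(blurred, 50, 150)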

Edge Detection in Opencv:

Edge detection is one of the most important aspects of image processing, since it helps us reduce the amount of data (pixels) to process while amplifying the structural aspects of the image.

· Sobel Edge Detection:

This edge detection method is based on the gradient calculation we saw earlier. It calculates first-order derivatives of the image separately along the x and y axes.

The operator convolves a 3x3 kernel (filter) with the original image to calculate approximations of both the horizontal and vertical derivatives.

· Laplace Edge Detection:

This uses only one filter (kernel). It is capable of calculating second order derivatives in a single pass.

Figure: Edge detection
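
Both operators are available directly in Opencv; a minimal sketch (the 64-bit float depth avoids clipping negative gradient values):

    import cv2

    img = cv2.imread("frame.png")                 # hypothetical input image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # first-order derivatives along x and y using a 3x3 Sobel kernel
    sobel_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    sobel_y = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    # second-order derivatives in a single pass with the Laplacian
    laplacian = cv2.Laplacian(gray, cv2.CV_64F)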

Contour Detection:

A contour is a closed curve joining all the continuous points along a boundary that have the same intensity or color. It can be very useful in object detection and shape analysis. Steps to perform contour detection for this project:

1. Binarization (which should be a result of edge detection or threshold image):

In binarization, pixels at or above the threshold are set to 1 (white) and all pixels below it are set to 0 (black).

After binarization, there might be a need to reduce the noise caused by false positives. We can clean the image and remove them by applying an opening operator with a 3x3 structuring element.

Figure: Drawing Contours

2. Finding contours using the “cv2.findContours()” function in opencv:

Since the image has already been converted to grayscale and binarized, the target objects are amplified in white while the rest of the background is blacked out. This is very helpful when finding the contours in the image using the cv2.findContours() function.

3. Showing the contours:

The contours are detected and shown using the “cv2.drawContours()” and “cv2.imshow()” functions.
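
A minimal sketch of the three steps; the threshold value and structuring element are illustrative, and note that cv2.findContours returns an extra image in Opencv 3.x:

    import cv2
    import numpy as np

    img = cv2.imread("frame.png")                 # hypothetical input image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # step 1: binarization; the hand becomes white, the background black
    _, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)
    # clean up false positives with an opening operator (3x3 structuring element)
    kernel = np.ones((3, 3), np.uint8)
    binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
    # step 2: find the contours (Opencv 4.x return signature shown)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # step 3: draw and show the contours
    cv2.drawContours(img, contours, -1, (0, 255, 0), 2)
    cv2.imshow("contours", img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()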

Hand Identification:

After the hand contour is extracted using the “cv2.findContours()” function, we look for the smallest convex set containing it using the “convexHull” function.

Figure: Drawing Contours

Then we build a rectangle around the convex hull, which helps us locate the center of the hand and thus aids hand identification.

Technical terms used here-

· Convex Hull: A convex shape is one having no interior angles greater than 180 degrees. A hull is an exterior boundary of an object or shape.

Therefore, a convex hull is a tight-fitting convex boundary around a shape or an object. It can be computed with various algorithms such as the gift wrapping algorithm, Chan’s algorithm, the Graham scan, etc.
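
Continuing the contour sketch above, a hedged sketch of this step, assuming the hand is the largest contour in the frame:

    import cv2

    # `contours` and `img` come from the contour detection sketch above
    hand = max(contours, key=cv2.contourArea)     # assume the hand is the largest contour
    hull = cv2.convexHull(hand)
    x, y, w, h = cv2.boundingRect(hull)           # rectangle around the convex hull
    center = (x + w // 2, y + h // 2)             # approximate center of the hand
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
    cv2.circle(img, center, 5, (0, 0, 255), -1)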

Finger identification:

  • Finger identification is done using the “convexityDefects” function in this project. (Convexity defects are the cavities inside the outer boundary of the convex hull that do not belong to the target object.) We use this function to get all the defects of the contour and save them in an array.
  • Usually we get more points than we need, so filtering is required; it is done on the basis of each point’s distance from the bounding rectangle obtained through the convex hull. This time only the points lowest relative to the rectangle are stored in another array.
  • The two arrays (the points closest to and farthest from the convex hull) are then combined to generate each fingertip, by choosing an appropriate farthest point and an appropriate closest point that represents the space between two fingers.

To check that no false fingers are detected, we check the following conditions (a sketch follows the list):

1. The angle between the three points (the fingertip and the two lowest concavities) should always be within the limit specified in the program.

2. The y co-ordinate of the fingertip should not be lower than the y co-ordinates of the convex hull rectangle and of the concavity defects.

3. The distance between the approximate center of the hand (found using the convex hull) and the fingertip should be equal to or greater than the specified limit.
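
A hedged sketch of the defect extraction and the angle check, reusing the angle() helper from the Math section; the depth and angle limits are assumed values, not the project’s exact ones:

    import cv2

    # `hand` and `img` come from the hand identification sketch above
    hull_idx = cv2.convexHull(hand, returnPoints=False)  # indices, as convexityDefects requires
    defects = cv2.convexityDefects(hand, hull_idx)
    fingertips = []
    if defects is not None:
        for i in range(defects.shape[0]):
            s, e, f, depth = defects[i, 0]
            start = tuple(hand[s][0])    # candidate fingertip
            end = tuple(hand[e][0])      # neighbouring fingertip
            far = tuple(hand[f][0])      # concavity between the two fingers
            # keep only deep, sharp defects; both limits here are assumptions
            if depth > 10000 and angle(start, far, end) < 90:
                fingertips.append(start)
                cv2.circle(img, far, 5, (0, 0, 255), -1)   # mark the defect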

Opencv functions such as “cv2.circle()” and “cv2.line()” are used to draw circles and lines on the image to mark points during hand identification.

-> You can show a gesture of 2 fingers to move the mouse around, 5 for a left click, and 4 for a right click.

Figure: Left click using defects in hand

Future applications for this technology:

a) Virtual Reality: Demand for gesture-based input systems for augmented reality applications has risen over the years. Virtual reality interactions use hand gestures to enable almost realistic manipulation of virtual objects or to simulate 3D interactions.

b) Robotics and Telepresence: Telepresence and telerobotic applications typically fall within the sphere of space exploration and military research projects. Hands and arms are used to control robots over a live video feed in a virtual reality environment; in this case, however, the robots are fully immersed in the real world, carrying out whatever is being inputted.

c) Desktop and PC applications: This technology can be used to replace the traditional mouse and keyboard with gesture based input systems to make the human computer interactions feel more natural for everyone.

d) Games: Gesture-based computer games have been on the rise for quite some time and have turned into a multi-billion dollar industry. Microsoft’s Xbox Kinect games and Nintendo’s Wii provide gesture-based gaming experiences.

e) Sign Language: Sign languages are a natural fit for gesture-based input systems. Because they are highly structured, they can help train vision systems. Such systems can also help deaf and disabled people interact with humans and computers.

Disclaimer - No copyright infringement intended on any images. They are used for educational purposes only.
