How Machine Vision is Advancing Automation Now

By Jody Muelaner

Machine vision is a collection of technologies that gives automated equipment (industrial or otherwise) high-level understanding of its immediate environment from images. Without machine-vision software, a digital image is nothing more to such equipment than an unconnected collection of pixels with various color values and tone intensities. Machine vision lets computers (typically connected to machine controls) detect edges and shapes within such images so that higher-level processing routines can identify predefined objects of interest. Images in this sense aren’t necessarily limited to photographic images in the visible spectrum; they can also include images obtained using infrared, laser, X-ray, and ultrasound signals.

Figure 1: Use of machine vision for more sophisticated robotics applications is on the rise. (Image source: John6863373 | Dreamstime.com)

One fairly common machine-vision application in industrial settings is to identify a specific part in a bin containing a randomly arranged (jumbled) mix of parts. Here, machine vision can help pick-and-place robots automatically pick up the right part. Of course, recognizing such parts with imaging feedback would be relatively straightforward if they were all neatly arranged and oriented the same way on a tray. However, robust machine vision algorithms can recognize objects at different distances from the camera (and therefore appearing as different sizes at the imaging sensor) as well as in different orientations.
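
To make this concrete, here’s a minimal Python sketch of scale- and rotation-invariant recognition using OpenCV’s ORB keypoint detector. The filenames are placeholders, and a real bin-picking system would add pose estimation on top of the raw keypoint matches.

```python
# Minimal sketch: match a reference image of a part against a jumbled
# bin using ORB keypoints, which tolerate changes in scale and rotation.
import cv2

part = cv2.imread("part_reference.png", cv2.IMREAD_GRAYSCALE)
bin_scene = cv2.imread("bin_scene.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=500)
kp1, des1 = orb.detectAndCompute(part, None)
kp2, des2 = orb.detectAndCompute(bin_scene, None)

# Hamming distance suits ORB's binary descriptors; crossCheck filters
# out one-way matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

print(f"{len(matches)} candidate matches; best distance {matches[0].distance}")
```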

The most sophisticated machine-vision systems have enabled new and emerging designs that go far beyond bin picking, perhaps most visibly in autonomous vehicles.

Figure 2: Machine vision gives systems (industrial or otherwise) high-level understanding of an environment from images. (Image source: Wikimedia)

Technologies related to machine vision

The term machine vision is sometimes reserved for more established and efficient mathematical methods of extracting information from images. In contrast, the term computer vision typically describes more modern and computationally demanding systems, including black-box approaches using machine learning or artificial intelligence (AI). However, machine vision can also serve as a catch-all term encompassing all methods of high-level information extraction from images; in this context, computer vision describes its underlying theories of operation.

Technologies to extract high-level meaning from images abound. Within the research community, such technologies are often considered distinct from machine vision. However, in a practical sense, all are different ways of achieving machine vision … and in many cases, they overlap.

Digital image processing is a form of digital-signal processing involving image enhancement, restoration, encoding, and compression. Advantages over analog image processing include minimized noise and distortion as well as the availability of far more algorithms. One early image-enhancement use was correction of the first close-range images of the lunar surface. This used photogrammetric mapping as well as noise filters and corrections for geometric distortions arising from the imaging camera’s alignment with the lunar surface.

Figure 3: The DLPC350 integrated circuit (IC) controller provides input and output trigger signals for synchronizing displayed patterns with a camera. It works with digital micromirror devices (DMDs) designed to impart 3D machine vision to industrial, medical, and security equipment. In fact, applications include 3D scanning as well as metrology systems. (Image source: Texas Instruments)

Digital image enhancement often involves increasing contrast and may also make geometric corrections for viewing angle and lens distortion. Compression is typically achieved by approximating a complex signal as a combination of cosine functions, using a Fourier-related transform known as the discrete cosine transform (DCT). The JPEG file format is the most popular application of DCT. Image restoration may also use Fourier transforms to remove noise and blurring.
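
As a rough illustration of the idea, the following Python sketch applies a 2D DCT to one 8×8 block (the block size JPEG works with), discards all but the largest coefficients, and reconstructs an approximation. The random block stands in for real image data.

```python
# Minimal sketch of JPEG-style compression on one 8x8 block:
# transform to DCT coefficients, discard the smallest ones,
# and reconstruct an approximation of the original block.
import numpy as np
from scipy.fft import dctn, idctn

block = np.random.default_rng(0).integers(0, 256, (8, 8)).astype(float)

coeffs = dctn(block, norm="ortho")          # 2D discrete cosine transform

# Keep only the 16 largest-magnitude coefficients (of 64)
threshold = np.sort(np.abs(coeffs).ravel())[-16]
coeffs[np.abs(coeffs) < threshold] = 0.0

approx = idctn(coeffs, norm="ortho")        # inverse DCT reconstructs the block
print("max reconstruction error:", np.abs(block - approx).max())
```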

Photogrammetry employs some kind of feature identification to extract measurements from images. These measurements can include 3D information when multiple images of the same scene have been obtained from different positions. The simplest photogrammetry systems measure the distance between two points in an image; including a known scale reference in the image is normally required for this purpose.
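
A minimal sketch of that simplest case might look like the following Python snippet, where the pixel coordinates of the scale bar and the measured feature are hypothetical values chosen for illustration.

```python
# Minimal sketch of single-image photogrammetry with a scale reference:
# a scale bar of known length appears in the image, so pixel distances
# can be converted to real-world units.
import math

def pixel_distance(p1, p2):
    return math.dist(p1, p2)

# Endpoints of a 100 mm scale bar identified in the image (hypothetical)
scale_px = pixel_distance((120, 340), (420, 348))
mm_per_px = 100.0 / scale_px

# Two feature points of interest on the measured part (hypothetical)
feature_px = pixel_distance((510, 200), (655, 290))
print(f"measured distance: {feature_px * mm_per_px:.1f} mm")
```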

Feature detection lets computers identify edges, corners, or points of interest in an image. This is a required first step for photogrammetry as well as for the identification of objects and motion. Blob detection can identify regions whose edges are too smooth for edge or corner detection.
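
For illustration, the following Python sketch runs two common low-level detectors from OpenCV, Canny edges and Harris corners, on a grayscale image. The filename and thresholds are placeholder starting points.

```python
# Minimal sketch of low-level feature detection with OpenCV:
# a binary edge map plus a mask of the strongest corner responses.
import cv2
import numpy as np

gray = cv2.imread("part.png", cv2.IMREAD_GRAYSCALE)

edges = cv2.Canny(gray, 100, 200)                    # binary edge map

corners = cv2.cornerHarris(np.float32(gray), 2, 3, 0.04)
corner_mask = corners > 0.01 * corners.max()         # strongest responses only

print("edge pixels:", int(np.count_nonzero(edges)))
print("corner pixels:", int(np.count_nonzero(corner_mask)))
```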

Pattern recognition is used to identify specific objects. At its simplest, this might mean looking for a specific well-defined mechanical part on a conveyor.
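
In that simple conveyor case, template matching is one straightforward approach. The following OpenCV sketch assumes placeholder filenames and only works when the part’s scale and orientation are fixed.

```python
# Minimal sketch of simple pattern recognition via template matching:
# search a conveyor image for a known part template.
import cv2

scene = cv2.imread("conveyor.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("part_template.png", cv2.IMREAD_GRAYSCALE)

result = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
_, score, _, top_left = cv2.minMaxLoc(result)

if score > 0.8:                      # illustrative confidence threshold
    print("part found at", top_left, "score", round(score, 2))
else:
    print("part not found")
```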

3D reconstruction determines the 3D form of objects from 2D images. It can be achieved by photogrammetric methods in which the height of common features (identified in images from different observation points) is determined by triangulation. 3D reconstruction is also possible using a single 2D image; here, software interprets (among other things) the geometric relationships between edges or regions of shading.
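
As a simplified illustration of triangulation from two observation points, the following sketch computes depth from the disparity of a feature seen by two horizontally offset (rectified) cameras. The focal length, baseline, and pixel coordinates are illustrative values.

```python
# Minimal sketch of triangulating depth from a rectified stereo pair:
# a feature's horizontal shift between the two images (disparity)
# determines its distance from the cameras.
def depth_from_disparity(focal_px, baseline_m, x_left, x_right):
    """Depth Z = f * B / d for a feature seen in both images."""
    disparity = x_left - x_right            # pixels
    return focal_px * baseline_m / disparity

# Same corner of a workpiece located in each image (hypothetical)
z = depth_from_disparity(focal_px=800.0, baseline_m=0.12,
                         x_left=642.0, x_right=610.0)
print(f"feature depth: {z:.2f} m")
```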

Figure 4: 3D scanners capture 2D images of an object to create a 3D model of it. In some cases, the digital models are then employed to 3D print copies. (Image source: Shenzhen Creality 3D Technology Co.)

A human can mentally reconstruct a cube from a simple line-art representation with ease, and a sphere from a shaded circle. Shading gives an indication of a surface’s slope. However, such deduction is more complicated than it seems because shading is a one-dimensional parameter while slope occurs in two dimensions. This can lead to ambiguities, a fact demonstrated by art depicting physically impossible objects.

Figure 5: Computerized determination of a workpiece’s 3D form from a 2D image is fraught with challenges.

How machine-vision tasks are ordered

Many machine-vision systems progressively combine the above techniques by starting with low-level operations and then advancing one by one to higher-level operations. At the lowest level, all of an image’s pixels are held as high-bandwidth data. Each successive operation then identifies image features and represents the information of interest with relatively small amounts of data.

The low-level operations of image enhancement and restoration come first, followed by feature detection. Where multiple sensors are used, these low-level operations may be carried out by distributed processes dedicated to individual sensors. Once features in individual images are detected, higher-level photogrammetric measurements can occur, as can any object identification or other tasks relying on the combined data from multiple images and sensors.
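
The following Python sketch illustrates this ordering for a hypothetical two-camera setup: each sensor’s image is enhanced, restored, and reduced to a compact set of feature coordinates before any higher-level combination takes place. Filenames and parameters are placeholders.

```python
# Minimal sketch of the low-to-high pipeline ordering described above:
# per-sensor enhancement, restoration, and feature detection, after
# which only compact feature data is passed to higher-level stages.
import cv2

def low_level(path):
    """Per-sensor stage: enhancement, restoration, feature detection."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    enhanced = cv2.equalizeHist(gray)                 # contrast enhancement
    restored = cv2.GaussianBlur(enhanced, (5, 5), 0)  # noise suppression
    return cv2.goodFeaturesToTrack(restored, maxCorners=200,
                                   qualityLevel=0.01, minDistance=10)

# Each camera's image shrinks to a small set of feature coordinates;
# only this compact data is combined for higher-level tasks.
features_per_camera = [low_level(p) for p in ("cam0.png", "cam1.png")]
```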

Direct computations and learning algorithms

A direct computation in the context of machine vision is a set of mathematical functions manually defined by a human programmer. These accept inputs such as image pixel values and yield outputs such as the coordinates of an object’s edges. In contrast, learning algorithms aren’t directly written by humans but are instead trained via example datasets associating inputs with desired outputs. They therefore function as black boxes. Most such machine learning now employs deep learning based on artificial neural networks to make its calculations.
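
For example, here is a minimal direct computation: a hand-written Sobel gradient that maps pixel values straight to an edge-strength map, with no training involved. The input is assumed to be any 2D NumPy array of intensities.

```python
# Minimal sketch of a "direct computation": every step is an explicit,
# human-defined mathematical function of the pixel values.
import numpy as np
from scipy.ndimage import convolve

def sobel_edge_strength(image):
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    gx = convolve(image, kx)    # horizontal intensity gradient
    gy = convolve(image, ky)    # vertical intensity gradient
    return np.hypot(gx, gy)     # edge strength per pixel
```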

Figure 6: Image sensors from the iVu series can identify workpieces by type, size, location, orientation, and coloring. The machine-vision components accept configuration and monitoring via an integrated screen, remote HMI, or PC. Camera, controller, lens, and light are all pre-integrated. (Image source: Banner Engineering Corp.)

Simple machine vision for industrial applications is often more reliable and less computationally demanding if based on direct computation. Of course, there are limits to what can be achieved with direct computation. For example, it could never hope to execute the advanced pattern recognition required to identify individuals by their faces, especially not from a video feed of a crowded public space. In contrast, machine learning deftly handles such applications. No wonder then that machine learning is increasingly being deployed even for lower-level machine-vision operations including image enhancement, restoration, and feature detection.

Improving teaching approaches (not algorithms)

The maturing of deep-learning technology has made it apparent that it’s not the learning algorithms themselves that need improvement but the way they’re trained. One such improved training routine is called data-centric computer vision. Here, the deep-learning system accepts very large training sets made of thousands, millions, or even billions of images, and then stores the information its algorithms extract from each image. The algorithms effectively learn by practicing worked examples and then referring to an “answer book” to verify whether they arrived at the right values.
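
Stripped to its essentials, this supervised “answer book” loop might look like the following NumPy sketch, in which a toy model’s predictions are repeatedly checked against known labels and corrected. The data and model here are stand-ins for illustration, not a real vision system.

```python
# Minimal sketch of supervised training: the model predicts a label for
# each training image, and its error against the known answer ("answer
# book") drives the weight update.
import numpy as np

rng = np.random.default_rng(0)
images = rng.random((1000, 64))      # stand-in for flattened images
labels = (images.mean(axis=1) > 0.5).astype(float)   # stand-in answers

w = np.zeros(64)
for _ in range(100):                          # training epochs
    pred = 1 / (1 + np.exp(-images @ w))      # model's guesses
    error = pred - labels                     # check the "answer book"
    w -= 0.1 * images.T @ error / len(labels) # correct toward answers
```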

An old story about the early days of digital pattern recognition serves as a cautionary tale. The U.S. military intended to use machine vision for target recognition, and defense-contractor demonstrations reliably identified U.S.-made and Russian-made tanks. Tank after tank was correctly differentiated in the supplier’s aerial photographs. But when tested again with the Pentagon’s own library of pictures, the system kept giving wrong answers. The problem was that the defense contractor’s images all depicted U.S. tanks in deserts and Russian tanks in green fields. Far from recognizing different tanks, the system was instead recognizing different-colored backgrounds. The moral? Learning algorithms need to be presented with carefully curated training data to be useful.

Conclusion: vision for robotic workcell safety

Machine vision is no longer a niche technology, and its deployment is growing fastest in industrial applications. Here, the most dramatic development is how machine vision now complements industrial-plant safety systems that sound alarms or issue audio announcements when plant personnel enter a working zone without a hard hat, mask, or other required protective equipment. Machine vision can also complement systems that announce when mobile machinery such as forklifts gets too close to people.

These and similar machine-vision systems can sometimes replace hard guarding around industrial robots to enable more efficient operations. They can also replace or enhance safety systems based on light curtains that simply stop machinery if a plant worker enters a workcell. When machine vision monitors the factory floor surrounding the workcell, robots in such cells can gradually slow down as people approach.

As the designs of industrial settings evolve to accommodate collaborative robots and other workcell equipment that are safe for plant personnel to move around (even while that equipment operates), these and other systems based on machine vision will become a much more common part of factory processes.

Disclaimer: The opinions, beliefs, and viewpoints expressed by the various authors and/or forum participants on this website do not necessarily reflect the opinions, beliefs, and viewpoints of DigiKey or official policies of DigiKey.

About this author

Jody Muelaner

Dr. Jody Muelaner is an engineer who has designed sawmills and medical devices; addressed uncertainty in aerospace manufacturing systems; and created innovative laser instruments. He has published in numerous peer-reviewed journals and government summaries … and has written technical reports for Rolls-Royce, SAE International, and Airbus. He currently leads a project to develop an e-bike detailed at betterbicycles.org. Muelaner also covers developments related to decarbonization technologies.