A brief survey of the various algorithm-based capture and processing techniques that underpin most digital image processing, and what might be next.

The term ‘computational photography’ is relatively recent but you can expect to hear more of it in the future – along with the related terms ‘AI’ and ‘deep learning’. These terms all refer to processes that use algorithms and computer processing instead of the optical and chemical processes that dominated photography throughout the 20th century.

Some computational technologies have been with us from the time digital cameras were first developed. They include basic in-camera processes like interpolation, demosaicing, JPEG compression and colour management.  Algorithms also play important roles in more recent additions like video compression, the production of digital panoramas and the application of ‘artistic’ effects to JPEG files.

JPEG compression is just one of the computational functions that has been with us since the beginning of digital photography.

But that’s just the start. Developments continue to occur in all these areas as processing platforms are extended, sensor resolutions increase and processor chips gain increasing speed and power.

Key drivers for innovation have been faster sensors and processors that offload data much more rapidly. These have enabled developments in burst capture, autofocusing and the ways in which image data is handled in the camera.

The increase in cameras’ burst speeds is significant, even though it’s been limited to some degree by the resolution of the camera and the size of the sensor. Burst speeds tend to decrease as resolutions rise, a natural consequence of the amount of data that has to be handled.

Nonetheless, we’re seeing 20-megapixel cameras go from burst speeds of fewer than five frames/second (fps) less than a decade ago to 18-20 fps at full image resolution in some of the latest models. Buffer capacities have had to be increased to accommodate such huge data flows and processing has had to become faster.

These factors have stimulated additional developments. With more processing power and more in-camera storage for programs, new applications are appearing continuously, some of them with exciting potential to change the ways in which we take pictures.

Some of these developments have been driven by smartphones, whose arrival on the market has decimated sales of fixed-lens cameras. However, ‘intelligent’ processing is being adopted all the way up the imaging hierarchy. 

Flow-on from Smartphones
Developments in smartphones have largely resulted from the need to overcome the limitations of the small sensors and tiny lenses in smartphone cameras.  Manufacturers are competing fiercely to produce devices with ‘the best’ image quality.  Computational photography is playing a key role in this race.

Adding processor power is the first critical step. This can come in the form of integrating a camera chip with RAM, a trend that is expected to grow in the future. Google’s latest phones use a separate camera processor that can process HDR+ images five times faster than phones with a single CPU.

The latest Pixel phones can also record multiple shots of the same object or scene and filter out the blurry ones before combining the sharp ones to create a detailed, properly lit image. The algorithm has been trained on a data set containing millions of photos.  The camera also uses machine learning (a subset of AI) to predict which areas should stay sharp and which need to be blurred. This processing is all done ‘seamlessly’ so users are unaware of it.
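The ‘filter out the blurry ones, combine the sharp ones’ step can be illustrated with a toy sketch. The code below is not Google’s actual pipeline; it simply ranks frames by a common sharpness proxy (the variance of a Laplacian response) and averages the sharpest, assuming small greyscale frames stored as nested lists.

```python
# Minimal sketch of burst-frame selection: score each frame for
# sharpness, keep the best, average them. Illustrative only.

def laplacian_variance(frame):
    """Sharpness score: variance of a 4-neighbour Laplacian response."""
    h, w = len(frame), len(frame[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (frame[y - 1][x] + frame[y + 1][x] +
                   frame[y][x - 1] + frame[y][x + 1] - 4 * frame[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

def merge_sharpest(frames, keep=2):
    """Keep the `keep` sharpest frames and average them pixel-wise."""
    ranked = sorted(frames, key=laplacian_variance, reverse=True)[:keep]
    h, w = len(frames[0]), len(frames[0][0])
    return [[sum(f[y][x] for f in ranked) / keep for x in range(w)]
            for y in range(h)]
```

A frame full of strong edges scores far higher than a flat, defocused one, which is why this crude metric is enough to reject badly blurred frames in a burst.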

Sony has developed a three-layer CMOS sensor with a dedicated DRAM chip to speed up the processing of high-resolution photos. The latest Asus ZenFone 5Z uses an AI-powered scene mode that will select a scene, based on user preferences and then adjust the settings accordingly. The Samsung Galaxy S9 uses a multi-frame noise-reduction system that captures 12 frames in quick succession and combines them to create better low-light shots.
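Multi-frame noise reduction of the kind the Galaxy S9 uses rests on a simple statistical fact: averaging N aligned frames reduces random noise by roughly the square root of N. A minimal sketch, using simulated noisy captures rather than real sensor data:

```python
import random

def average_frames(frames):
    """Average N aligned frames pixel-wise; random noise falls ~1/sqrt(N)."""
    n = len(frames)
    h, w = len(frames[0]), len(frames[0][0])
    return [[sum(f[y][x] for f in frames) / n for x in range(w)]
            for y in range(h)]

# Simulate 12 noisy captures of a flat grey scene (true value 100).
random.seed(0)
def noisy_frame():
    return [[100 + random.gauss(0, 10) for _ in range(8)] for _ in range(8)]

stack = [noisy_frame() for _ in range(12)]
merged = average_frames(stack)
```

With 12 frames the residual noise in `merged` should be around a third of a single frame’s, which is why stacking works so well for low-light shots (real systems must also align the frames first, which this sketch omits).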

Many smartphone cameras can now record depth information as an additional data channel. This provides information about where the objects in a scene were located in three-dimensional space.

Custom chips, advanced Depth Range Masking software and new algorithms allow users to make precise selections and isolate subjects from their backgrounds.

Multi-frame capture is used to create an extended depth of field when shooting close-ups. This illustration shows how frames are stacked, with the in-focus areas selected out and combined to make the final image. 
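The selection step described above can be sketched in a few lines: for every pixel position, take the value from whichever frame in the stack is sharpest at that spot. The focus measure here (a crude local gradient) is an illustrative stand-in for the more sophisticated measures real cameras use.

```python
def local_contrast(frame, y, x):
    """Crude focus measure: absolute gradient against right/down neighbours."""
    h, w = len(frame), len(frame[0])
    right = frame[y][x + 1] if x + 1 < w else frame[y][x]
    down = frame[y + 1][x] if y + 1 < h else frame[y][x]
    return abs(frame[y][x] - right) + abs(frame[y][x] - down)

def focus_stack(frames):
    """For each pixel, take the value from the frame that is sharpest there."""
    h, w = len(frames[0]), len(frames[0][0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best = max(frames, key=lambda f: local_contrast(f, y, x))
            out[y][x] = best[y][x]
    return out
```

Given one frame focused on the near half of a scene and another focused on the far half, the output takes its detail from each frame where that frame is sharp.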

Sony is using AI technology to enable its smartphones to determine what kind of scene the sensor is detecting in order to adjust the capture settings for the best possible shot. Dual cameras are being used by some smartphone manufacturers to record data that can be manipulated to enhance depth of field, add more optical zoom, capture wide-angle shots and/or permit the recording of true monochrome photos. The ability to process data rapidly is critical to success.

Interestingly, many of these technologies are starting to be used in regular cameras, where they can also support in-camera combination of multiple frames for obtaining a shallow depth of field as well as creating an extended depth of field for shooting macro images. Multi-frame capture and frame combination are also used for producing image files with higher resolution than the camera’s native resolution. Most of these processes have long been available in general-purpose post-capture editing software such as Photoshop, Lightroom and PaintShop Pro.

Face and eye tracking are well-established technologies that make it easier to capture people in complex lighting or positions. Subject tracking, an evolution of face tracking, enables the focus to track moving subjects by adjusting the AF point to a target moving across the frame.

Modern AF tracking systems can lock onto and track fast-moving subjects thanks to algorithms that analyse the scene and identify the positions of key elements in the frame.
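In outline, tracking reduces to re-locating the subject in each new frame and moving the AF point to match. A toy sketch, which assumes some detector already yields candidate subject positions as (x, y) pairs for every frame (the detection itself is the hard part and is not modelled here):

```python
# Toy subject-tracking sketch: chase the detection nearest the last
# known position from frame to frame. Illustrative only.

def track(previous_pos, detections):
    """Pick the detection closest to the last known subject position."""
    px, py = previous_pos
    return min(detections, key=lambda d: (d[0] - px) ** 2 + (d[1] - py) ** 2)

def follow(initial_pos, per_frame_detections):
    """Move the AF point frame by frame, always chasing the nearest match."""
    pos = initial_pos
    path = [pos]
    for detections in per_frame_detections:
        pos = track(pos, detections)
        path.append(pos)
    return path
```

Real systems add motion prediction and subject-recognition cues so the AF point isn’t fooled when two subjects cross paths, but the nearest-match principle is the same.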

This type of technology underpins pre-set recording modes that narrow the range of adjustments to allow rapid frame capture for specific types of subjects. Some of these systems take metadata from the image and use it to actively track subjects, making the work of sports and action photographers easier and ensuring higher success rates when they shoot.

In the latest mirrorless cameras, computational photography has led to significant improvements in autofocusing accuracy and flexibility. Notable recent releases include the Sony α6400, which features AI-powered autofocus, and the Olympus OM-D E-M1X, which offers AF Target modes with ultra-fast focus and tracking based upon pre-programmed parameters.

This sequence of shots was taken using the Airplanes setting in the AF Target modes in the Olympus OM-D E-M1X. It was captured with the M.Zuiko Digital 300mm f/4 PRO lens, which has an angle of view of 4.1 degrees.

Digital image stabilisation (IS) is another instance of the use of computational photography. Also known as electronic image stabilisation (EIS), this technique shifts the video frame image to compensate for camera movements. Because it uses pixels outside the border of the visible frame to provide a buffer for the motion, the frame is cropped to some degree.

This technique does not change the noise levels in the image, except at the extreme borders of the frame where the image has to be extrapolated. It has no effect on existing motion blur, which can result in an apparent loss of focus as the motion is compensated.

Digital IS usually requires the image frame to be cropped to allow for adjustments to ensure the multiple frames are correctly aligned. Data around the periphery of the frame will be sacrificed.
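The crop-and-shift mechanism can be sketched as follows: the camera captures a frame larger than the output, and the output window slides around inside it to cancel measured motion. The motion values would come from a gyroscope or frame matching; here they are simply passed in.

```python
def stabilise(frame, shift_x, shift_y, margin):
    """Cut the output window from an oversized frame, displaced to cancel
    the measured camera shift. `margin` pixels per edge buffer the motion."""
    h, w = len(frame), len(frame[0])
    # Clamp so the window never leaves the captured frame.
    dx = max(-margin, min(margin, -shift_x))
    dy = max(-margin, min(margin, -shift_y))
    top, left = margin + dy, margin + dx
    return [row[left:w - margin + dx] for row in frame[top:h - margin + dy]]
```

The output is always smaller than the capture by twice the margin in each direction, which is exactly the crop the paragraph above describes, and motion larger than the margin simply cannot be compensated.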

Some stills cameras use digital signal processing (DSP) for shake reduction. It works by sub-dividing the exposure into several shorter exposures recorded in rapid succession. The blurred ones are discarded, while the sharpest sub-exposures are re-aligned and added together. A gyroscope in the camera is used to detect the best time to take each frame.
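The exposure arithmetic behind this approach can be sketched briefly. The gain compensation shown (rescaling to make up for the light lost with discarded sub-exposures) is an illustrative detail, not any manufacturer’s documented implementation.

```python
def combine_subexposures(subframes, blurred_flags):
    """Sum the sub-exposures flagged as sharp, then rescale so total
    brightness matches the intended full exposure despite discards."""
    kept = [f for f, blurred in zip(subframes, blurred_flags) if not blurred]
    n_total, n_kept = len(subframes), len(kept)
    h, w = len(subframes[0]), len(subframes[0][0])
    gain = n_total / n_kept  # compensate for light lost with discarded frames
    return [[gain * sum(f[y][x] for f in kept) for x in range(w)]
            for y in range(h)]
```

Four quarter-length sub-exposures with one discarded still reconstruct the brightness of the full exposure, at the cost of slightly more noise in the result.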

Computational processing techniques are also vital for video. The multi-shot capture technologies that underlie Panasonic’s 4K Photo modes (which other manufacturers are beginning to emulate) rely on algorithms, coupled with fast image processing chips to support rapid in-camera image creation and output. These modes record video clips for output as still images.

Pixel binning, pixel skipping and oversampling are all technologies used to process video clips recorded with cameras whose sensor resolution is larger than the output frame resolution. They also depend on computational algorithms.

Pixel binning is the simplest and involves combining values from groups of sensors (usually a 2×2 grid) as they are read but before they are subjected to demosaicing. This reduces the effective frame resolution to one quarter of that of the sensor; a 4K UHD frame, by comparison, contains only about 8.3 megapixels.

Pixel binning combines the raw data from four pixels to produce a single pixel with averaged hue and brightness values.
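The 2×2 combination can be sketched in a few lines. For clarity this operates on a single-channel frame; on a real sensor, binning combines same-colour photosites within the Bayer mosaic before demosaicing.

```python
def bin_2x2(frame):
    """Average each 2x2 block of raw pixel values into one output pixel,
    quartering the resolution. Assumes even frame dimensions."""
    h, w = len(frame), len(frame[0])
    return [[(frame[y][x] + frame[y][x + 1] +
              frame[y + 1][x] + frame[y + 1][x + 1]) / 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]
```

Because every output pixel draws on the light collected by four photosites, the averaged value is less noisy than any single one of them.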

Pixel skipping involves literally skipping over some sensor values and producing the image frame from only those pixels sampled. In this process, data is discarded, although not in quite the same way as it is during JPEG compression.
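The contrast with binning is easy to see in code: skipping reads only every n-th pixel and throws the rest away.

```python
def skip_pixels(frame, step=2):
    """Sample every `step`-th pixel in both directions. Data between the
    samples is simply discarded, but the full field of view is kept."""
    return [row[::step] for row in frame[::step]]
```

The sampled frame spans the same scene as the original (nothing is cropped), which is why skipping preserves the lens’s field of view; the cost is that the discarded pixels contribute nothing, so there is no noise benefit.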

Most cameras use pixel skipping because it enables them to maintain the same lens field of view for stills shots and video without requiring additional processing. Without it, video frames would be cropped, resulting in an effective extension of the focal length of the lens.

The higher the resolution of the sensor the more pixels are skipped, which can increase the chances of moiré appearing. Most camera manufacturers add blurring to suppress moiré, but there’s a fine line between applying enough blurring to be effective and visibly softening the image.

Pixel binning has one clear advantage over pixel skipping: in low light, the sensor’s sensitivity can be increased without reducing image quality. Effectively, combining raw data from four pixels into one means more light is captured without increasing image noise.

The resulting images will have lower noise, a slightly better dynamic range and smoother gradation between colours, although the differences are relatively small.  But both pixel binning and pixel skipping will have to deal with similar moiré and anti-aliasing issues.

Oversampling involves reading all the image data. This is normally more pixels than the final output format supports, so processing must also involve downsampling, which is done after demosaicing. Oversampling is seldom applied inside cameras because its computational complexity requires very high processing speeds and capacities. The processing can also generate a lot of heat, which is difficult to dissipate from the camera body.
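The downsampling step can be sketched with a simple box filter. Unlike binning, this runs on the full demosaiced frame, so every captured pixel contributes to the output; real implementations use better resampling filters than the plain block average shown here.

```python
def downsample(frame, factor):
    """Box-filter downsample after demosaicing: every output pixel is the
    mean of a factor x factor block, so all captured data contributes."""
    h, w = len(frame), len(frame[0])
    return [[sum(frame[y + j][x + i]
                 for j in range(factor) for i in range(factor)) / factor ** 2
             for x in range(0, w - factor + 1, factor)]
            for y in range(0, h - factor + 1, factor)]
```

Averaging over whole blocks is what gives oversampled video its low noise and resistance to aliasing, and also what makes it so much more demanding than skipping, which touches only a fraction of the pixels.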

Where next?
Physically, we are close to the technological limits of both lenses and mechanical components like shutter mechanisms. Glass isn’t getting any clearer, and our vision isn’t becoming any more acute.  Essentially, the future of photography must be computational.

Future development will likely proceed on a cost/benefit basis, guided by the size/price/performance triangle. Manufacturers must keep prices affordable while producing equipment that delivers high performance and remains portable enough for photographers to use.

This might mean the new cameras and lenses will have the same specifications as last year’s models when it comes to megapixel counts, ISO ranges, f-numbers, etc. That’s fine; the market is already mature so it is to be expected (and the advantage for photographers is that you won’t need to upgrade your gear as frequently).

That creates more problems for manufacturers than it does for camera users, since many of the former have tended to produce a stream of products in recent years, with each new generation differing only marginally from its predecessor. Features may tend to flow down from high-end models, where they first appeared, until they become ubiquitous. But that doesn’t guarantee better cameras for end users.

With the focus swinging to computational photography, what cameras and lenses can DO with light will become the main focus of both buyers’ and manufacturers’ attention. This will place an increased importance on the ability to upgrade existing equipment through firmware updates.

Building new firmware to increase the capabilities of existing equipment may not counteract the on-going slide in camera sales as shown in the graphs below.  But it could help to keep loyal customers in a manufacturer’s ‘stable’ and is likely to increase sales of lenses and accessories for that particular brand.

This graph shows the dramatic fall in shipments of fixed-lens cameras since the introduction of smartphones with capable built-in cameras. (Source: CIPA.)

This graph depicts the movement of interchangeable-lens camera buyers from DSLRs to mirrorless models. (Source: CIPA.)

With slowing shipments of cameras, you can expect to see more firmware updates being released as manufacturers develop new applications and algorithms that can enhance the enjoyment you get from using your camera equipment. And keep an eye out for new lenses and accessory items, both areas where there is still potential for future development – particularly with respect to mirrorless cameras.