Our high-magnification imaging method has been described previously [3]. For autofluorescence imaging, the patients’ pupils were dilated using 1% Tropicamide and imaged using the 488nm Argon laser of the Zeiss prototype cSLO with a scan angle of 40°. Images were recorded on an S-VHS video system and digitised in groups of 32 frames using a real-time frame grabber (Pulsar, Matrox Electronic Imaging Inc., Montreal, Canada) with an intensity resolution of 10 bits/pixel. Reflectance images of the region of interest were recorded first; a 521nm long-pass filter was then placed in the path of the photodetector to block directly reflected light and pass only the light emitted by the autofluorescent substances in the retina. Images obtained with this filter in place are referred to here as autofluorescence (AF) images. All image processing software was written in-house using the Matrox Imaging Library (MIL) supplied with the digitising board and runs under Windows NT 4 on a desktop Pentium PC at 133MHz. No additional image processing hardware was used.

3. A. R. Wade and F. W. Fitzke, “In-vivo imaging of the human cone photoreceptor mosaic using a confocal LSO,” Lasers and Light in Ophthalmology (1998), in press.
The main stages in the image acquisition and processing procedure are summarised in Fig. 1.
2.1 Image alignment and averaging
A wide variety of techniques for image alignment exist [4]. Many of them involve finding peaks in the cross-correlation between representations of two images, either in the spatial domain or in some transform domain that isolates components such as rotation and scaling. To date, most of these techniques have proved computationally expensive, although rapid image alignment based upon them will soon fall within the reach of high-end workstations or desktop PCs using dedicated signal-processing hardware. In addition, registering very noisy images is difficult because reference or ‘landmark’ features cannot be located accurately from one image to the next, and algorithms designed for use with low-noise images often fail in these circumstances.

4. L. G. Brown, “A survey of image registration techniques,” ACM Computing Surveys 24, 325–376 (1992).
2.1.1 Pattern matching algorithm
The pattern-matching algorithm we use is part of the Matrox Imaging Library (Matrox Electronic Imaging, Quebec, Canada). It is a variation of the well-known normalised cross-correlation peak-detection algorithm which includes a hierarchical search strategy that dramatically increases its speed of operation.
At its core, a spatial cross-correlation pattern detector searching for a model M in an image I must compute

$$ r(p,q) = \sum_{x=0}^{X-1} \sum_{y=0}^{Y-1} I(p+x,\, q+y)\, M(x,\, y) $$

where X and Y are the height and width of the model and (p,q) is a position in the image I. This function is evaluated at all locations in I (width P, height Q) and the location which yields the maximum value of r is deemed to be the location of the model in the image. In order to eliminate the effect of linear intensity changes (offset and gain) in the image and model, the MIL algorithm evaluates a slightly more complicated normalised correlation function

$$ r = \frac{N \sum IM - \sum I \sum M}{\sqrt{\left(N \sum I^2 - \left(\sum I\right)^2\right)\left(N \sum M^2 - \left(\sum M\right)^2\right)}} $$

with all summations taken over X and Y in the model and P and Q in the image, and N = XY the number of pixels in the model. In addition, to increase calculation speed the value of r² rather than r is calculated.
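As an illustration, the normalised correlation above can be evaluated directly. The brute-force Python sketch below is not the proprietary MIL implementation (which is vastly faster); it simply computes r at every valid offset:

```python
import numpy as np

def normalised_correlation(image, model):
    """Evaluate the normalised correlation r at every valid offset (p, q).

    Illustrative only: a direct, unoptimised transcription of the
    normalised correlation formula, not the MIL library's code.
    """
    X, Y = model.shape                      # model height and width
    P = image.shape[0] - X + 1              # valid row offsets
    Q = image.shape[1] - Y + 1              # valid column offsets
    N = X * Y                               # number of pixels in the model
    sum_m = model.sum()
    sum_m2 = (model ** 2).sum()
    r = np.zeros((P, Q))
    for p in range(P):
        for q in range(Q):
            window = image[p:p + X, q:q + Y]
            sum_i = window.sum()
            sum_i2 = (window ** 2).sum()
            sum_im = (window * model).sum()
            num = N * sum_im - sum_i * sum_m
            den = (N * sum_i2 - sum_i ** 2) * (N * sum_m2 - sum_m ** 2)
            r[p, q] = 0.0 if den <= 0 else num / np.sqrt(den)
    return r
```

Because the offset and gain terms cancel in the numerator and denominator, the peak location is unaffected by linear intensity changes between image and model.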
For video-sized images, say 768x576 pixels, and a 200x200 pixel model, the calculation of r at every location presents a formidable computational task (of the order of 20 billion operations per image). The MIL algorithm derives its speed from its iterative, hierarchical search strategy. Low-resolution versions of the image and model are generated and the cross-correlation is calculated to identify regions likely to contain matches. The resolution is then doubled and the search is performed again, concentrating only on those regions identified in the previous stage. Because halving the horizontal and vertical resolution reduces the search time by a factor of 16, the early stages of the search, which are performed at a resolution of perhaps 1/16 that of the original image, are extremely fast, and the entire matching operation takes less than 100ms on our 133MHz Pentium PC. The danger in this approach is that true matches for small or finely structured models may be overlooked in the low-resolution stage of the search. However, the algorithm is adaptive in this respect and adjusts its starting resolution to take account of the size of the model and the characteristics of the image in which it lies. In practice, with both AF and high-magnification images we have found that the spatial structure of retinal images is such that large, unique features such as major blood vessels, the optic disk and the fovea provide excellent models for the pattern-matching algorithm to work with, and searches almost invariably begin at the lowest resolution.
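The coarse-to-fine idea can be sketched as follows. This is an illustrative reconstruction, not MIL's algorithm: the pyramid is built by 2x2 block averaging, an exhaustive search runs only at the coarsest level, and each finer level searches a small window around the scaled-up estimate (zero-mean correlation is used here for brevity):

```python
import numpy as np

def _halve(a):
    """Halve resolution by 2x2 block averaging (odd edges trimmed)."""
    a = a[: a.shape[0] // 2 * 2, : a.shape[1] // 2 * 2]
    return (a[0::2, 0::2] + a[1::2, 0::2] + a[0::2, 1::2] + a[1::2, 1::2]) / 4.0

def _best_offset(image, model, rows, cols):
    """Return the offset in rows x cols maximising zero-mean correlation."""
    m = model - model.mean()
    X, Y = model.shape
    best, best_pq = -np.inf, None
    for p in rows:
        for q in cols:
            w = image[p:p + X, q:q + Y]
            w = w - w.mean()
            den = np.sqrt((w * w).sum() * (m * m).sum())
            r = 0.0 if den == 0 else (w * m).sum() / den
            if r > best:
                best, best_pq = r, (p, q)
    return best_pq

def hierarchical_match(image, model, levels=2):
    """Exhaustive search at the coarsest level only, then refinement."""
    pyramid = [(image, model)]
    for _ in range(levels):
        i, m = pyramid[-1]
        pyramid.append((_halve(i), _halve(m)))
    i, m = pyramid[-1]
    p, q = _best_offset(i, m,
                        range(i.shape[0] - m.shape[0] + 1),
                        range(i.shape[1] - m.shape[1] + 1))
    for i, m in reversed(pyramid[:-1]):
        p, q = 2 * p, 2 * q          # scale the estimate up one level
        rows = range(max(0, p - 2), min(i.shape[0] - m.shape[0], p + 2) + 1)
        cols = range(max(0, q - 2), min(i.shape[1] - m.shape[1], q + 2) + 1)
        p, q = _best_offset(i, m, rows, cols)
    return p, q
```

Each halving divides the number of candidate offsets by 4 and the per-offset cost by 4, which is the factor-of-16 saving noted above.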
Two further refinements of the MIL algorithm are notable. Firstly, it contains an option to pre-analyse the likely shape of cross-correlation peaks in all images of a given type. This calculation takes approximately 200ms and needs to be performed only at the start of the alignment procedure when the model is allocated. If the likely shape of the correlation peaks is found to be several pixels wide (as is often the case in natural images), the search algorithm may use this information to restrict its search space once it finds a region lying on the slope of a peak. Secondly, once a peak is found at single-pixel resolution, the algorithm can fit a curve to the peak shape and estimate the true position of the peak, which will generally lie at a non-integer pixel position. The accuracy of this sub-pixel pattern location depends on the amount of noise in the image, and the feature can be disabled when dealing with noisy images.
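The sub-pixel refinement can be illustrated with a one-dimensional parabola fitted through the correlation peak and its two neighbours. The exact curve MIL fits is not documented here, so this is an assumption, but the parabolic fit is the usual choice:

```python
def subpixel_peak(rm1, r0, rp1):
    """Fit a parabola through three correlation samples taken at
    offsets -1, 0 and +1 around the integer peak, and return the
    fractional offset of the parabola's maximum from the centre.

    Illustrative sketch; not the (undocumented) MIL curve fit.
    """
    den = rm1 - 2.0 * r0 + rp1       # second difference (curvature)
    if den >= 0:                     # not a local maximum: no refinement
        return 0.0
    return 0.5 * (rm1 - rp1) / den
```

Applying this independently along the rows and columns of the correlation surface gives a two-dimensional sub-pixel estimate; with heavy noise the three samples become unreliable, which is why the feature can be disabled.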
2.1.2 Image pre-processing
A typical image from a high-magnification imaging session is shown in Fig. 2.
Fig. 2. Raw SLO video frame
Raw images such as Fig. 2 contain extremely high levels of noise: a combination of normally-distributed electronic and thermal noise from the SLO photodiode and Poisson-distributed photon noise. When presented with images of this type, the MIL algorithm fails, either finding false matches to the model or failing to find a match at all. One solution to this problem is to low-pass filter the images, attenuating the high spatial frequencies where most of the noise power is concentrated.
Our image alignment program begins each search stage by making a low-pass filtered copy of the frame to be aligned, convolving it in the spatial domain with a broad (12x12 pixel) uniform averaging kernel. Smaller convolution kernels can be used for images with a strong signal or prominent landmark features, but we have found that the 12x12 kernel guarantees a successful pattern-matching operation in nearly all the image sequences analysed so far. Performing the convolution in the Fourier domain would give a small speed increase and we intend to implement this in future versions of the program. The low-pass filtered copies are passed to the MIL algorithm whilst the actual alignment is carried out on the original frames. In this way, high-frequency information present in the original raw frames is retained in the final, averaged image whilst the noise is attenuated for the purposes of pattern matching. Most of the execution time of the program is spent performing the blurring convolutions. Registration of standard reflectance SLO images is much faster, as these contain far less noise and require no low-pass filtering.
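A uniform averaging kernel of this kind can be applied cheaply with a summed-area table, which makes the box filter O(1) per pixel regardless of kernel size. The sketch below illustrates the filtering step only; our program uses the MIL convolution routines, not this code, and the half-pixel centring of the even-sized kernel is glossed over here:

```python
import numpy as np

def box_filter(frame, k=12):
    """Convolve a frame with a k x k uniform averaging kernel,
    keeping the output the same size (edges padded with edge values).

    Illustrative stand-in for the 12x12 blur used before matching.
    """
    pad = k // 2
    padded = np.pad(frame, pad, mode="edge")
    # Summed-area table: sat[i, j] = sum of padded[:i, :j].
    sat = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    sat[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    H, W = frame.shape
    # Sum of each k x k window via four table look-ups, then normalise.
    out = (sat[k:k + H, k:k + W] - sat[:H, k:k + W]
           - sat[k:k + H, :W] + sat[:H, :W]) / (k * k)
    return out
```

Averaging 144 pixels reduces uncorrelated noise by roughly a factor of 12 while leaving the large landmark features used for matching largely intact.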
2.1.3 Model allocation
The first stage in the alignment process is the allocation of a model. A model can either be defined by the operator moving a selection box over a suitable region of the reference image, or the program can allocate one automatically by using a MIL library routine to search the reference image for what it considers to be a distinctive pattern. Models are defined in the low-pass filtered versions of the original frames. The optimum size of the model region depends on the size of the potential ‘landmark’ features, but it also influences the speed of the search. Small models set a limit on the minimum resolution at which the search algorithm can start, and for this reason large models tend to be found faster than small ones.
We have found that a model size of 200x200 pixels works well with our 40° reflectance fundus images (where, for example, the pigmented macular region is around 150 pixels across), and this is the size of model we use for autofluorescence images. With high-magnification images, the choice of landmark features is usually more limited and the model size can be selected by the operator. Occasionally the operator will choose a model which cannot be detected in subsequent images, usually because the subject’s eye moves so far in a drift or saccade that the field of view no longer contains the original model region. For this reason, it is essential that the operator monitors the alignment process until it has finished.
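MIL's criterion for a "distinctive pattern" is not documented here, so as a loose stand-in the sketch below scores candidate windows by intensity variance and returns the most textured one, which tends to favour regions containing vessels or the disk margin:

```python
import numpy as np

def pick_model_region(reference, size, step=None):
    """Scan the reference image on a coarse grid and return the
    top-left corner of the size x size window with the highest
    intensity variance.

    A guess at automatic model allocation; MIL's actual criterion
    is undocumented here and certainly more sophisticated.
    """
    step = step or max(1, size // 4)     # coarse grid keeps the scan cheap
    H, W = reference.shape
    best, best_pq = -1.0, (0, 0)
    for p in range(0, H - size + 1, step):
        for q in range(0, W - size + 1, step):
            v = reference[p:p + size, q:q + size].var()
            if v > best:
                best, best_pq = v, (p, q)
    return best_pq
```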
2.1.4 Localised search
To gain a further increase in speed, the model search is split into two stages. First, a small region of the video frame centred on the last-known position of the model is extracted and a search is made for the model within this area. If no match is found, the entire frame is searched. This strategy eliminates the need to search the whole frame except in the rare cases where a large translational movement occurs between frames. A significant improvement in speed results: not only is the search algorithm faster within the small regions, but the time taken to low-pass filter these regions is much reduced. Once the model is found in the target image, its position is compared with the original position of the model in the reference image and the program calculates the translation required. The translated, unfiltered image is added to an accumulator buffer and the whole process is repeated until all of the images have been registered. The data in the accumulator buffer are then retrieved and scaled to produce the averaged image.
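The two-stage loop can be sketched as follows. The pattern matcher is abstracted here as a `search_fn` callback standing in for the MIL routine, and frames are shifted with `np.roll` for brevity; a real implementation would pad rather than wrap at the edges:

```python
import numpy as np

def align_and_average(frames, model, model_pos, search_fn, margin=16):
    """Two-stage registration loop of the kind described in the text.

    `search_fn(image, model)` must return the model's (row, col)
    position in `image`, or None if no match is found; it stands in
    for the MIL pattern matcher.  Illustrative sketch only.
    """
    acc = np.zeros_like(frames[0], dtype=np.float64)   # accumulator buffer
    n = 0
    last = model_pos
    X, Y = model.shape
    for frame in frames:
        # Stage 1: small window centred on the last-known model position.
        r0 = max(0, last[0] - margin)
        c0 = max(0, last[1] - margin)
        sub = frame[r0:r0 + X + 2 * margin, c0:c0 + Y + 2 * margin]
        pos = search_fn(sub, model)
        if pos is not None:
            pos = (pos[0] + r0, pos[1] + c0)
        else:
            pos = search_fn(frame, model)   # Stage 2: whole frame
        if pos is None:
            continue                        # model lost in this frame
        dr, dc = model_pos[0] - pos[0], model_pos[1] - pos[1]
        acc += np.roll(frame, (dr, dc), axis=(0, 1))    # shift the original
        n += 1
        last = pos
    return acc / max(n, 1)                  # scale to the averaged image
```

Note that the blurred copies described above would be handed to `search_fn`, while the unfiltered originals are shifted and accumulated, exactly as in the text.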
Our method assumes that only translational changes occur between the reference and target images. Cross-correlation peak detection in the spatial domain can fail when a rotated version of the original image is presented. The MIL routine tolerates between 10 and 12 degrees of rotation when the model has no obvious rotational symmetry, and this has proved sufficient for the pattern-matching procedure to work with all subjects encountered so far. Even greater tolerance to rotation can be achieved if a model with some rotational symmetry (for example, the fovea in an AF image) is used. The MIL library contains additional routines to find patterns with an unknown amount of rotation as well as translation. However, we have not found that rotational compensation significantly improves the quality of the averaged images, and the images shown here were aligned using translational displacements only.