1. Introduction
Tip-enhanced fluorescence microscopy (TEFM) is a type of apertureless near-field scanning optical microscopy (ANSOM) that uses fluorescence to generate an image. When the sharp tip of an atomic force microscope (AFM) probe is aligned into the focus of a laser beam with axial polarization, enhanced fields are generated at the apex of the tip [1: L. Novotny, R. X. Bian, and X. S. Xie, “Theory of Nanometric Optical Tweezers,” Phys. Rev. Lett. 79, 645–648 (1997)], as shown in Fig. 1. This field enhancement is tightly confined to the vicinity of the tip apex and has been shown to decay rapidly as $r^{-6}$ with distance r from the tip apex [2: Z. Ma, J. M. Gerton, L. A. Wade, and S. R. Quake, “Fluorescence Near-Field Microscopy of DNA at Sub-10 nm Resolution,” Phys. Rev. Lett. 97, 260801 (2006)]. These enhanced local fields can be used to beat Abbe’s diffraction limit, and various scattering processes (e.g. one- and two-photon fluorescence, Raman scattering, infrared spectroscopy, and Rayleigh scattering) have been used to image a range of samples with nanoscale resolution [2–12]. Much of the work with ANSOM to date has been on samples composed of isolated particles or molecules (e.g. fluorophores, quantum dots, nanotubes) because ANSOM suffers from a relatively large background signal that arises from direct (non-enhanced) scattering of the laser beam. High-density samples are therefore challenging for ANSOM analysis, since the background signal increases with the number of particles in the laser spot while the tip-enhanced signal does not. This has so far prohibited the application of ANSOM to biological samples composed of a high-density, heterogeneous ensemble of fluorescently tagged biomolecules, including proteins, lipids, and nucleic acids.
Recently, a number of groups have investigated various means of increasing the degree of field enhancement, including optimizing the shape of the tip to leverage plasmon and antenna resonances. These efforts have already increased the achievable enhancement and will impact both ANSOM and sensor applications [13: L. Novotny and B. Hecht, Principles of Nano-Optics (Cambridge, 2006)]. To complement these studies, it is also important to understand how much enhancement is required to image high-density samples with sufficient contrast to resolve individual molecules within the ensemble. It has been pointed out that for dense samples, the minimum (intensity) enhancement needed to achieve sufficient image contrast ultimately depends on the nth root of the ratio of the area of the illuminated spot to the area under the tip, where n is the order of the scattering process being employed [13]. Naturally, linear scattering processes such as one-photon fluorescence require larger enhancement factors than higher-order processes such as two-photon fluorescence or Raman spectroscopy. In this paper we specifically investigate the limits of TEFM with regard to its potential for imaging high-density samples. In particular, we use a theoretical model based on experimental measurements to show that sufficient contrast can be obtained even for the relatively simple case of commercially available silicon tips and one-photon fluorescence.
Fig. 1. (Color Online) Experimental setup for TEFM. Labeled elements are as follows: He-Ne Laser: helium-neon laser (λ = 543 nm); Mask: laser-beam mask; RPC: radial polarization converter; DM: dichroic mirror; OBJ: microscope objective; Probe: AFM probe; PZT: piezoelectric transducer; SF: spectral filters; APD: avalanche photodiode; LA: lock-in amplifier; DDS: digital synthesizer; PC: personal computer. The arrows indicate the polarization of the laser beam. Axial polarization at the sample plane can be achieved either by simply focusing a radially polarized laser beam, or by placing a laser-beam mask before the microscope objective such that only super-critical rays are allowed to propagate. This focused total internal reflection fluorescence (TIRF) set-up is sometimes used because of its broadband capabilities and because its large focal spot (~1.5 µm × 0.5 µm) lends itself to easy tip alignment, while radial polarization is preferred for smaller focal spots, ~(250 nm)².
2. Contrast in TEFM
In TEFM, the laser stimulates two distinct fluorescence signals: the far-field signal, $S_{ff}$, resulting from direct illumination of fluorophores within the laser focus, and the near-field signal, $S_{nf}$, resulting from field enhancement at the tip apex. The resolution of $S_{ff}$ is at best diffraction limited, while $S_{nf}$ has a resolution given primarily by the sharpness of the tip [4: J. M. Gerton, L. A. Wade, G. A. Lessard, Z. Ma, and S. R. Quake, “Tip-enhanced fluorescence microscopy at 10 nanometer resolution,” Phys. Rev. Lett. 93, 180801 (2004)].
Figure 2 shows a cartoon image composed of the superposition of $S_{ff}$ and $S_{nf}$ as well as a simulated profile through its center. While not shown, we also assume some noise in the far-field signal. Within this context, contrast (C) and signal-to-noise ratio (SNR) are defined as:
$$C = \frac{S_{nf}}{S_{ff}} = \frac{S_{peak} - S_{ff}}{S_{ff}}, \tag{1}$$
$$\mathrm{SNR} = \frac{S_{nf}}{\sigma_{ff}} = \frac{S_{peak} - S_{ff}}{\sigma_{ff}}, \tag{2}$$
where $\sigma_{ff}$ is the standard deviation (noise) in the far-field background. The near-field signal originates from a small area on the sample surface ($a_{tip}$) given by the near-field interaction zone, which is determined mostly by the tip sharpness, while the far-field background originates from a much larger area (A) given by the size of the laser focus. The total fluorescence signal for a given pixel of the raster-scanned image, $S_{peak}$, is simply the sum of all photons collected during the pixel acquisition time (τ). The far-field signal $S_{ff}$ is proportional to the number of fluorophores in the focal area of the excitation beam, $N_{FA}$, and also to a dimensionless parameter k that characterizes the total efficiency of the system: $S_{ff} = k N_{FA}$.
Fig. 2. Cartoon of a fluorescent particle imaged by TEFM and the corresponding signal profile.
The number of photons detected from the illuminated fluorophores follows a Poisson distribution, such that the expected average number of counts in the time interval τ is simply $S_{ff}$. The standard deviation is given by $\sigma_{ff} = \sqrt{S_{ff}} = \sqrt{k N_{FA}}$. In the limit of a single fluorophore in the near-field zone, $S_{nf} = f k$, where f characterizes the fluorescence signal enhancement induced by the tip, and is a function of several parameters related primarily to its geometry and material properties. In this limit, the peak signal is given by $S_{peak} = (f + N_{FA}) k$. The overall system efficiency k is given by
$$k = \frac{I_0\, \sigma_0\, \tau\, Q\, CE}{hc/\lambda}, \tag{3}$$
where $I_0 = P_0/A$ is the intensity of the laser beam with power $P_0$ in a focal spot of area A; $\sigma_0$ is the absorption cross-section of the fluorophore; τ is the pixel acquisition time; Q is the quantum yield of the fluorophore; CE is the collection efficiency of the detection system; and hc/λ is the energy of a photon with wavelength λ. A green He-Ne laser (λ = 543 nm) was used for these experiments due to its low cost and the availability of fluorescent dyes and quantum dots with strong absorption at this wavelength. Although we have not done careful studies of tip enhancement as a function of excitation wavelength, we do not expect a strong dependence since the dielectric function of silicon is fairly flat over visible wavelengths.
The lower limit for detection of a near-field signal arises from the requirement that the signal-to-noise ratio (SNR) be larger than unity,
$$\mathrm{SNR} = \frac{f k}{\sqrt{k N_{FA}}} = f \sqrt{\frac{k}{N_{FA}}} > 1. \tag{4}$$
Below this limit, the near-field signal is indistinguishable from stochastic fluctuations of the far-field background. Producing an image that can be interpreted visually, on the other hand, imposes a more stringent requirement, namely that the contrast (C) be larger than unity,
$$C = \frac{f}{N_{FA}} > 1. \tag{5}$$
In this model, it is straightforward to evaluate the minimum enhancement required for sensitivity to a single fluorophore within a dense ensemble. A practical limit on density arises from the requirement that the average spacing between fluorophores be no smaller than the microscope resolution, which is given by the near-field interaction zone. In this limit, $N_{FA} = A/a_{tip}$, where A is the area of the laser focus. Using the focused-TIRF scheme described above, A = 0.75 µm² and $a_{tip}$ = 100 nm², which suggests that a signal enhancement of f > 7500 is needed to achieve contrast greater than unity. Employing a radially polarized laser beam yields a smaller focal spot, A = (250 nm)² [14: R. Dorn, S. Quabis, and G. Leuchs, “Sharper Focus for a Radially Polarized Light Beam,” Phys. Rev. Lett. 91, 233901 (2003)], thus reducing the required enhancement to f ~ 600. Silicon tips are only capable of producing an enhancement factor of f ~ 20 [4], well below these requirements. Simple, non-optimized metal tips have been predicted to yield enhancement factors of f ~ 3000 [1], and optimized metal tips that leverage antenna resonances may yield even larger enhancement factors. Although metal tips can produce much larger field enhancements than silicon, they also strongly quench fluorescence, leading to an overall reduction in the fluorescence signal and an associated decrease in the contrast. In several previous reports, silicon tips were found to yield the largest net contrast since no quenching was observed [2, 4; 5: C. Xie, C. Mu, J. R. Cox, and J. M. Gerton, “Tip-enhanced fluorescence microscopy of high-density samples,” Appl. Phys. Lett. 89, 143117 (2006)].
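The thresholds quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the contrast and signal-to-noise conditions take the forms f > N_FA and f > sqrt(N_FA/k) implied by the definitions in this section; the function names are illustrative, and k = 10 is the typical value quoted later in the text.

```python
import math

NM2_PER_UM2 = 1.0e6  # 1 µm² expressed in nm²

def fluorophores_in_focus(focus_area_nm2, tip_area_nm2):
    """N_FA = A / a_tip: fluorophores in the focus at the density limit."""
    return focus_area_nm2 / tip_area_nm2

def f_min_contrast(n_fa):
    """Contrast C = f / N_FA exceeds unity only for f > N_FA."""
    return n_fa

def f_min_snr(n_fa, k):
    """SNR = f * sqrt(k / N_FA) exceeds unity only for f > sqrt(N_FA / k)."""
    return math.sqrt(n_fa / k)

a_tip = 100.0                      # nm², near-field interaction zone
area_tirf = 0.75 * NM2_PER_UM2     # nm², focused-TIRF focal spot
area_radial = 250.0 ** 2           # nm², radially polarized focal spot
k_typical = 10.0                   # assumed typical system efficiency

n_tirf = fluorophores_in_focus(area_tirf, a_tip)
n_radial = fluorophores_in_focus(area_radial, a_tip)

print(f_min_contrast(n_tirf), f_min_contrast(n_radial))  # 7500.0 625.0
print(f_min_snr(n_tirf, k_typical), f_min_snr(n_radial, k_typical))
```

The contrast condition is by far the more demanding of the two: the detection (SNR) threshold is only a few tens for either illumination scheme, which is why a dense sample can be detectable without being interpretable as an image.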
At first glance, the required signal enhancements predicted above cast doubt on the potential application of TEFM to biological systems. As discussed below, however, the contrast can be improved dramatically by oscillating the AFM probe, which induces an associated modulation in the fluorescence signal, and by subsequently applying a phase-sensitive demodulation algorithm, such as lock-in amplification. Modulation/demodulation schemes are used widely in many areas of small-signal processing and have also been used before in near-field microscopy [2, 4, 5, 15–18; 15: B. Knoll and F. Keilmann, “Enhanced dielectric contrast in scattering-type scanning near-field optical microscopy,” Opt. Commun. 182, 321–328 (2000)]. The analysis below demonstrates the limits of this approach for TEFM.
3. Improving contrast via phase sensitive demodulation
To calculate contrast and signal-to-noise ratio for the case of an oscillating tip, Eqs. (4) and (5) must be modified to account for the fact that the tip only intermittently contacts the sample at a particular phase of its oscillation cycle. To discuss the dependence of the near-field signal on the instantaneous height of the oscillating probe, it is useful to consider the arrival of each photon in a phase-space picture. In this scenario, each photon is assigned an angle $\theta_i$ corresponding to the instantaneous phase of the sinusoidal tip-oscillation function at the time of detection (Fig. 3). The photon phases can be mapped to the corresponding tip-sample separation if desired.
Since the sample remains under direct laser illumination whether the tip is oscillating or not, the far-field signal for an oscillating tip is unchanged,
$$S^{osc}_{ff} = S_{ff} = k N_{FA}. \tag{6}$$
Multiple scattering of far-field photons between the tip and sample can lead to variations in the background intensity as a function of the tip height. However, these variations have been measured to be very small (<5%) for the tip-oscillation amplitudes employed here, and are thus neglected. Therefore, we assume that the far-field signal for an oscillating tip is unchanged compared to an absent tip or one that is in constant contact with the surface.
In phase space, the maximum near-field signal occurs at a preferred phase $\theta_p$ corresponding to tip-sample contact, and the photons are approximately Gaussian distributed around $\theta_p$. To find the total number of near-field photons for a given pixel, $S^{osc}_{nf}$, we calculate the ratio γ, defined as the number of photons collected in one oscillation cycle relative to the number that would have been collected had the tip been at the surface the entire time:
$$\gamma = \frac{1}{2\pi}\int_{\theta_p-\pi}^{\theta_p+\pi} \exp\left[-\frac{(\theta-\theta_p)^2}{2\theta_\sigma^2}\right] d\theta \approx \frac{\theta_\sigma}{\sqrt{2\pi}}, \tag{7}$$
where $\theta_\sigma$ is the standard deviation of the photon-phase distribution, which can be obtained experimentally and is a function of oscillation amplitude. The approximation in Eq. (7) holds in the limit that the integration limits are extended to ±∞, or equivalently when $\theta_\sigma < \pi/3$. The near-field signal for an oscillating tip is then given by
$$S^{osc}_{nf} = \gamma S_{nf} = \gamma f k. \tag{8}$$
Fig. 3. (Color Online) Phase-space plot showing how photon arrivals (vertical lines) are correlated to tip-oscillation phase. Squiggly arrows represent photons emitted from fluorophores within the laser focus. Higher photon count rates occur at a preferred phase $\theta_p$ corresponding to tip-sample contact, resulting in the strongest near-field signal.
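The duty-cycle ratio γ described above can be evaluated in closed form, which makes it easy to see where the small-width approximation holds. The sketch below assumes a Gaussian photon-phase distribution with unit peak height centered on the contact phase and integrated over one full cycle; under that assumption the integral reduces to an error function. The function names are illustrative.

```python
import math

def gamma_exact(theta_sigma):
    """(1/2pi) * integral over one cycle of exp(-theta^2 / (2 theta_sigma^2))."""
    prefactor = theta_sigma / math.sqrt(2.0 * math.pi)
    return prefactor * math.erf(math.pi / (math.sqrt(2.0) * theta_sigma))

def gamma_approx(theta_sigma):
    """Same integral with the limits extended to +/- infinity."""
    return theta_sigma / math.sqrt(2.0 * math.pi)

for theta_sigma in (0.5, 1.0, math.pi / 3.0, 2.0):
    exact = gamma_exact(theta_sigma)
    approx = gamma_approx(theta_sigma)
    rel_err = (approx - exact) / exact
    print(f"theta_sigma={theta_sigma:.3f}  exact={exact:.4f}  "
          f"approx={approx:.4f}  rel_err={rel_err:.2%}")
```

At a width of π/3 the cycle spans ±3 standard deviations, so the two expressions differ by well under one percent; for widths approaching 2 rad the truncated tails are no longer negligible and the approximation overestimates γ noticeably.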
Using the definitions for the oscillating signals in Eqs. (6) and (8), both the contrast and SNR for images produced by an oscillating tip (tapping-mode TEFM) can now be calculated:
$$C_{sum} = \frac{S^{osc}_{nf}}{S^{osc}_{ff}} = \frac{\gamma f}{N_{FA}}, \tag{9}$$
$$\mathrm{SNR}_{sum} = \frac{S^{osc}_{nf}}{\sigma_{ff}} = \gamma f \sqrt{\frac{k}{N_{FA}}}, \tag{10}$$
where the subscript “sum” indicates a direct sum of the photon signals. Not surprisingly, without demodulation the contrast and SNR are reduced by a factor of γ compared to the non-oscillating scenario, since the total number of near-field photons has decreased.
Lock-in amplification is a particularly powerful phase-sensitive demodulation technique that decomposes a modulated signal into real and imaginary components that are proportional to the cosine and sine projections in phase space, respectively. In TEFM, each detected fluorescence photon can be viewed as a unit vector pointing in the direction $\theta_i$ equal to the instantaneous phase of the tip oscillation at the time of detection (Fig. 4). In this picture, a lock-in amplifier simply performs a vector addition of the detected photons transmitted through its internal bandpass filter. If the resultant lock-in vector L is divided into near-field (NF) and far-field (FF) components, both of which are vector sums, then the lock-in signal is simply the magnitude |L| = |NF + FF|.
Fig. 4. (Color Online) Expected phase dependency of lock-in signal. Each detected photon is considered as a unit vector with a direction corresponding to the instantaneous oscillation phase of the tip. A lock-in amplifier performs the vector addition of all such unit vectors. The near-field photon phases are Gaussian distributed around $\theta_p$, which corresponds to tip-sample contact. Far-field background photons are detected randomly at all phases, so the corresponding vector addition is simply a random walk.
The far-field component of the lock-in vector FF results from an unbiased two-dimensional random walk with unit steps, and follows the probability distribution originally derived by Lord Rayleigh
$$P(r) = \frac{2r}{N_{steps}} \exp\left(-\frac{r^2}{N_{steps}}\right), \tag{11}$$
where r is the final end-to-end distance of the walk, and $N_{steps}$ is the number of steps in the walk [19: J. W. Strutt, “On The Resultant of a Large Number of Vibrations of the Same Pitch and of Arbitrary Phase,” Philos. Mag. X, 73–78 (1880)]. This distribution has a mean $\mu_r$ and standard deviation $\sigma_r$ given by
$$\mu_r = \frac{\sqrt{\pi N_{steps}}}{2}, \tag{12}$$
$$\sigma_r = \sqrt{\left(1 - \frac{\pi}{4}\right) N_{steps}}. \tag{13}$$
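The random-walk picture of the far-field background can be checked numerically. The following is a small Monte Carlo sketch, not the authors' analysis code: it adds unit vectors with uniformly random phases and compares the mean and spread of the resultant length with the Rayleigh-distribution values sqrt(πN)/2 and sqrt((1 − π/4)N). The step and walk counts are arbitrary illustrative choices.

```python
import cmath
import math
import random

def walk_length(n_steps, rng):
    """End-to-end length of a 2D walk made of unit steps with random phases."""
    total = 0j
    for _ in range(n_steps):
        total += cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
    return abs(total)

rng = random.Random(0)          # fixed seed so the run is repeatable
n_steps, n_walks = 400, 2000    # illustrative sizes, not experimental values

lengths = [walk_length(n_steps, rng) for _ in range(n_walks)]
mean_sim = sum(lengths) / n_walks
std_sim = math.sqrt(sum((x - mean_sim) ** 2 for x in lengths) / (n_walks - 1))

mean_theory = math.sqrt(math.pi * n_steps) / 2.0
std_theory = math.sqrt((1.0 - math.pi / 4.0) * n_steps)

print(f"mean: simulated {mean_sim:.2f}, Rayleigh {mean_theory:.2f}")
print(f"std:  simulated {std_sim:.2f}, Rayleigh {std_theory:.2f}")
```

The point of the exercise is the scaling: the background length grows only as the square root of the number of far-field photons, whereas a phase-locked near-field signal grows linearly with the number of near-field photons.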
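In our case, $N_{steps}$ is given by the number of detected far-field photons that are transmitted by the lock-in bandpass filter, $N_{steps} = \beta S_{ff}$, where β < 1. This gives
$$\langle |FF| \rangle = \frac{\sqrt{\pi \beta S_{ff}}}{2}, \tag{14}$$
$$\sigma_{|FF|} = \sqrt{\left(1 - \frac{\pi}{4}\right) \beta S_{ff}} \tag{15}$$
for the average length of the far-field component |FF| and its uncertainty $\sigma_{|FF|}$, respectively. The near-field component NF comes from a biased random walk about $\theta_p$. The average value of its magnitude |NF| can be estimated by projecting the unit vectors corresponding to each near-field photon onto the $\theta_p$ axis and then summing the result:
$$\langle |NF| \rangle \approx \sum_i \cos(\theta_i - \theta_p), \tag{16}$$
where the sum runs over all the near-field photons, $i = 1 \to S^{osc}_{nf}$. For simplification we define $\alpha = \langle \cos(\theta_i - \theta_p) \rangle$. Since the phase of each photon $\theta_i$ is Gaussian distributed, the normalized expectation value is
$$\alpha = \frac{\int_{\theta_p-\pi}^{\theta_p+\pi} \cos(\theta-\theta_p)\, \exp\left[-\frac{(\theta-\theta_p)^2}{2\theta_\sigma^2}\right] d\theta}{\int_{\theta_p-\pi}^{\theta_p+\pi} \exp\left[-\frac{(\theta-\theta_p)^2}{2\theta_\sigma^2}\right] d\theta}. \tag{17}$$
Combining this result with the definition of γ from Eq. (7), the average magnitude of the near-field component |NF| is then approximated by
$$\langle |NF| \rangle = \alpha \gamma f k \approx \frac{\theta_\sigma}{\sqrt{2\pi}}\, e^{-\theta_\sigma^2/2}\, f k. \tag{18}$$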
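The projection factor α can be estimated by direct sampling. The sketch below is illustrative only: it draws photon phases from an untruncated Gaussian about the contact phase (the same limit in which the integration range is extended to ±∞) and compares the mean cosine with the closed form exp(−θσ²/2) that holds in that limit. The values f = 20 and k = 10 are the silicon-tip enhancement and typical efficiency quoted in the text, used here only to show the scale of |NF|.

```python
import math
import random

def alpha_monte_carlo(theta_sigma, n_photons, rng):
    """Mean projection <cos(theta_i - theta_p)> for Gaussian photon phases."""
    total = 0.0
    for _ in range(n_photons):
        # rng.gauss returns the phase offset measured from theta_p
        total += math.cos(rng.gauss(0.0, theta_sigma))
    return total / n_photons

def near_field_length(alpha, gamma, f, k):
    """Average near-field lock-in component, |NF| = alpha * gamma * f * k."""
    return alpha * gamma * f * k

rng = random.Random(1)                      # fixed seed for repeatability
theta_sigma = 1.0                           # rad
alpha_sim = alpha_monte_carlo(theta_sigma, 100_000, rng)
alpha_closed = math.exp(-theta_sigma ** 2 / 2.0)
gamma = theta_sigma / math.sqrt(2.0 * math.pi)

print(f"alpha: sampled {alpha_sim:.3f}, closed form {alpha_closed:.3f}")
print(f"|NF| for f=20, k=10: {near_field_length(alpha_closed, gamma, 20.0, 10.0):.1f}")
```

For a width of 1 rad the sampled and closed-form values both come out near 0.6, and γ near 0.4, consistent with the optimized values used in the text.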
When using a lock-in amplifier to demodulate the signal, an image is constructed one pixel at a time, where the value of each pixel is the magnitude of the lock-in vector, |L| = |NF + FF|. The near-field component NF points along $\theta_p$, but the far-field component FF points in a random direction. Performing the vector addition NF + FF and averaging over all directions of FF gives the peak lock-in signal, Eq. (19). The contrast $C_{LI}$ and signal-to-noise ratio $\mathrm{SNR}_{LI}$ in the lock-in signal, Eqs. (20) and (21), can then be found.
Equation (20) can be used to calculate the minimum signal enhancement factor required to achieve contrast greater than unity, Eq. (22). As before, we consider the case where there is only one fluorophore in the near-field zone (~10,000 fluorophores/µm²) and the far-field illumination area is ~(0.5 µm × 1.5 µm), corresponding to focused-TIRF illumination. Using typical experimental values of k = 10 and β = 0.15 as well as optimized values of γ = 0.4 and α = 0.6 (see below) gives a required signal-enhancement factor of f > 65 to achieve a contrast greater than unity. Using radial polarization reduces the required enhancement to f > 18, which is very realistic for silicon tips and in fact has already been demonstrated in the case of isolated spherical quantum dots [4].
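The order of magnitude of these thresholds can be reproduced with a short calculation. The sketch below rests on an explicit assumption that is not taken from the text: the peak lock-in signal is modeled as the quadrature sum of the near-field length αγfk and the root-mean-square far-field length sqrt(βkN_FA), and the contrast is measured against the mean far-field length sqrt(πβkN_FA)/2. Under that assumed form, unit contrast requires (αγfk)² > (π − 1)βkN_FA. All function names are illustrative.

```python
import math

def lock_in_contrast(f, n_fa, k, beta, gamma, alpha):
    """Assumed form: (sqrt(NF^2 + beta*k*N) - mean|FF|) / mean|FF|."""
    nf = alpha * gamma * f * k
    n_steps = beta * k * n_fa                      # far-field photons passed by the filter
    mean_ff = math.sqrt(math.pi * n_steps) / 2.0   # mean length of the random walk
    peak = math.sqrt(nf ** 2 + n_steps)            # quadrature sum (assumption)
    return (peak - mean_ff) / mean_ff

def f_min_lock_in(n_fa, k, beta, gamma, alpha):
    """Enhancement at which the assumed lock-in contrast equals unity."""
    return math.sqrt((math.pi - 1.0) * beta * n_fa / k) / (alpha * gamma)

k, beta, gamma, alpha = 10.0, 0.15, 0.4, 0.6   # values quoted in the text
f_tirf = f_min_lock_in(7500.0, k, beta, gamma, alpha)
f_radial = f_min_lock_in(625.0, k, beta, gamma, alpha)

print(f"focused TIRF: f > {f_tirf:.1f}")
print(f"radial:       f > {f_radial:.1f}")
```

With these inputs the sketch gives thresholds of roughly 65 and 19, close to the values quoted above and two orders of magnitude below the f > 7500 needed without modulation. The improvement comes from the square-root dependence on N_FA in place of the linear one.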
Figure 5 demonstrates how the lock-in demodulation scheme can be used to improve the contrast and SNR for samples with a high density of rod-shaped quantum dots (4 nm × 9 nm). These images were obtained using a silicon tip oscillating with an optimized amplitude of ~30 nm peak-to-peak (see below) and focused-TIRF illumination (λ = 543 nm). Approach-curve measurements, in which the tip is lowered onto isolated quantum dots and the fluorescence rate is measured as a function of tip-sample separation (data not shown), indicate an enhancement factor of only f ~ 4 for these data. The small enhancement in this case results from the fact that the elongated shape of the quantum dots leads to a somewhat small spatial overlap with the region of enhanced field at the tip apex. Furthermore, the absorption dipole for these nanorods should lie predominantly along the sample surface, while the enhanced field is strongest under the tip where it is vertically polarized. This leads to relatively weak near-field excitation of the nanorods.
Our model assumes that the fluorophores, whether quantum dots or fluorescent molecules, do not blink or photobleach. In reality, both quantum dots and molecular fluorophores blink and photobleach, which alters the contrast observed in experimental images. In particular, the background signal $S_{ff}$ will be reduced for a blinking or photobleaching sample compared to an ideal one. Interestingly, this has the effect of increasing the contrast in experimental images in the limit of large fluorophore densities, where the fluctuations in the far-field signal caused by blinking and bleaching are small compared to the total far-field signal $S_{ff}$. However, the probability of the tip encountering a particular fluorophore that is “on” (i.e. not in a dark or photobleached state) is reduced by the same factor as the far-field signal $S_{ff}$. The consequence of this is difficult to predict without knowledge of the blinking and photobleaching rates corresponding to the particular fluorophores of interest. This issue is highlighted by Fig. 5, which shows the topographic image of a collection of quantum dots in addition to the undemodulated and demodulated near-field fluorescence images. The total quantum dot density as observed by the AFM topography is ~50 µm⁻²; however, many of the quantum dots do not fluoresce. The bright quantum dot density for this image is ~14 µm⁻², and there is clearly sufficient contrast to increase the density further; Eq. (22) predicts that a density as high as 26 bright dots/µm² can be achieved for f ~ 4.
Fig. 5. TEFM images of a high-density quantum dot sample. Panel (a) shows the AFM topography (~50 total dots/µm²). Panel (b) shows the scalar photon sum (~14 bright dots/µm²). Panel (c) shows the same image after lock-in demodulation. The scale bar is 200 nm.
4. Optimizing tip oscillation amplitude
The lock-in contrast and signal-to-noise ratio given in Eqs. (20) and (21) are strongly influenced by the amplitude of oscillation of the AFM tip, which determines the width of the Gaussian photon-phase distribution, $\theta_\sigma$, and thus the values of γ and α. Thus, to optimize the lock-in contrast, the product γ×α must be maximized with respect to $\theta_\sigma$:
$$\frac{d(\gamma\alpha)}{d\theta_\sigma} \approx \frac{d}{d\theta_\sigma}\left[\frac{\theta_\sigma}{\sqrt{2\pi}}\, e^{-\theta_\sigma^2/2}\right] = 0, \tag{23}$$
where the approximations in Eq. (18) have been used. Solving Eq. (23) for $\theta_\sigma$ gives an optimal value of $\theta^{opt}_\sigma$ = 1 radian. The optimal oscillation amplitude, $A_{opt}$, can now be found using the equation of motion for the tip oscillation, z = A(1 − cos θ). To relate $\theta^{opt}_\sigma$ to an optimal amplitude $A_{opt}$, we define $z_\sigma$ as the value of tip-sample separation z in an approach curve such that the integrated area under the approach curve from 0 to $z_\sigma$ contains 68% of the near-field photons. The value of $z_\sigma$ depends on the sharpness of the tip and the size and shape of the fluorescent object: sharp tips and small objects yield the smallest values of $z_\sigma$. Substituting z = $z_\sigma$ and θ = $\theta^{opt}_\sigma$ = 1 into the equation of motion for the tip we obtain:
$$A_{opt} = \frac{z_\sigma}{1 - \cos\theta^{opt}_\sigma} = \frac{z_\sigma}{1 - \cos(1)}. \tag{24}$$
When the approximations made in Eq. (18) are used, a value of $A_{opt} = 2.18\, z_\sigma$ is obtained, compared to a value of $A_{opt} = 2.11\, z_\sigma$ when complete numerical integrations are performed.
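The optimization can be repeated numerically. The sketch below, which is illustrative rather than the authors' procedure, maximizes the product γ×α over the phase width twice: once with the closed-form approximation θσ·exp(−θσ²/2)/sqrt(2π), and once with the integral taken over a single cycle from −π to π (midpoint rule). It then converts each optimum into an amplitude using z = A(1 − cos θ). The grid search and function names are assumptions of the sketch.

```python
import math

def gamma_alpha_approx(theta_sigma):
    """Product gamma*alpha with the integration limits extended to infinity."""
    return theta_sigma / math.sqrt(2.0 * math.pi) * math.exp(-theta_sigma ** 2 / 2.0)

def gamma_alpha_exact(theta_sigma, n=400):
    """(1/2pi) * integral_{-pi}^{pi} cos(t) exp(-t^2 / (2 theta_sigma^2)) dt."""
    h = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        t = -math.pi + (i + 0.5) * h          # midpoint of the i-th slice
        total += math.cos(t) * math.exp(-t * t / (2.0 * theta_sigma ** 2))
    return total * h / (2.0 * math.pi)

def grid_argmax(func, lo=0.5, hi=1.5, steps=500):
    """Location of the maximum of func on a uniform grid (0.002 rad spacing)."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=func)

def amplitude_ratio(theta_sigma):
    """A_opt / z_sigma from z_sigma = A_opt * (1 - cos(theta_sigma))."""
    return 1.0 / (1.0 - math.cos(theta_sigma))

theta_approx = grid_argmax(gamma_alpha_approx)
theta_exact = grid_argmax(gamma_alpha_exact)

print(f"approximate: theta = {theta_approx:.3f} rad, A/z_sigma = {amplitude_ratio(theta_approx):.2f}")
print(f"one cycle:   theta = {theta_exact:.3f} rad, A/z_sigma = {amplitude_ratio(theta_exact):.2f}")
```

Truncating the integral to one cycle removes tail contributions with negative cosine, which pushes the optimum width slightly above 1 rad and lowers the amplitude ratio from about 2.18 to a little over 2.1, in line with the two values quoted above.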
Experimental values for the contrast and signal-to-noise ratio as a function of the peak-to-peak oscillation amplitude of the tip are shown in Fig. 6, along with the theoretical predictions developed above. Isolated ($N_{FA}$ = 1) CdSe/ZnS nanorods (4 nm × 9.4 nm) were imaged with different amplitudes using many different tips from the same fabrication wafer. Each data point was computed from the measured values of $S_{peak}$, $S_{ff}$, and $\sigma_{ff}$, as used in Eqs. (1) and (2), for ~15 different quantum dots [6: H. F. Hamann, M. Kuno, A. Gallagher, and D. J. Nesbitt, “Molecular fluorescence in the vicinity of a nanoscopic probe,” J. Chem. Phys. 114, 8596–8609 (2001)]. The values of f = 3.7±1.3, k = 11±5, β = 0.15±0.15, and $z_\sigma$ = 7.5±2 nm were all obtained from a statistical analysis of image and approach-curve data. Subsequently, $\theta_\sigma$ was computed from Eq. (24) using the measured value $z_\sigma$ = 7.5±2 nm to obtain γ and α for each oscillation amplitude from Eqs. (7) and (17). Thus, the theoretical curves shown in Fig. 6 contain no free parameters. The predicted optimal peak-to-peak amplitude of 32±9 nm agrees with the experimental value of 32±4 nm. This good agreement between the predictions of the theoretical model and the experimental measurements lends confidence to the signal enhancement factors f calculated above as requisite for imaging high fluorophore densities.
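The predicted amplitude can be checked by hand. As a rough sketch, assuming only that the tip motion z = A(1 − cos θ) spans 0 to 2A, so that the peak-to-peak amplitude is twice the optimal amplitude, and propagating the uncertainty in the measured z_σ by simple scaling (an assumption of this sketch):

```latex
\begin{align*}
2A_{opt} &\approx 2 \times 2.11 \times 7.5\ \mathrm{nm} \approx 31.7\ \mathrm{nm}
  && \text{(full numerical integration)} \\
2A_{opt} &\approx 2 \times 2.18 \times 7.5\ \mathrm{nm} \approx 32.7\ \mathrm{nm}
  && \text{(approximate form)} \\
\delta(2A_{opt}) &\approx 2 \times 2.11 \times 2\ \mathrm{nm} \approx 8.4\ \mathrm{nm}
\end{align*}
```

Both estimates round to about 32 nm with an uncertainty of 8 to 9 nm, consistent with the predicted value quoted above.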
Fig. 6. TEFM image contrast, panel (a), and signal-to-noise ratio, panel (b), for isolated quantum dots as a function of the tip oscillation amplitude. Data were obtained using BudgetSensors Multi-75 silicon tips. Data points correspond to the average value of ~15 measurements for the lock-in demodulation signal (closed symbols) and the scalar sum (open symbols). Dashed and dotted lines are the corresponding theoretical predictions.