The light reflected from the window was collected and detected using an optical interferometer, and the sounds were extracted [1]. The idea was very simple: speech sounds vibrate the window, and those small vibrations are sufficient to produce detectable phase modulation on the scale of the optical wavelength. This configuration suffers from four major disadvantages. First, all sounds are detected together, and separating them requires digital blind source separation post-processing algorithms. Second, the projection laser and the interferometric detection module must be placed in very specific positions so that the reflected beam is directed toward the detection module. Third, like all interferometer-based configurations, the detection module is complicated and sensitive to errors. Fourth, it requires a window to be positioned near the voice source.
In this paper we propose a new patented approach [4] that overcomes all four of the disadvantages described above. Our configuration projects a laser beam onto the target and observes the movement of the secondary speckle pattern created on top of it. Speckles are self-interference random patterns [5] and have the remarkable quality that each individual speckle serves as a reference point from which one may track changes in the phase of the light scattered from the surface [5].
Because of that, speckle techniques such as electronic speckle-pattern interferometry (ESPI) have been widely used for displacement measurement and vibration analysis (amplitudes, slopes, and modes of vibration), as well as for characterization of deformations [6–12]. When measuring object deformations, one subtracts the speckle pattern recorded before the deformation occurred (due to a change in loading, a change in temperature, etc.) from the pattern recorded after it occurred. This procedure produces correlation fringes that correspond to the object's local surface displacements between the two exposures. From the fringe pattern, both the magnitude and the direction of the object's local surface displacement are determined [7,8].
Speckles have also been used to improve the resolving capabilities of imaging sensors [13], as well as for ranging and 3D estimation [14].
In our configuration, detection is performed by a fast imaging camera that observes the temporal intensity fluctuations of the imaged speckle pattern and its trajectory. To allow the trajectory to be correlated with the movement of the speckle pattern, we had to properly defocus our imaging lens.
Since the speckles are spatially small spots, they diffract into a wide angle (close to 2π steradians), and thus the speckle pattern can be imaged no matter where the camera is placed. Therefore, there is no longer any constraint on the location of the detector or the reflecting object.
Since speckles are self-interference patterns, detection is done by simple imaging; the detection module is not an interferometer and is therefore less sensitive to noise.
Since detection is realized with an imaging module, the temporal variations of different pixels in the image can be associated with different sound sources. Therefore, one may realize blind source separation of sounds simply through their spatial separation, without the need for digital post-processing.
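To make the spatial-separation argument concrete, here is a minimal sketch (not the authors' processing) assuming the camera output is a NumPy array of frames with shape (T, H, W) and that each sound source modulates the intensity of a distinct tile of the region of interest. The helper names `tile_signals` and `dominant_frequency`, the tile size, and the synthetic 300 Hz and 500 Hz sources are hypothetical choices for illustration.

```python
import numpy as np


def tile_signals(frames: np.ndarray, tile: int) -> np.ndarray:
    """Mean intensity of every tile x tile block in each frame.

    frames has shape (T, H, W) with H and W divisible by tile (assumed).
    Returns an array of shape (T, H // tile, W // tile).
    """
    t, h, w = frames.shape
    blocks = frames.reshape(t, h // tile, tile, w // tile, tile)
    return blocks.mean(axis=(2, 4))


def dominant_frequency(signal: np.ndarray, fps: float) -> float:
    """Frequency in Hz of the strongest spectral component after removing the mean."""
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fps)
    return float(freqs[np.argmax(spectrum)])


# Synthetic example: two hypothetical sources at 300 Hz and 500 Hz modulate the
# left and right halves of a 16 x 32 pixel region sampled at 2480 fps for 1 s.
fps, n = 2480.0, 2480
t_s = np.arange(n) / fps
frames = np.ones((n, 16, 32))
frames[:, :, :16] += 0.1 * np.sin(2 * np.pi * 300 * t_s)[:, None, None]
frames[:, :, 16:] += 0.1 * np.sin(2 * np.pi * 500 * t_s)[:, None, None]

signals = tile_signals(frames, tile=16)  # shape (2480, 1, 2)
left = dominant_frequency(signals[:, 0, 0], fps)
right = dominant_frequency(signals[:, 0, 1], fps)
print(round(left), round(right))  # 300 500
```

Each tile behaves as an independent channel, which is the sense in which spatial separation replaces digital separation in this toy model.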
The bandwidth of speech signals is approximately 4 kHz [15], and thus sampling at 8 kfps, twice the signal bandwidth as required by the Nyquist criterion, is sufficient for reconstruction. Nowadays, digital cameras allow even higher frame rates within predefined spatial regions of interest of, e.g., 256 by 256 pixels (such a region could potentially separate 256×256 different sound sources) [16].
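The sampling arithmetic can be checked with a few lines of Python. This is a minimal sketch that applies the Nyquist criterion directly; the function names are hypothetical. By the same criterion, the 2480 fps used in the experiments of Section 3 would capture components up to 1240 Hz.

```python
def min_frame_rate(bandwidth_hz: float) -> float:
    """Nyquist criterion: sample at no less than twice the signal bandwidth."""
    return 2.0 * bandwidth_hz


def max_recoverable_frequency(frame_rate_fps: float) -> float:
    """Highest frequency representable without aliasing at a given frame rate."""
    return frame_rate_fps / 2.0


def max_pixel_channels(width_px: int, height_px: int) -> int:
    """Upper bound on separately monitored locations if every pixel is one channel."""
    return width_px * height_px


print(min_frame_rate(4000.0))             # 8000.0
print(max_recoverable_frequency(2480.0))  # 1240.0
print(max_pixel_channels(256, 256))       # 65536
```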
The proposed approach was experimentally shown not only to detect acoustic and speech signals but also to tap cellular phones and to detect the temporal signature of heartbeats (resembling ECG signals in medicine) of subjects positioned in a noisy environment.
Section 2 presents the theoretical explanation. Experimental results for voice detection, cellular phone tapping, and remote heartbeat signature extraction are presented in Section 3. The paper is concluded in Section 4.
2. Theoretical explanation
Speckles are self-interference random patterns. A speckle pattern can be generated by illuminating an object through a diffuser or a ground glass. Speckle patterns also arise from the roughness of the object's surface when it is illuminated by a laser spot. When a spatially coherent beam is reflected from an object whose roughness generates a random phase distribution, the self-interfering speckle pattern is obtained in the far field.
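A far-field speckle pattern can be simulated numerically as a sketch of this description, assuming a uniformly reflecting circular spot, a surface phase uniformly distributed over [0, 2π), and the Fraunhofer approximation, under which the far field is the 2D Fourier transform of the reflected field. The grid size, spot radius, and function name are hypothetical. For fully developed speckle the intensity contrast (standard deviation over mean) is expected to be close to 1.

```python
import numpy as np


def far_field_speckle(n: int = 256, spot_radius: int = 32, seed: int = 0) -> np.ndarray:
    """Far-field speckle intensity of a rough surface under a circular laser spot."""
    rng = np.random.default_rng(seed)
    y, x = np.mgrid[-n // 2 : n // 2, -n // 2 : n // 2]
    spot = (x**2 + y**2) <= spot_radius**2               # uniform reflectivity inside the spot
    phase = rng.uniform(0.0, 2.0 * np.pi, size=(n, n))   # random phase from surface roughness
    field = spot * np.exp(1j * phase)
    far_field = np.fft.fftshift(np.fft.fft2(field))      # Fraunhofer propagation
    return np.abs(far_field) ** 2


intensity = far_field_speckle()
contrast = intensity.std() / intensity.mean()
print(f"speckle contrast: {contrast:.2f}")
```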
In the proposed configuration, we propose not to focus the camera on the object but rather on the far or near field, so that the object itself is defocused. As a result, the movement of the object (its vibrations) causes a lateral shift of the speckle pattern. Because of this defocusing, instead of constantly changing the speckle pattern, the movement of the object produces the same speckle pattern, which only moves or vibrates in the transversal plane. This is a very important feature, since it allows the movement trajectory to be extracted by tracking the intensity maxima. As will be shown in the experimental part, the speech signals are vibrations around the trajectory of the entire object. The suggested approach allows not only extraction of the temporal speech and heartbeat information but also estimation of the 3D trajectory of the object.
Let us now prove that, when slightly defocusing, the speckle pattern moves instead of changing. We refer to Fig. 1 to explain our considerations. We denote by (x, y) the coordinates of the transversal plane, while the axial axis is denoted by Z. A laser spot with diameter D illuminates a diffusive object at normal incidence, and λ is the optical wavelength. The reflected light is imaged by a lens onto a detector, giving a speckle pattern. This random amplitude and phase pattern is generated by the random phase of the surface of the diffusive object. In the regular case, the imaging system images a plane close to the object, at distance Z1, and the amplitude distribution of the speckles equals the Fresnel integral performed over the random phase ϕ created by the surface roughness.
where the paraxial approximation has been assumed, as well as uniform reflectivity over the object's illuminated area. This distribution is viewed by the imaging device, and the intensity of the obtained image equals:
where h is the spatial impulse response and M is the inverse of the magnification of the imaging system. The function h accounts for the blurring due to the optics as well as the finite pixel size of the sensor, and it is computed in the sensor plane (xs, ys).
F is the focal length of the imaging lens. In remote inspection, the object-lens distance is typically much longer than any other distance involved in the process.
Assuming rigid-body movement, the movement of the object can be classified into three types, which cannot be separated and occur simultaneously: transverse, axial, and tilt. Under transversal movement, the amplitude distribution of the speckle pattern Tm simply shifts in x and in y by the same distance as the movement of the object, as can be checked in Eq. 1. Under normal imaging conditions and small vibrations, this movement is demagnified by the imaging system, resulting in barely detectable shifts on the image plane. The second type is axial movement, in which the speckle pattern remains basically the same, since the variations in Z1 (which only scale the resulting distribution) are significantly smaller in comparison to the magnification of the camera:
The third type of movement is the tilt of the object which may be expressed as follows:
The angles αx and αy are the tilts in the x and y axes, respectively, and the factor of 2 accounts for the back-and-forth axial distance change. As is evident from the last equation, in this case the resulting speckle pattern changes completely, especially after the small speckles are blurred by the impulse response of the imaging system with its large magnification factor (M can be a few hundred), as described by Eq. 2. Since the three types of movement cannot be separated, the speckle pattern essentially varies randomly. It should be noted that for small Z1 values the speckles at the Z1 plane are very small and will not be visible in the sensor after imaging with large demagnification. Under these conditions, the speckle size associated with the aperture of the lens (the blurring width of λF#, properly magnified when transferred to the Z1 plane) is dominant, rather than the under-sampling due to the detector.
Assume now that we strongly defocus the image captured by the camera. Defocusing moves the imaged plane from distance Z1 to a plane at distance Z2. In this case several changes occur. First, the magnification factor M is relatively small (it is reduced by at least one order of magnitude). Second, the plane at which the speckle pattern imaged by the camera is formed lies in the far-field regime (the relevant speckle plane is far from the object). Therefore, the equivalent of Eqs. 1
Note that now also:
Therefore, in the case of transversal movement the speckle pattern is almost unchanged, since a shift does not affect the amplitude of the Fourier transform and because the magnification of the blurring function h is much smaller. Axial movement does not affect the distribution either, since Z2 is much larger than the shifts of the movement (only a constant phase is added in Eq. 6).
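The property invoked here for transversal movement, that a shift leaves the Fourier-transform amplitude unchanged, can be checked numerically. This sketch idealizes the movement as a rigid circular shift of the entire reflected field on the simulation grid (spot and roughness together), using a far-field model of a random-phase circular spot as an assumption; the grid size, spot radius, and shift are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 128
y, x = np.mgrid[-n // 2 : n // 2, -n // 2 : n // 2]
spot = ((x**2 + y**2) <= 20**2).astype(float)
field = spot * np.exp(1j * rng.uniform(0, 2 * np.pi, (n, n)))

# Rigid transversal movement of the reflected field by (3, -5) samples
# (a circular shift on the simulation grid).
moved = np.roll(field, shift=(3, -5), axis=(0, 1))

before = np.abs(np.fft.fft2(field)) ** 2
after = np.abs(np.fft.fft2(moved)) ** 2
print(np.allclose(before, after))  # True: a shift only adds a linear phase
```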
Tilting, on the other hand, is expressed as a shift of the speckle pattern. This is clearly seen in Fig. 1 and follows from Eq. 9 (a linear phase becomes a shift in the far field):
The three types of movement cannot be separated, but since two of them now produce negligible variations in the imaged speckle pattern, the overall effect of all three is only a pure shift, which can easily be detected by spatial pattern correlation.
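The statement that a linear phase becomes a shift in the far field can likewise be checked numerically. In this sketch a tilt is modelled as a linear phase ramp across a random-phase circular spot, with the ramp chosen to span an integer number of cycles over the grid so that the resulting shift is an exact number of samples; these choices are assumptions made for a clean illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 128
y, x = np.mgrid[-n // 2 : n // 2, -n // 2 : n // 2]
spot = ((x**2 + y**2) <= 20**2).astype(float)
field = spot * np.exp(1j * rng.uniform(0, 2 * np.pi, (n, n)))

# Tilt modelled as a linear phase ramp with kx, ky whole cycles across the grid.
kx, ky = 4, -2
tilted = field * np.exp(2j * np.pi * (kx * x + ky * y) / n)

before = np.abs(np.fft.fft2(field)) ** 2
after = np.abs(np.fft.fft2(tilted)) ** 2

# The far-field intensity is the same speckle pattern translated by (ky, kx) samples.
print(np.allclose(after, np.roll(before, shift=(ky, kx), axis=(0, 1))))  # True
```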
The resolution, i.e., the size of the speckles obtained at the Z2 plane and imaged onto the sensor plane, equals:
This assumes, of course, that δx is larger than (and therefore not limited by) both the optical and the geometrical resolution of the imaging system.
The conversion of angle to the displacement of the pattern on the camera is as follows:
Assuming that Δx is the pixel size of the detector, and that every speckle in this plane is seen by at least K pixels, the requirement for the focal length is:
Note that Z2 fulfills the far field approximation:
The number of speckles in each dimension of the spot equals:
where ϕ is the diameter of the lens aperture, F# is the F-number of the lens, and Mδx is the speckle size obtained at the Z2 plane. This relation holds because the lens aperture is covered by the light coming from the reflecting surface of the object.
Fig. 1. Schematic description of the system.
Eqs. 12 and 14 determine the requirements for the focal length of the imaging system, and these requirements contradict each other. On the one hand, Eq. 12 suggests that a small F is better, since the speckles are then large in the Z2 plane (especially for large Z2), and it is preferable to increase the demagnification factor so that the speckles are easier to resolve with the detector pixels. On the other hand, Eq. 14 favors a larger F in order to have more speckles per spot. Therefore, an optimum can be found. This optimum limits the sound detection range to a few hundred meters. Let us make a small computation: Assuming D
one obtains: F
. It is clear that K
cannot be too large, since K
is the region of interest, and since we sample at a high rate, this window should be as small as possible. Usually, when sampling at rates of about 8 kHz (for recovering speech signals), the window should not exceed about 100 pixels in each direction, and therefore the range of Z2
is close to the theoretical limit of detection.
3.2 Full outdoor testing
In this subsection we present measurements in which we were able to detect direct speech signals of standing and walking subjects in a noisy environment, to hear their breathing and heartbeats, and to tap a cellular phone conversation.
Fig. 3. (a).–(b). Two consecutive speckle patterns. (c). The defocused image of the target with the projected spot on top of it. (d). The extracted temporal speech signal (a scream). (e). The spectrogram of the signal in (d).
All results were obtained by applying a basic detection algorithm without any real post-processing for noise removal; much better results can therefore be expected after proper post-processing. All recordings with the visible laser were performed outdoors at noon in the summer, with strong turbulence effects and an extremely noisy environment between the recording system and the target.
Fig. 4. One of the experimental setups for far range detection.
The processing consisted of correlating the defocused speckle patterns and tracking the change in the position of the correlation peak. This change in position is the acoustic signal we aim to extract.
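A minimal sketch of this processing step follows, assuming the defocused frames are available as 2D NumPy arrays. It estimates the displacement of each frame relative to the first by locating the peak of an FFT-based circular cross-correlation; the function names and the integer-pixel resolution are our own simplifications, and a practical tracker would likely need sub-pixel peak interpolation, since the vibrations of interest can be smaller than a pixel. A random test pattern stands in for a real speckle frame.

```python
import numpy as np


def estimate_shift(ref: np.ndarray, frame: np.ndarray) -> tuple[int, int]:
    """Integer (dy, dx) displacement of `frame` relative to `ref`.

    The peak of the circular cross-correlation, computed with FFTs,
    gives the displacement of the speckle pattern.
    """
    a = ref - ref.mean()
    b = frame - frame.mean()
    corr = np.fft.ifft2(np.fft.fft2(b) * np.conj(np.fft.fft2(a))).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    n_y, n_x = corr.shape
    # Peak indices beyond half the window correspond to negative shifts.
    if dy > n_y // 2:
        dy -= n_y
    if dx > n_x // 2:
        dx -= n_x
    return int(dy), int(dx)


def track(frames: np.ndarray) -> np.ndarray:
    """Displacement of every frame relative to the first one; shape (T, 2)."""
    return np.array([estimate_shift(frames[0], f) for f in frames])


rng = np.random.default_rng(3)
base = rng.random((64, 64))
true_shifts = [(0, 0), (1, 0), (2, -1), (-3, 2)]
frames = np.stack([np.roll(base, s, axis=(0, 1)) for s in true_shifts])
print(track(frames).tolist())  # [[0, 0], [1, 0], [2, -1], [-3, 2]]
```

One component of the returned trajectory, e.g. `track(frames)[:, 1]`, would then serve as the extracted time signal, sampled at the camera frame rate.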
In Fig. 5 we tap a cellular phone conversation. The range was 60 meters and the sampling was performed at 2480 fps. The recording is of counting in English: 1, 2, 3, 4, 5, 6. Fig. 5(a) presents the image captured with our imaging system, while in Fig. 5(b) one may hear the reconstructed tapped signal.
Tapping a cellular phone. The person on the other side of the line is counting 1, 2, 3, 4, 5, 6… (a). Image of the cellular phone. (b). (Media 1) The reconstructed tapped signal.
In Fig. 6 we demonstrate the reconstruction of a speech signal from the back of the head, without seeing the speaker's face. The range was about 30 meters and the recording was at 2480 fps. The subject is saying in English: 5, 6, 7. Fig. 6(a) presents the scenario, while Fig. 6(b) shows the reconstructed signal.
Listening to speech from the back of the neck. The person is counting 5, 6, 7… (a). Image of the subject. (b). (Media 2) The reconstructed signal.
Next, we performed a recording at a range of 100 meters by observing the speckle pattern reflected from the face. The recording was performed across a noisy construction site, with sampling at 2480 fps. The voice file contains counting in English: 5, 6… Fig. 7(a) presents the scenario, while Fig. 7(b) shows the reconstructed voice signal.
Listening to speech from the profile of the face. The person is counting 5, 6… (a). Image of the subject. (b). (Media 3) The reconstructed signal.
As a clarifying example, Fig. 9(a) presents the reconstruction of a speech signal containing counting in English (obtained in the previous experiments), and Fig. 9(b) shows its spectrogram.
Listening to heartbeats. (a). Image of the subject. (b). (Media 4) The reconstructed signal.
The temporal axis in Fig. 9(a) is in units of 1/2480 s (every 2480 samples correspond to 1 s). In Fig. 9(b) the horizontal axis is in seconds and the vertical axis is in Hz. Both the temporal signal and the spectrogram clearly show that the words are well separated in time and clearly visible.
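Because the temporal axis is simply the sample index divided by the frame rate, a spectrogram like the one in Fig. 9(b) can be sketched with a Hann-windowed short-time Fourier transform. The window length, hop size, and synthetic two-tone test signal below are hypothetical; the paper does not state the spectrogram parameters it used.

```python
import numpy as np


def spectrogram(signal: np.ndarray, fs: float, win: int = 256, hop: int = 128):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform.

    Returns (times_s, freqs_hz, magnitude); magnitude has shape (len(freqs_hz), len(times_s)).
    """
    window = np.hanning(win)
    starts = np.arange(0, signal.size - win + 1, hop)
    segments = np.stack([signal[s : s + win] * window for s in starts])
    magnitude = np.abs(np.fft.rfft(segments, axis=1)).T
    freqs = np.fft.rfftfreq(win, d=1.0 / fs)
    times = (starts + win / 2) / fs  # centre of each window, in seconds
    return times, freqs, magnitude


# Synthetic test signal sampled at 2480 Hz: sample k corresponds to k / 2480 s.
fs = 2480.0
k = np.arange(2480)
signal = np.where(k < 1240,
                  np.sin(2 * np.pi * 200 * k / fs),
                  np.sin(2 * np.pi * 600 * k / fs))

times, freqs, mag = spectrogram(signal, fs)
first_peak = freqs[np.argmax(mag[:, 0])]  # window inside the 200 Hz part
last_peak = freqs[np.argmax(mag[:, -1])]  # window inside the 600 Hz part
# Nearest bins to 200 Hz and 600 Hz; the bin width is fs / win (about 9.7 Hz).
print(f"{first_peak:.1f} Hz, {last_peak:.1f} Hz")
```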
Fig. 9. (a). The temporal voice signal. (b). The spectrogram.
The temporal signal and the spectrogram of the heartbeat signal are shown in Figs. 10(a) and 10(b), respectively. Once again, the beats are clearly visible. The temporal axis of Fig. 10(a) is in units of 2.5/10 s. In Fig. 10(b) the horizontal axis is in seconds and the vertical axis is in Hz.
Fig. 10. Experimental results for heart beats detection of remote subject. (a). The temporal signal. (b). The spectrogram.
Next, we performed a recording through a glass window. The range was about 30 meters, the recording was at 2480 fps, and it was taken from the forehead.
Fig. 11. Experimental results for recording a talking subject through a window at 30 meters, across a very noisy construction site. Recording from the forehead. (a). The temporal signal. (b). The spectrogram. (c). The scenario of the experiment.
Fig. 11 presents experimental results for recording a talking subject through a window at 30 meters, across a very noisy construction site. The recording is taken from the forehead.
Fig. 12. Experimental results for speech detection with IR laser. (a). The temporal signal. (b). The spectrogram.
Fig. 11(a) presents the temporal signal, where each sample is 1/2480 of a second. Fig. 11(b) presents the spectrogram of the signal, and the scenario of the experiment is shown in Fig. 11(c). The subject appearing in the left part of Fig. 11(c) is illuminated with a laser spot (marked with a red arrow).
The last experiment involved recording with an infrared laser at 915 nm (important for eye safety). It was done in the laboratory at a range of 3 meters, and the recording was taken from the forehead. Fig. 12(a) presents the temporal signal and Fig. 12(b) its spectrogram. The separate words of the counting in English can be clearly seen in both the temporal signal and the spectrogram. Each unit on the temporal axis corresponds to 1000/2480 of a second.
Fig. 13. Spectral components of OCG signature. (a)-(d) Reconstruction after projecting on hand joints. (e)-(f) Reconstruction after projection on the throat. (a). Subject #1 at rest sampled at 20Hz. (b). Subject #1 at physical strain while sampled at 20Hz. (c). Subject #2 at rest while sampled at 100Hz. (d). Subject #2 at physical strain while sampled at 100Hz. (e) Subject #3 at rest while sampled at 100Hz. (f). Subject #3 at physical strain while sampled at 100Hz.
3.3 Optical cardio-gram measurement
As previously mentioned and demonstrated, the system and operating principle that we used for detecting tilt movement can be used to extract not only speech signals but also optical signals corresponding to the ECG. We call these signals the Optical Cardio-Gram (OCG). These signals can indicate the pressure condition and physical stress of an individual, and can also distinguish the temporal signature of one individual from that of another. In this subsection we measure these signals, show their dependence on the subject and on the subject's physical condition, and test the repeatability of such measurements over time.
We used the same digital camera (PixelLink, model A741) with a spatial region of 128 by 128 pixels. The camera and the laser were positioned side by side, about 1 m from the subject. The camera was focused at a far range behind the subject. We used an Nd:YAG laser with a wavelength of 532 nm.
Fig. 14. Temporal OCG signature. (a). Temporal signature of subject #4. (b). Temporal signature of subject #5, with correlation peaks indicating that its temporal signature is repeatable. (c). Temporal signature of subject #6. (d). Temporal signature of subject #6 recorded on a different day; the signature is repeatable. (e). Temporal signature of subject #7. (f). Temporal signature of subject #8. Different subjects have different signatures.
Some results are presented in Fig. 13. In Figs. 13(a) and 13(b) the laser beam was projected onto the hand joints and the measurements were performed at 20 Hz. We took 500 samples, and therefore the spectral resolution should be 1/(500/20) = 0.04 Hz. A control measurement was performed with a Polar Clock heart rate monitor, and the result was 1.33 pulses/s. Since the Fourier transform was performed over a set of 490 pixels, with zero frequency at the central pixel 245, the expected position of the spectral peak is at pixel number 245 + 1.33/0.04 ≈ 278.
As can be seen, the peak was obtained at pixel 279, in agreement with the external Polar Clock measurement. The same measurement was repeated for the same subject under physical strain. This time the external measurement was 1.783 pulses/s, so the anticipated peak should appear at pixel number 245 + 1.783/0.04 ≈ 289.6, i.e., in pixel 289. The peak was obtained at pixel 287, only about 0.7% away from the anticipated pixel position.
The measurements in Figs. 13(c)–13(f) were performed at a rate of 100 Hz, and 1000 temporal images were taken (i.e., a time window of 10 seconds). For Fig. 13(c) the Polar Clock measurement gave 1.033 pulses per second. In this case the spectral resolution is 1/(1000/100) = 0.1 Hz, and since only 990 pixels were used in the spectral computation (zero frequency at pixel 495), the anticipated peak should appear at pixel number 1.033/0.1 + 495 ≈ 505.3, while it was obtained at 506. In Fig. 13(d) the Polar Clock measurement gave 1.433 pulses per second, so we anticipate a peak at pixel number 1.433/0.1 + 495 ≈ 509.3; the peak was obtained at 509. In Fig. 13(e) the Polar Clock gave 1.216 pulses per second, so the peak is anticipated at pixel 1.216/0.1 + 495 ≈ 507.2; it was obtained at pixel 507. In Fig. 13(f) the Polar Clock gave 1.5 pulses per second, so the peak is anticipated at pixel 1.5/0.1 + 495 = 510, and indeed it was obtained at pixel 510.
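The bin arithmetic used above can be reproduced with a small helper. This sketch follows the text's conventions, taking the resolution as 1/(number of recorded samples / sampling rate) and placing zero frequency at the central index of the transformed set (245 for 490 points, 495 for 990 points); the function names are hypothetical. It can also convert a measured peak pixel back into a pulse rate, e.g. pixel 279 in the 20 Hz case corresponds to (279 − 245) × 0.04 = 1.36 pulses/s.

```python
def spectral_resolution(fs_hz: float, n_record: int) -> float:
    """Resolution as used in the text: 1 / (record length in seconds)."""
    return 1.0 / (n_record / fs_hz)


def expected_peak_pixel(rate_hz: float, fs_hz: float, n_record: int, n_fft: int) -> float:
    """Anticipated peak position in a centred spectrum of n_fft points."""
    return n_fft // 2 + rate_hz / spectral_resolution(fs_hz, n_record)


def pixel_to_rate(pixel: int, fs_hz: float, n_record: int, n_fft: int) -> float:
    """Convert a measured peak pixel back to a rate in pulses per second."""
    return (pixel - n_fft // 2) * spectral_resolution(fs_hz, n_record)


# 20 Hz hand-joint measurements (500 samples, 490-point transform)
print(expected_peak_pixel(1.33, 20, 500, 490))   # ≈ 278.25
print(expected_peak_pixel(1.783, 20, 500, 490))  # ≈ 289.6
print(pixel_to_rate(287, 20, 500, 490))          # ≈ 1.68 pulses/s
# 100 Hz measurements (1000 samples, 990-point transform)
for rate in (1.033, 1.433, 1.216, 1.5):
    print(rate, round(expected_peak_pixel(rate, 100, 1000, 990), 1))
```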
Fig. 15. Using OCG for identifying individuals. (a). The measurement configuration. (b). Identification of subjects from an existing pool. (c). Percentages of success and error.
In Fig. 14 we examined the temporal signature of the OCG signals. Fig. 14(a) shows the temporal signature of a subject, with a single period of the signature enlarged. The experimental parameters are identical to those of Fig. 13, with sampling at 100 Hz. Fig. 14(b) shows the temporal signature of a different subject, with correlation peaks appearing on top of it, indicating that its unique temporal shape is preserved. Figs. 14(c) and 14(d) present the temporal signature of subject #6 at rest on different days; the signature is repeatable. Figs. 14(e) and 14(f) present the temporal signatures of subjects #7 and #8. Different subjects indeed have different signatures.
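The correlation peaks of Fig. 14(b) can be illustrated by sliding a single period of a signature along the recorded signal and computing a normalized (Pearson) correlation at each lag. In the sketch below the signature shape, the 100-sample period (1 s at 100 Hz), and the 0.99 detection threshold are synthetic assumptions, not measured OCG data.

```python
import numpy as np


def normalized_xcorr(signal: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Pearson correlation between `template` and each same-length window of `signal`."""
    m = template.size
    t = (template - template.mean()) / template.std()
    out = np.empty(signal.size - m + 1)
    for i in range(out.size):
        w = signal[i : i + m]
        sd = w.std()
        out[i] = 0.0 if sd == 0 else np.mean((w - w.mean()) / sd * t)
    return out


# Hypothetical single-period signature: 100 samples (1 s at 100 Hz).
n = np.arange(100)
period = np.exp(-0.5 * ((n - 30) / 2) ** 2) + 0.4 * np.exp(-0.5 * ((n - 60) / 5) ** 2)
recording = np.tile(period, 5)  # five identical beats

corr = normalized_xcorr(recording, period)
peaks = np.flatnonzero(corr > 0.99)  # lags at which the shape repeats
print(peaks.tolist())  # [0, 100, 200, 300, 400]
```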
Using a short-range OCG measurement stand that we constructed (depicted in Fig. 15(a)), we gathered preliminary statistics on the capability to use the optically detected signature for identification of individuals. The measurements were done using an Nd:YAG laser at 532 nm, installed and fixed as part of the measurement stand.
The measurement was performed on a group of 30 subjects, some of whom were designated as a pool of signatures. The OCG of the subjects was then measured again, and each subject was compared with the signatures in the pool. In some cases the correlation was high only for the subject with itself: subjects in the pool were identified as such, and subjects not in the pool were identified as such as well. In other cases, the measurement of a given subject in the pool resembled not only his own signature but also those of one, two, or more other subjects in the pool. Sometimes a subject who was not in the pool was identified as such, and so forth.
The statistics are summarized in the charts of Figs. 15(b) and 15(c). Although the algorithms applied for identification (simple correlation and thresholding) and the configuration were very preliminary, more than 86% of the subjects were properly identified.
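The identification step is described only as simple correlation and thresholding. A minimal sketch of that idea, under our own assumptions, compares a measured single-period signature with each pool signature by zero-lag Pearson correlation and reports every pool member above a threshold; an empty result means "not in the pool" and several names mean an ambiguous match. The threshold of 0.9, the synthetic pool contents, and the function names are hypothetical.

```python
import numpy as np


def pearson(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-lag Pearson correlation of two equal-length signatures."""
    return float(np.corrcoef(a, b)[0, 1])


def identify(measured: np.ndarray, pool: dict, threshold: float = 0.9) -> list:
    """Names of pool signatures whose correlation with `measured` exceeds `threshold`."""
    return [name for name, sig in pool.items() if pearson(measured, sig) > threshold]


def pulse(n: np.ndarray, centre: float, width: float) -> np.ndarray:
    """Gaussian pulse used to build synthetic signatures."""
    return np.exp(-0.5 * ((n - centre) / width) ** 2)


# Hypothetical single-period signatures of three pool members (100 samples each).
n = np.arange(100)
pool = {
    "A": pulse(n, 30, 3) + 0.5 * pulse(n, 60, 6),
    "B": pulse(n, 45, 3) + 0.3 * pulse(n, 75, 4),
    "C": pulse(n, 20, 5) + 0.6 * pulse(n, 50, 3),
}

rng = np.random.default_rng(4)
remeasured_a = pool["A"] + 0.02 * rng.standard_normal(100)  # subject A measured again
outsider = pulse(n, 85, 3)                                   # subject not in the pool

print(identify(remeasured_a, pool))  # ['A']
print(identify(outsider, pool))      # []
```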
Note that although all measurements in this subsection were performed using a visible green laser at 532 nm, a simple upgrade of the system would allow working with an infrared light source, which is also eye-safe.