A methodology is described for addressing the binary composite hypothesis (CH) testing problem. With each hypothesis is associated a probability density function (pdf) that can depend on parameters with unknown values. The two hypotheses are here called “target” and “clutter,” with parameter collections t
, respectively. For any N
-dimensional test vector x
, the goal is to produce a useful detection algorithm, based on these pdfs. The algorithm is usually described by a scalar-valued statistic
, along with some threshold value λ. These are used according to the convention
otherwise “declare clutter.”
If the values of all target parameters t
and clutter parameters c
are known, then using the likelihood ratio (LR)
in (1) produces optimal performance, according to a variety of metrics [1
1. Louis Scharf, Statistical Signal Processing (USA, Addison-Wesley, 1990).
]. Continuum fusion
refers to a methodology for integrating a collection of parameter-dependent detectors, such as in (2), when the value of at least one of the target or clutter parameters is unknown, in which case the LR cannot be implemented. The standard algorithm for addressing such problems has been for decades the generalized likelihood ratio (GLR) test. The GLR technique uses the maximum likelihood principle [2
2. K. Pearson, “Mathematical Contributions to the Theory of Evolution. III. Regression, Heredity, and Panmixia,” Phil. Trans. 187, 253–318, (1895), http://www.jstor.org/stable/90707.
] of parameter estimation in conjunction with the form in (2), according to the prescription
This paper defines an alternative framework for addressing the same set of composite hypothesis testing problems to which the GLR recipe applies.
2. Continuum fusion
A binary detector can be defined by the N-dimensional regions of feature space (the domain of x) where it declares either “target” or “clutter.” The equality form of (1), for example, defines the (N-1)-dimensional boundary that separates two such regions. The fundamental principle of continuum fusion proposed here is to combine the declaration regions of all optimal detectors, one for each theoretically allowed parameter value. The particular fusion logic depends on which type of parameter is the subject of the fusion procedure:
(A) When fusing over target parameters, “target declaration” regions in the feature space are to be combined by Union, in the set-theoretic sense. [This prescription is equivalent to forming the Intersection of all “clutter declaration” regions, as proven by the relation , where A and B are target declaration sets and their complements and are, consequently, clutter declaration sets.]
(B) When fusing over clutter parameters, clutter declaration regions are to be combined by the union operation.
Because the LR is a robust form of optimal detector, this paper focuses on creating “fused likelihood ratio” (FLR) detectors. For every CH problem, several flavors of FLR detector can be formulated, each characterized by the constraints imposed on the fusion process. Because defining a detector requires a threshold value as well as a statistic, any fusion method must include some mechanism for relating the continuum of thresholds associated with the constituent detectors. This paper focuses on two types of fusion constraint: constant false alarm rate (CFAR) and constant likelihood ratio (CLR). A complementary flavor to the CFAR method called constant probability of detection (CPD) fusion is also discussed.
2.1. Motivating example: hyperspectral anomaly detection with shadows
If a model pdf pC of clutter can be constructed, then anomaly detection consists of declaring a test value x to indicate “target” whenever pC(x) is below some threshold. For example, in hyperspectral remote sensing applications, often the parameters of an N-dimensional Gaussian clutter model—a (vector) mean and a nonsingular covariance matrix M—are estimated from field measurements. M can then be used to “whiten” the sampled data by the transformation , an operation that is assumed always to have been performed in the following development. The clutter covariance matrix in the whitened space is then the identity matrix. This paper treats only cases in which the various target and clutter covariance matrices are either: (1) identical (homoskedastic), or (2) scaled versions of each other (ampliskedastic). Whitening, therefore, makes all covariance matrices considered here proportional to the identity matrix.
Assume that a core clutter model has been constructed, which is represented in its whitened form by a spherically symmetric Gaussian pdf
whose mean value is the vector μ
[We use the notation v
]. Anomaly detection then equates to declaring as a target any test value x
that lies outside a sphere of radius r
Fig. 1 A clutter distribution in whitened feature space with anomaly detector decision boundary.
), whose size (indirectly) defines the detector threshold.
However, in some detection problems a few test values x
might be drawn from a scaled version of the core clutter model. For example, in many hyperspectral applications, pixels that are shadowed versions of those used to generate the core covariance matrix could lie well outside the decision boundary in Fig. 1
and be declared targets, because they are statistically anomalous.
To avoid such an undesirable classification, an extension of the core Gaussian model of (4) can be considered:
where a scale factor c
models (crudely) an unknown level of illumination (c
< 1 for shade) of the core detector.
A Bayesian approach [3
3. A. Schaum, Hyperspectral Target Detection using a Bayesian Likelihood Ratio Test, Proceedings 2002 IEEE Aerospace Conference, Vol. 3, Pages 3–1537 to 3–1540, 9–16 March 2002.
] might treat c
as a random variable with uniform distribution, integrate the pdf of (5) over c
, and use the result in an LR test (with the target pdf pT
) modeled as constant, corresponding to maximum target ignorance, for anomaly detection). However, this is an arbitrary procedure and would produce, for example, a different answer were c
treated as the clutter parameter. Also, the integration is often difficult to execute in closed form, as in the present case. The more common approach to such a CH problem is to rely on the GLR test [1
1. Louis Scharf, Statistical Signal Processing (USA, Addison-Wesley, 1990).
], which treats the parameters as deterministic but unknown.
The alternative fusion concept proposed here is most easily explained geometrically. Each value of the clutter parameter (0 < c
< 1) can be assigned a separate detector, described by a shrunken version of the decision boundary in Fig. 1
. If the radius is scaled proportionately with c
Fig. 2 CFAR decision boundaries for different values of a clutter parameter.
), then the integrated probability outside the boundaries remains constant (shown in §4.1). Therefore, each such detector would operate at the same false alarm rate as that of the “seed” algorithm (c
= 1) in Fig. 1
, were the corresponding clutter distribution realized. The approach adopted here is to fuse all such detectors into a single one. Specifically, the “declare clutter” region of the fused detector is defined as the union of the “declare clutter” regions (the balls enclosed by the spheres) of all the constituent detectors (see §2(B)). Because the false alarm rate is constant across constituent detectors, this particular flavor of fusing is called CFAR fusion.
The union of all the balls associated with the allowed values of c
has a “SNO-CONE” shaped boundary (see Fig. 3
Fig. 3 SNO-CONE decision boundaries from CFAR fusion of anomaly detectors, compared to standard results from GLR test.
), with vertex at the origin and axis traversing the core clutter mean μC
. This simple shape for a decision boundary conforms to one’s intuitive expectations of a solution to this “shadowed anomaly detection” problem.
The above problem also has a GLR solution. To compute it, insert the clutter pdf from (5) into (3), with pT
taken to be constant. The maximization calculation is straightforward and results in a GLR detection statistic
also illustrates the type of decision boundary this detector generates (for feature vector dimension N
= 2). It differs considerably from the intuitive conical decision boundary which, as argued earlier, also results from a principled procedure, CFAR fusion. Notice that the fusion algorithm always declares as “clutter” every clutter mean (as c
varies), whereas at low thresholds GLR classifies the core and other clutter means as “target.” The CFAR-fused algorithm also has a convenient invariance property. When any single value of c
is realized in the clutter distribution, the CFAR fusion detector produces a false alarm rate independent of c
(if the cone is extended, corresponding to a c
→ ∞ model).
This idea of fusing anomaly detectors generalizes naturally to problems in which some prior target knowledge is available. The constant in the LR test is replaced by a model pT(x; t), in which target parameters t, with possibly unknown values, are allowed. Then one finds optimal detectors for all parameter values and fuses them by forming the union of the “declare target” regions (see §2(A)) while imposing some restriction, such as the CFAR condition, to define the exact forms of the constituent detectors.
3. Optimal limit property of a CH solution
A desirable property of any method contending to solve CH problems is that it generate a known optimal LR form when the latter happens not to depend on parameters appearing in a problem statement. A prime example of this phenomenon is the additive target model, for which the clutter model is (4), and the target pdf is modeled with the same distribution, except for a difference in mean value.
If the coordinate origin is selected to coincide with the clutter mean, the pdfs for the additive target model are
If the mean target vector T
is specified, then the Neyman-Pearson
(NP) detection statistic corresponding to (7) follows from an LR test, and is equivalent to the linear matched filter
The value of λ fixes the false alarm probability, and the LMF satisfies the NP optimality criterion: to maximize the detection probability for a specified maximum false alarm probability.
Notice that the detector in (8) depends on the direction of T, but not on its amplitude. Therefore, with , the above problem could have been posed and solved optimally with an LR test, without prior specification of the target amplitude parameter t. The additive target model is, therefore, an example of a CH problem that admits an optimal solution (for a specified false alarm rate) that is independent of its one parameter. One should demand of any general method of addressing CH problems that it produce the LMF for this additive target problem.
To apply the CFAR fusion principle in this case, first select an arbitrary positive target parameter value of t and construct the optimal (LR) detector, along with some arbitrarily chosen threshold value, which sets a seed false alarm rate. Second, for every other value of t, construct a new detector with the same false alarm rate. Finally, form the fused detector by defining its “declare target” region as the union of the corresponding regions of the constituent detectors.
For this additive target problem, however, the CFAR constraint makes every constituent detector the same. The distances of the hyperplane decision boundaries described by (8) from the background mean must be identical, if the false alarm rates are to be equal. Thus the CFAR fusion procedure is trivial, operating on replicas of the seed detector, which is an LMF.
One should expect the GLR also to produce the LMF for the above model. In this the GLR nearly succeeds, but not for all false alarm rates. [The GLR-derived statistic reduces to a constant for all x with negative projection into the target subspace, owing to the nonnegativity constraint on t.] In this sense, the GLR is a less desirable means of generating the LMF than CFAR fusion. The latter method is also more direct, in that all fused detectors are clones of the seed detector.
We note finally that another form of fusion, complementary to the CFAR method, is also easy to apply in the above example. Constant probability of detection (CPD) fusion also produces a series of hyperplanes as t varies, but they are at fixed distances from the point defining the target mean value. Fusing the corresponding detectors produces the LMF once again, but not with the seed false alarm rate. In this application then, CPD fusion also produces the optimal detector, but less directly than CFAR fusion.
4. The fusion relations
Most fusion problems cannot be solved with simple geometrical arguments, as above. Therefore, general analytical conditions are desirable for describing the fused detectors in terms of the constituent ones. These can be derived with an analytical geometry approach.
First, represent a decision boundary for a constituent detector implicitly with the form:
(cf. (1)). An alternative choice to (9) for the function f
is any function g
that defines the same surface, and for which a positive value corresponds to the “declare target” region. For example, it is often convenient to take
Notice that, although the threshold k
is constant with respect to the feature vector x
, it is not generally independent of the parameters. In fact, the form of its parameter dependence defines the method of fusion. For example, for the shadowed anomaly detector (Section 2.1), g
can be taken as
in which the threshold term is
, the functional form characterizing CFAR fusion for this model. The test g
> 0 for target declaration produces a false alarm rate for the clutter distribution of (5) that is set by the value of r
, but is independent of c.
This can be verified by examining the clutter pdf (5) integrated the over the region outside a radius rc
from its mean:
. Equation (12)
is easily verified in hyperspherical coordinates.
To arrive at general equations defining fused algorithms, consider next the case of Fig. 4
Fig. 4 Envelopes for fused detectors are either mergers or bounding surfaces.
—either the f
forms can be used—where three (infinitesimally) close values (t
) of a target parameter produce intersecting decision curves for the corresponding three detectors. [Similar arguments apply to higher-dimensional surfaces.]
In the case where the “declare target” regions are below the curves, the envelope of all such curves defines the decision boundary of the fused algorithm (a fact contingent on the relative topology of all three curves, a point elaborated below). At the feature space value x
of Fig. 4
, curves 1 and 2 intersect, and each satisfies a detector equation f
) = 0. Subtracting these produces
].) for any point on an envelope that is derived from the merger of constituent decision boundaries.
Next, consider 2nd differences as the parameter varies. For example, at x
13 the 2nd difference is
In view of (9) and considering that x
lies on both the t3
curves, all terms in (14) are zero except - 2 f
). And because x
lies below curve 2, on the “declare target” side, f
) is positive. Since (dt
is positive, we conclude that
The same conclusion is reached by considering 2nd
differences at either x
Similar equations obtain if the target parameters t are replaced by clutter parameters c, and the region below the curves is the “declare clutter” region for the detectors. However, in this case, f (x
2) is negative [Recall the convention discussed after (1)]. Therefore, the inequality fusion relation for the clutter parameters has a sign difference.
In the general case where the function f
can depend on both types of parameters, we have the Fusion Relations:
which describe the merged decision boundaries of a fused algorithm. That is, the boundaries are described by those values of x
where all the equations are satisfied. f
can be replaced with any function whose positive values define target declaration regions of feature space for the constituent detectors. And more generally, t
can each represent a list of parameters, with corresponding equations for each one.
As an example, in fusing over target parameters with c
fixed, if the middle (first derivative) equation can be solved for t
in terms of x
, then the fused decision boundary is defined by
represents unknown parameters, then the fusion relations can be applied likewise to F
, to derive a function describing the fully fused detectors.
Note that the enveloping decision boundary does not always result from the merger of constituent decision boundaries. To see why the inequalities in (16) are necessary for an envelope to arise from a merger of constituent boundaries, note that if curve 3 in Fig. 4
is replaced by curve 3′ (which intersects curve 1 before curve 2), and 2nd
differences are computed at x
), it is found that the 2nd
derivative with respect to t
is positive. That is, a fusion inequality in (16) is violated. In this case, the fused decision boundary does not come from a merger, but is instead the curve 3′ itself, which we term a bounding surface. It derives not from a partial differential equation, but from a constituent detector’s equation. Its decision boundary is found by resorting to the definition of a fused “declare target” region, using the union operation. It is not uncommon for the enveloping decision boundary of a fused algorithm to be a merger in some places (where the conditions in (16) are satisfied), but a bounding surface in others (where at least one of the inequalities in (16) is violated). An example is described in § 4.1.
Finally, if in Fig. 4
only one assumption is changed, making the regions above
the curves “declare target” regions (but still assuming that curves 1 through 3 refer to different target parameters), then the fusion inequality for the target parameters is again violated. [The sign of f
) is now reversed.] The fusion curve is once more a bounding surface rather than a merger, consisting in this case of segments derived from curves 1 and 3 (see Fig. 4
4.1. Solved example: hyperspectral anomaly detection with shadows
The model of Fig. 2
serves as a useful example of how to use the fusion relations. The first fusion equation (see (16)) for this problem describes the component detector decision boundaries, and is given by (11)). The differential fusion equation (from (16)) yields
The second-derivative is
and so satisfies the inequality in (16), whenever the seed detector’s spherical decision boundary of radius r
(corresponding to c
= 1) is small enough not to encompass the origin (as is depicted in Fig. 2
Finding the merged decision boundary involves first solving (18) for c, so that it can be eliminated from (11). For any value of c, these two equations describe two N - 1 dimensional surfaces, whose intersection defines an N - 2 dimensional region, which forms that part of the fused detector’s boundary associated with the value c. The union of all such regions as c varies over its allowed range creates the N - 1 dimensional boundary of the fused algorithm.
For example, for N
= 3, (11) defines spheres of radius rc
centered at c
(depicted in Fig. 2
), and (18) defines planes perpendicular to μC
. For a given value of c
, the intersection of the two surfaces is a circle, one of the threads comprising the conical surface that is built from the union all such circles, as c
varies from zero to one.
Solving (18) produces
which, substituted into (11), generates the equation for the merged part of the envelope that describes the fused detection algorithm
a cone with half angle θ given by
. The limiting allowed c
value of one requires, according to Eqs. (20)
, the magnitude of vectors x
on this part of the decision boundary to satisfy
. Points on the fusion boundary with larger magnitudes arise not as solutions to the fusion relations, but are part of the bounding surface defined by the constituent seed algorithm, that is the spherical cap defined by (11) with c
= 1. This is the second part of the SNO-CONE boundary, where the differential fusion equation is not satisfied.
As the seed value r approaches the value ||μC||, the cap grows larger until the entire boundary is spherical. For the boundaries of all constituent detectors lie inside the seed boundary, and the differential fusion relations are violated on all parts of the decision boundary, which is a large sphere, the bounding surface described by c = 1.
5. Relation to GLR
For differentiable pdfs, maximizing the terms in the GLR equation ((3
)) means satisfying
These are evocative of the fusion relations (16), considering the definitions (2) and (9). More precisely, if λ is independent of all parameters, then it becomes clear that the GLR equations define a fusion problem
. For example, if the bottom of (22) is satisfied, then so is the bottom of (16), because
Therefore, the GLR method is equivalent to constant likelihood ratio (CLR) fusion. The fusion process that generates the GLR involves parameter-dependent constituent algorithms describable by decision boundaries, on all of which the LR values are identical, equal to λ, and independent of t and c. The GLR is thus revealed as a single point in a larger universe of fusion solutions to CH problems.
This equivalence to CLR-fusion also affords a new, constructive interpretation of any GLR solution. For example, the ovoid shapes in Fig. 3
result from the circumscription of an infinity of circles whose sizes vary in such a way as to keep constant their likelihood ratios (as opposed to the circles of Fig. 2
, which correspond to the CFAR constraint). Natural modifications to any GLR solution are also suggested by the new interpretation, for example, deleting from the fusion process some constituent detectors for which detection probabilities would be too low (i.e. a min/max criterion), if the corresponding parameters were realized by target/background distributions.
6. Summary and Future
A new formalism has been described for generating binary detection algorithms, and it can be applied to any problem traditionally addressed by the generalized likelihood ratio test. Two specific composite hypothesis problems were solved within the new framework. The first is an anomaly detector informed by a simple physical model. It was derived in two ways, geometrically and analytically, and was contrasted to the traditional GLR solution. The second CH problem, the additive target model with Gaussian distributions, served as a sanity check.
A general set of partial differential relations was derived, whose solution defines the enveloping decision boundary for a fused detector, whenever that surface arises from the merger of the constituent detection boundaries. When not all these “fusion relations” can be satisfied, the envelope is a bounding surface, derived from one or more constituent detectors.
It has been shown further that the most popular prior approach to solving CH problems corresponds to a particular flavor of continuum fusion. The new formalism encompasses the GLR technique as a special case.
Research in continuum fusion has just begun, with many fruitful issues already identified. For example, by modifying the details of the constructive fusion interpretation of the GLR, new versions of this older method can be tailored to meet special needs in a way not afforded by the maximization procedure that heretofore defined GLR solutions. Future work will also address linear and affine subspace detection problems, selected non-Gaussian and heteroskedastic problems, and approximate methods of meeting the CFAR constraint. Deeper theoretical issues will also be examined, such as the relationship between the fusion equations and uniformly most powerful algorithms, when they exist, and the connection of the fusion formalism to invariant methods [1
1. Louis Scharf, Statistical Signal Processing (USA, Addison-Wesley, 1990).
This paper has discussed three flavors of fusion: CFAR, CLR, and CPD. But many others may prove valuable, for example, those generated by keeping constant some cost function (of detection and false alarm probabilities). For these the LR form of detector used in this paper is always the optimal constituent detector. However, the fusion principles can be applied to any metric of optimality. Future reports will also show how to associate with a given fusion method parameter-free target or background distributions for which that method is optimal. This insight can inform the choice of fusion flavor in real world applications, for which some limited target data might exist, but reliable distribution models do not.