Statistics & Probability Letters 53 (2001) 283–292
On the asymptotic behaviour of the integrated square error of
kernel density estimators with data-dependent bandwidth
Carlos Tenreiro∗
Departamento de Matemática, Universidade de Coimbra, Apartado 3008, 3000 Coimbra, Portugal
Received January 2000; received in revised form December 2000
Abstract
In this paper, we consider the integrated square error $J_n = \int \{\hat f_n(x) - f(x)\}^2\,dx$, where $f$ is the common density function of the independent and identically distributed random vectors $X_1,\ldots,X_n$ and $\hat f_n$ is the kernel estimator with a data-dependent bandwidth. Using the approach introduced by Hall (J. Multivariate Anal. 14 (1984) 1), and under some regularity conditions, we derive the $L_2$ consistency in probability of $\hat f_n$ and we establish an asymptotic expansion in probability and a central limit theorem for $J_n$. © 2001 Elsevier Science B.V. All rights reserved
MSC: 62G07; 60F05
Keywords: Kernel estimators; Integrated square error; Asymptotic distribution; U-statistics
1. Introduction
Let $X_1,\ldots,X_n$ be independent $\mathbb{R}^d$-valued random vectors, with common density function $f$, and let $f_n$ be the kernel estimator of $f$ given, for $x \in \mathbb{R}^d$, by (cf. Rosenblatt, 1956, and Parzen, 1962)
$$ f_n(x) = \frac{1}{n} \sum_{i=1}^{n} K_{h_n}(x - X_i), $$
where $K_h(\cdot) = K(\cdot/h)/h^d$ for $h > 0$, $K$ is a kernel on $\mathbb{R}^d$, i.e., an integrable function such that $\int_{\mathbb{R}^d} K(u)\,du = 1$, and $(h_n)$ is a sequence of strictly positive real numbers converging to zero as $n \to +\infty$.
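As a concrete illustration (not part of the paper), the fixed-bandwidth estimator $f_n$ can be sketched in a few lines; the Gaussian kernel, the $N(0,1)$ sample, and the bandwidth value below are arbitrary choices.

```python
import numpy as np

def kde(x, data, h):
    """Kernel estimate f_n(x) = (1/n) * sum_i K_h(x - X_i), with
    K the standard normal kernel on R^d and K_h(u) = K(u/h)/h^d."""
    data = np.atleast_2d(data)                 # shape (n, d)
    n, d = data.shape
    u = (x - data) / h                         # (x - X_i)/h, one row per observation
    K = np.exp(-0.5 * np.sum(u**2, axis=1)) / (2 * np.pi) ** (d / 2)
    return K.mean() / h**d

rng = np.random.default_rng(0)
sample = rng.standard_normal((2000, 1))        # X_i ~ N(0,1), so d = 1
est = kde(np.array([0.0]), sample, h=0.3)      # estimate f at the origin
true = 1 / np.sqrt(2 * np.pi)                  # true N(0,1) density at 0
```

With $n = 2000$ the estimate at the origin is close to the true density value, as the consistency results below predict.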
In the definition of $f_n$, the kernel $K$ and the bandwidth $h_n$ enter as unspecified parameters. For a fixed kernel (it is well known that the performance of $f_n$ is not very sensitive to the choice of the kernel), the bandwidth is usually chosen on the basis of the data. The application of such data-driven procedures leads to
This work was partially supported by CMUC/FCT and by Projecto Praxis/2/2.1/Mat/400/94.
∗ Fax: +351-239-832-568.
E-mail address: tenreiro@mat.uc.pt (C. Tenreiro).
0167-7152/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved
PII: S0167-7152(01)00072-4
an estimator of the form
$$ \hat f_n(x) = \frac{1}{n} \sum_{i=1}^{n} K_{A_n}(x - X_i), \qquad (1) $$
where $A_n = A_n(X_1,\ldots,X_n)$ is a sequence of measurable bandwidths.
In this paper we consider the integrated square error
$$ J_n = \int \{\hat f_n(x) - f(x)\}^2\,dx, \qquad (2) $$
where the integral is over $\mathbb{R}^d$, as a measure of the global performance of $\hat f_n$. Under some regularity conditions on $f$ and on $K$, the usually considered bandwidth selectors are such that $A_n/h_n - 1 = o_p(1)$ for some deterministic sequence $(h_n)$. Assuming this property, we establish an asymptotic expansion in probability for $J_n$ in terms of the integrated square error
$$ I_n = \int \{f_n(x) - f(x)\}^2\,dx, \qquad (3) $$
studied by Hall (1984) using $U$-statistics techniques (see also Bickel and Rosenblatt, 1973). Such an expansion enables us to derive the asymptotic behaviour of $J_n$ from the corresponding one of $I_n$. In particular, the $L_2$ consistency in probability of $\hat f_n$ is derived, and if an appropriate rate of convergence for $A_n/h_n - 1$ is available, an asymptotic expansion in probability for $J_n$, similar to that given for $I_n$ by Hall (1984), is obtained. Moreover, if $A_n/h_n - 1 = O_p(n^{-\beta})$ for some $0 < \beta \le \frac12$, where $\beta$ is close to $\frac12$ (but not necessarily too close), we give a central limit theorem for $J_n$ and we conclude that its asymptotic distribution depends on $(A_n)$ only through the deterministic sequence $(h_n)$. Bandwidth selectors verifying this condition for the same sequence $(h_n)$ are then indistinguishable from an integrated square error point of view.
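A small simulation illustrates the object of study (this is an illustration under assumed choices, not from the paper): $N(0,1)$ data, the Gaussian kernel, and the normal-reference bandwidth playing the role of the data-dependent $A_n$; the integrated square error $J_n$ shrinks as $n$ grows.

```python
import numpy as np

def ise(n, rng):
    """Integrated square error J_n for a Gaussian KDE of N(0,1) data,
    with the data-dependent bandwidth A_n = 1.06 * sd(X) * n^(-1/5)
    (normal-reference rule, a choice of the form B_n * a_n)."""
    x = rng.standard_normal(n)
    A = 1.06 * x.std(ddof=1) * n ** (-0.2)
    grid = np.linspace(-5.0, 5.0, 801)
    dgrid = grid[1] - grid[0]
    # f_hat on the grid: mean over i of K_A(grid - X_i)
    fhat = np.mean(
        np.exp(-0.5 * ((grid[:, None] - x[None, :]) / A) ** 2), axis=1
    ) / (A * np.sqrt(2 * np.pi))
    f = np.exp(-0.5 * grid**2) / np.sqrt(2 * np.pi)
    return np.sum((fhat - f) ** 2) * dgrid      # Riemann sum for the integral

rng = np.random.default_rng(1)
j_small = np.mean([ise(100, rng) for _ in range(20)])
j_large = np.mean([ise(2000, rng) for _ in range(20)])
```

Averaged over replications, the error at $n = 2000$ is roughly an order of magnitude below the error at $n = 100$, in line with the $n^{-2m/(d+2m)}$ rate discussed in Section 3.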
These results, which generalize those obtained by Hall (1984), are presented in Section 3. Some notation
and general assumptions on the underlying density function, on the kernel and on the measurable sequence
of bandwidths are introduced in Section 2.
A similar study is considered in Liero (1992) for the kernel regression estimator. However, our approach is different from the one adopted by Liero (1992) since, following the methods developed by Bickel and Rosenblatt (1973), this author uses strong approximation principles for empirical processes.
The proofs of all results are given in Section 4.
2. Some notation and general assumptions
The goal of this paper is to describe the asymptotic behaviour of the integrated square error $J_n$ defined by (2). Our approach is based on the following equalities, which give us alternative expressions for the kernel estimators $f_n$ and $\hat f_n$:
$$ f_n(x) = \frac{1}{n h_n^d} \sum_{i=1}^{n} W\!\left( \frac{x - X_i}{h_n}, 1 \right) \qquad (4) $$
and
$$ \hat f_n(x) = \frac{1}{n h_n^d} \sum_{i=1}^{n} W\!\left( \frac{x - X_i}{h_n}, \frac{A_n}{h_n} \right), \qquad (5) $$
for $x \in \mathbb{R}^d$, where $W(u,h) = K_h(u)$ for $u \in \mathbb{R}^d$ and $h > 0$.
Under appropriate conditions on the kernel $K$ and on $(A_n)$, we shall show that $\hat f_n$ can be conveniently approximated by $f_n$, and that the same holds for $J_n$ and $I_n$ given by (3). Therefore, the asymptotic behaviour of $J_n$ can be studied from the corresponding one of $I_n$.
For $m \in \mathbb{N}_0$, let us introduce the set $K_b(m)$ of bounded kernels $K$ on $\mathbb{R}^d$ of order $m$, i.e., such that $\int \|u\|^m |K(u)|\,du < \infty$ and $\int u_1^{a_1} \cdots u_d^{a_d} K(u)\,du = 0$ whenever $0 < \sum_{i=1}^{d} a_i < m$, with $a_1,\ldots,a_d \in \mathbb{N}_0$ and $u = (u_1,\ldots,u_d)$, where $\|\cdot\|$ is the Euclidean norm in $\mathbb{R}^d$. For $\nu \in \mathbb{N}$, let us denote by $K_b^\nu(m)$ the subset of $K_b(m)$ of $\nu$-times continuously differentiable kernels $K$ for which there exists $\eta \in\, ]0,1[$ such that the functions defined by $u \mapsto \|u\|^m \sup_{|h-1|\le\eta} |\partial^\ell W/\partial h^\ell(u,h)|$, for $\ell = 0,1,\ldots,\nu$, are bounded and integrable on $\mathbb{R}^d$.
The standard normal kernel $K(u) = (2\pi)^{-d/2}\exp(-\|u\|^2/2)$, $u \in \mathbb{R}^d$, belongs to $K_b^\nu(2)$ for all $\nu \in \mathbb{N}$, and every $\nu$-times continuously differentiable kernel in $K_b(m)$ with compact support belongs to $K_b^\nu(m)$.
Consider the following assumption on $A_n$, denoted by (B).
Assumption (B) on $A_n$.
There exists a deterministic sequence $(h_n)$ of strictly positive real numbers satisfying $h_n \to 0$ and $n h_n^d \to +\infty$, when $n \to +\infty$, such that
$$ \omega_n := \frac{A_n}{h_n} - 1 = o_p(1). $$
If $A_n$ is chosen such that $A_n = B_n(X_1,\ldots,X_n)\,a_n$, with $(a_n)$ a deterministic sequence of strictly positive real numbers satisfying $a_n \to 0$ and $n a_n^d \to +\infty$, when $n \to +\infty$, and $(B_n)$ a sequence of strictly positive measurable functions satisfying $B_n = B_f + O_p(1/\sqrt{n})$, for some $B_f \in\, ]0,+\infty[$, the previous conditions are fulfilled with $h_n = B_f a_n$ and $\omega_n = O_p(n^{-1/2})$. This is, for example, the case of the bandwidth selection methods using reference distributions or the maximal smoothing principle, whenever $f$ has finite fourth-order moments (e.g. Terrel, 1990). A choice of the previous form for $A_n$ was also considered by Fan (1994) in the context of goodness-of-fit tests based on the kernel density estimator, to obtain an invariant test statistic under the null hypothesis of normality. In a real context, with the important previous exceptions, assumption (B) is satisfied for most bandwidth selectors with $h_n = h_{\mathrm{MISE}}$, the minimizer of the mean integrated square error (under some regularity conditions on the underlying density function $f$ and on the kernel $K$). The distinction between bandwidth selectors is usually based on the rate of convergence to 0 of the random sequence $\omega_n$: $\omega_n = O_p(n^{-1/10})$ for the least squares cross-validation and biased cross-validation methods (see Scott and Terrel, 1987; Hall and Marron, 1987); $\omega_n = O_p(n^{-4/13})$ for the plug-in method of Park and Marron (1990); $\omega_n = O_p(n^{-5/14})$ for the plug-in method of Sheather and Jones (1991); and $\omega_n = O_p(n^{-1/2})$ for the plug-in method of Hall et al. (1991), and for the smoothed cross-validation methods of Hall et al. (1992) and Jones et al. (1991). Remark that the previous rates of convergence are not directly comparable since the conditions imposed on $f$ in each result are not coincident. Hall and Marron (1991) have shown that $O_p(n^{-1/2})$ is the best achievable rate, i.e., it is impossible to use a data-dependent bandwidth which is closer to $h_{\mathrm{MISE}}$ than $n^{-1/2}$ in the previous sense. See also Jones et al. (1996) and Loader (1999) for a survey and some interesting comments on bandwidth selection.
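For a bandwidth of the reference-distribution form $A_n = B_n a_n$ above, the quantity $\omega_n = A_n/h_n - 1$ and its $O_p(n^{-1/2})$ rate can be seen empirically. The sketch below (an illustration, not from the paper) assumes $N(0,1)$ data, so $\sigma = 1$, and takes $B_n = 1.06\,\mathrm{sd}(X)$, $B_f = 1.06\,\sigma$, $a_n = n^{-1/5}$.

```python
import numpy as np

def omega(n, rng):
    """omega_n = A_n/h_n - 1 when A_n = 1.06 * sd(X) * n^(-1/5) and
    h_n = 1.06 * sigma * n^(-1/5): the deterministic factors cancel,
    leaving sd(X)/sigma - 1 (sigma = 1 for N(0,1) data)."""
    x = rng.standard_normal(n)
    return x.std(ddof=1) - 1.0

rng = np.random.default_rng(2)
# sqrt(n) * |omega_n| should stay bounded in probability: omega_n = O_p(n^{-1/2})
scaled = [np.sqrt(n) * abs(omega(n, rng))
          for n in (100, 1000, 10000) for _ in range(50)]
```

Across three decades of sample sizes, the rescaled values $\sqrt{n}\,|\omega_n|$ stay in a band of moderate size, consistent with the root-$n$ rate that Hall and Marron (1991) show to be optimal.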
3. Asymptotic behaviour of Jn
Let us denote by $W_b(m)$ the set of bounded probability density functions on $\mathbb{R}^d$ with bounded and continuous partial derivatives up to order $m$. For $f \in W_b(m)$ and $\varphi$ a real function on $\mathbb{R}^d$ such that $\int \|u\|^m |\varphi(u)|\,du < \infty$, let us denote by $\Delta_\varphi^m f$ the function defined, for $x \in \mathbb{R}^d$, by
$$ \Delta_\varphi^m f(x) = \frac{(-1)^m}{m!} \sum_{i_1,\ldots,i_m=1}^{d} \int u_{i_1} \cdots u_{i_m}\, \varphi(u)\,du\; \frac{\partial^m f}{\partial x_{i_1} \cdots \partial x_{i_m}}(x). $$
We present in the following lemma an asymptotic expansion in probability for the integrated square error $J_n$ in terms of the integrated square error $I_n$. It is proven in Section 4.
Lemma 1. Let us assume that $f \in W_b(m)$ and $K \in K_b^\nu(m)$ for some $m$ and $\nu$ in $\mathbb{N}$. Under assumption (B), if the $m$-order partial derivatives of $f$ are square integrable, we have
$$ \begin{aligned} J_n = I_n &+ 2(1-\delta_{1\nu})\,\omega_n \left( \frac{1}{n h_n^d} \int K(u) K_\partial(u)\,du + h_n^{2m} \int \Delta_K^m f(x)\,\Delta_{K_\partial}^m f(x)\,dx \right) \\ &+ O_p\!\left( \omega_n \left( \frac{1}{n h_n^{d/2}} + \frac{1}{\sqrt{n}\,h_n^{-m}} \right) + \omega_n^\nu \right) + o_p\!\left( \omega_n \left( \frac{1}{n h_n^d} + h_n^{2m} \right) \right), \end{aligned} $$
where $\delta_{ij}$ is the Kronecker delta and, for $u = (u_1,\ldots,u_d) \in \mathbb{R}^d$,
$$ K_\partial(u) = -d\,K(u) - \sum_{i=1}^{d} u_i\, \frac{\partial K}{\partial u_i}(u). $$
It is clear now that the asymptotic behaviour of $J_n$ follows from the corresponding one of $I_n$ and $\omega_n$. If $f \in W_b(m)$ and $K \in K_b(m)$ for some $m \in \mathbb{N}$, and if there exists $\lambda \in [0,+\infty]$ such that $\lim_{n\to+\infty} n h_n^{d+2m} = \lambda$, we know from Hall (1984) (see also Gouriéroux and Tenreiro, 1996, and Tenreiro, 1997, for geometrically mixing observations) that the convergence in distribution
$$ d_n(I_n - EI_n) \xrightarrow[n\to+\infty]{d} N(0,1) \qquad (6) $$
occurs with
$$ d_n = \begin{cases} n h_n^{d/2}\, (2\sigma_1^2 + 4\lambda\sigma_2^2)^{-1/2} & \text{if } \lambda \in [0,+\infty[,\\[0.5ex] \sqrt{n}\, h_n^{-m}\, (4\sigma_2^2)^{-1/2} & \text{if } \lambda = +\infty, \end{cases} $$
where
$$ \sigma_1^2 = \int f^2(x)\,dx \int \left( \int K(u+z) K(u)\,du \right)^2 dz $$
and
$$ \sigma_2^2 = \int (\Delta_K^m f)^2(x) f(x)\,dx - \left( \int (\Delta_K^m f)(x) f(x)\,dx \right)^2. $$
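As a numerical sanity check on the constants entering $\sigma_1^2$ (an illustration, not from the paper): for the standard normal kernel with $d = 1$, the inner integral $\int K(u+z)K(u)\,du$ is the $N(0,2)$ density at $z$, so the $z$-integral equals $1/\sqrt{8\pi}$.

```python
import numpy as np

# Compute int_z ( int_u K(u+z) K(u) du )^2 dz for the standard normal
# kernel (d = 1) and compare with the closed form 1/sqrt(8*pi):
# the inner convolution is the N(0,2) density, whose squared integral
# is 1/(2*sqrt(2)*sqrt(pi)).
u = np.linspace(-10.0, 10.0, 2001)
du = u[1] - u[0]
K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

conv = np.array(
    [np.sum(np.exp(-0.5 * (u + z) ** 2) / np.sqrt(2 * np.pi) * K) * du
     for z in u]
)                                   # z -> int K(u+z) K(u) du
val = np.sum(conv**2) * du
closed = 1 / np.sqrt(8 * np.pi)
```

The Riemann sums agree with the closed form to high accuracy, since the integrands are smooth and decay fast.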
Using Lemma 1 and the fact that $I_n$ converges in probability to zero as $n$ tends to infinity, we easily derive the $L_2$ consistency in probability of the kernel density estimator with data-dependent bandwidth.
Theorem 1. Under the conditions of Lemma 1, if there exists $\lambda \in [0,+\infty]$ such that $\lim_{n\to+\infty} n h_n^{d+2m} = \lambda$, we have
$$ J_n \xrightarrow[n\to+\infty]{p} 0. $$
Under additional assumptions on the rate of convergence of the random sequence $(\omega_n)$, we now derive an asymptotic expansion in probability for $J_n$.
Theorem 2. Under the conditions of Theorem 1, if
$$ \omega_n^\nu = o_p\!\left( \frac{1}{n h_n^d} + h_n^{2m} \right), \qquad (7) $$
then
$$ J_n = \frac{1}{n h_n^d} \int K^2(u)\,du + h_n^{2m} \int (\Delta_K^m f)^2(x)\,dx + o_p\!\left( \frac{1}{n h_n^d} + h_n^{2m} \right). $$
Remark that if $h_n \sim Cn^{-\gamma}$ for some $C>0$ and $0<\gamma<1/d$, and $n^\beta\omega_n$ is bounded in probability for some $0<\beta\le\frac12$, the condition (7) is verified if $\beta > \min\big((1-\gamma d)/\nu,\ 2\gamma m/\nu\big)$. Most bandwidth selectors satisfy assumption (B) with $h_n = h_{\mathrm{MISE}}$ (see the references given in Section 2). In this case, if $\int (\Delta_K^m f)^2(x)\,dx \neq 0$ we have
$$ h_n = h_{\mathrm{MISE}} \sim \left( \frac{d\int K^2(u)\,du}{2m\int (\Delta_K^m f)^2(x)\,dx} \right)^{1/(d+2m)} n^{-1/(d+2m)} $$
and the condition (7) is fulfilled whenever $\beta > 2m/(\nu(d+2m))$. Moreover, from Theorem 2,
$$ J_n = J[K,f,m,d]\, n^{-2m/(d+2m)} + o_p(n^{-2m/(d+2m)}), $$
with
$$ J[K,f,m,d] = (d+2m) \left[ \left( \frac{1}{d}\int (\Delta_K^m f)^2(x)\,dx \right)^{\!d} \left( \frac{\int K^2(u)\,du}{2m} \right)^{\!2m} \right]^{1/(d+2m)}. $$
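The closed forms for $h_{\mathrm{MISE}}$ and $J[K,f,m,d]$ come from minimizing the two-term expansion $A/(nh^d) + Bh^{2m}$ of the mean integrated square error, with $A = \int K^2(u)\,du$ and $B = \int(\Delta_K^m f)^2(x)\,dx$. A quick numerical check of this algebra (the values of $A$, $B$, $n$ below are arbitrary):

```python
import numpy as np

# M(h) = A/(n*h^d) + B*h^(2m): minimize on a fine grid and compare with
# the closed forms h_MISE ~ (d*A/(2m*B*n))^(1/(d+2m)) and minimum value
# J * n^(-2m/(d+2m)) with J = (d+2m)*((B/d)^d * (A/(2m))^(2m))^(1/(d+2m)).
A, B, d, m, n = 0.28, 0.21, 1, 2, 10_000       # arbitrary illustrative values

h_closed = (d * A / (2 * m * B * n)) ** (1 / (d + 2 * m))
J = (d + 2 * m) * ((B / d) ** d * (A / (2 * m)) ** (2 * m)) ** (1 / (d + 2 * m))
M_closed = J * n ** (-2 * m / (d + 2 * m))

hs = np.linspace(0.5 * h_closed, 2.0 * h_closed, 200_001)
M = A / (n * hs**d) + B * hs ** (2 * m)
h_grid, M_min = hs[np.argmin(M)], M.min()
```

The grid minimizer and minimum value match the closed forms, confirming the constant in front of $n^{-2m/(d+2m)}$.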
The following theorem follows straightforwardly from Lemma 1. Note that if condition (10) below is satisfied, the asymptotic behaviour of $J_n$ does not depend on $\omega_n$. However, if condition (10) is not satisfied, which occurs when $\omega_n$ converges slowly to zero, the second term $V_n$ of the expansion (9) below is at least as significant as the first one. In this case the asymptotic behaviour of $J_n - EI_n$ strongly depends on the asymptotic behaviour of the bandwidth selector.
Theorem 3. Under the conditions of Theorem 1, if
$$ \omega_n^\nu = o_p\!\left( \frac{1}{n h_n^{d/2}} + \frac{1}{\sqrt{n}\,h_n^{-m}} \right) \quad\text{or}\quad \omega_n^{\nu-1} = o_p\!\left( \frac{1}{n h_n^d} + h_n^{2m} \right), \qquad (8) $$
for some $\nu \in \{2,3,\ldots\}$, we have
$$ J_n - EI_n = d_n^{-1} U_n + V_n + o_p(d_n^{-1} + V_n), \qquad (9) $$
where
$$ U_n = d_n(I_n - EI_n) \xrightarrow[n\to+\infty]{d} N(0,1) $$
and
$$ V_n = 2\omega_n \left( \frac{1}{n h_n^d} \int K(u) K_\partial(u)\,du + h_n^{2m} \int \Delta_K^m f(x)\, \Delta_{K_\partial}^m f(x)\,dx \right). $$
Moreover, if
$$ d_n \omega_n \left( \frac{1}{n h_n^d} + h_n^{2m} \right) = o_p(1), \qquad (10) $$
then
$$ d_n(J_n - EI_n) \xrightarrow[n\to+\infty]{d} N(0,1). $$
Remark that if $n^\beta\omega_n$ is bounded in probability for some $0<\beta\le\frac12$, and $h_n \sim Cn^{-\gamma}$, for some $C>0$ and $0<\gamma<1/d$, condition (8) is satisfied whenever $\nu > \min\big((1-\gamma d/2)/\beta,\ (\tfrac12+\gamma m)/\beta,\ 1+(1-\gamma d)/\beta,\ 1+2\gamma m/\beta\big)$, and the same is true for condition (10) whenever $\beta > \tfrac12 - \gamma m$ for $\gamma \le 1/(d+2m)$ and $\beta > \gamma d/2$ for $1/(d+2m) < \gamma < 1/d$. Moreover, from the expansion (9) we conclude that the probability order of convergence of $J_n - EI_n$ depends on $\omega_n$ through $\beta$ and $\gamma$. In this case, there exists a function $\rho(\beta,\gamma)$ such that $J_n - EI_n = O_p(n^{-\rho(\beta,\gamma)})$. In the following we describe the behaviour of $\rho(\beta,\gamma)$ assuming, for simplicity, that condition (8) is fulfilled for all values of $\beta$ and $\gamma$. This is, in particular, true if $K$ belongs to $K_b^\nu(m)$ for all $\nu \in \mathbb{N}$.
Fig. 1. Behaviour of $\rho(\beta,\gamma)$.
In Fig. 1 above, the arrows indicate the increase of $\rho(\beta,\gamma)$ when $(\beta,\gamma)$ moves as shown. A double-sided arrow indicates that $\rho(\beta,\gamma)$ is constant along that direction. It is clear that the common practice of considering bandwidth selectors satisfying (B) with $h_n = h_{\mathrm{MISE}}$ produces a simultaneous minimization of the “asymptotic mean” of $J_n$ and maximization of the order of convergence of the “asymptotic variance” of $J_n$. In this “optimal” case $\gamma = 1/(d+2m)$ and
$$ \rho(\beta,\gamma) = \begin{cases} \beta + \dfrac{2m}{d+2m} & \text{if } 0<\beta<\dfrac{d}{2d+4m},\\[1ex] \dfrac{d+4m}{2d+4m} & \text{if } \dfrac{d}{2d+4m} \le \beta \le \dfrac12. \end{cases} $$
Moreover, if $\beta > d/(2d+4m)$ the asymptotic behaviour of $J_n$ does not depend on $\omega_n$. Bandwidth selectors verifying this condition are then indistinguishable from an integrated square error point of view.
In a real context, if $d=1$ and $K$ is the standard normal kernel ($m=2$), and (B) is satisfied with $h_n = h_{\mathrm{MISE}}$, we have $\rho(\beta, 1/5) = 9/10$ for $1/10 \le \beta \le 1/2$. This optimal order of convergence is achieved for all the bandwidth selectors referred to in Section 2 which satisfy (B) with $h_n = h_{\mathrm{MISE}}$.
4. Proofs
Proof of Lemma 1. For $x \in \mathbb{R}^d$ let us denote
$$ \tilde E\hat f_n(x) = \int K_{A_n}(x - y) f(y)\,dy $$
and consider the following expansion:
$$ \begin{aligned} J_n = {} & \int \{\hat f_n(x) - \tilde E\hat f_n(x)\}^2\,dx \\ & + 2\int \{\hat f_n(x) - \tilde E\hat f_n(x)\}\{\tilde E\hat f_n(x) - Ef_n(x)\}\,dx \\ & + 2\int \{\hat f_n(x) - \tilde E\hat f_n(x)\}\{Ef_n(x) - f(x)\}\,dx \\ & + \int \{\tilde E\hat f_n(x) - Ef_n(x)\}^2\,dx \\ & + 2\int \{\tilde E\hat f_n(x) - Ef_n(x)\}\{Ef_n(x) - f(x)\}\,dx \\ & + \int \{Ef_n(x) - f(x)\}^2\,dx \\ =: {} & J_n^1 + 2J_n^2 + 2J_n^3 + J_n^4 + 2J_n^5 + \int \{Ef_n(x) - f(x)\}^2\,dx. \end{aligned} \qquad (11) $$
Each one of the previous terms will be studied in the following propositions.
Proposition 1. For $u \in \mathbb{R}^d$ and $h>0$, define $W(u,h) = K(u/h)/h^d$. If $K \in K_b^\nu(0)$ for some $\nu \in \mathbb{N}$, we have:
(a) For $x \in \mathbb{R}^d$,
$$ \hat f_n(x) = f_n(x) + \sum_{\ell=1}^{\nu-1} \omega_n^\ell\, \frac{1}{\ell!}\, \frac{1}{n} \sum_{i=1}^{n} K^{(\ell)}_{\partial,h_n}(x - X_i) + \omega_n^\nu\, \frac{1}{n} \sum_{i=1}^{n} R_{h_n}(x - X_i, A_n) $$
and
$$ \tilde E\hat f_n(x) = Ef_n(x) + \sum_{\ell=1}^{\nu-1} \omega_n^\ell\, \frac{1}{\ell!} \int K^{(\ell)}_{\partial,h_n}(x - y) f(y)\,dy + \omega_n^\nu \int R_{h_n}(x - y, A_n) f(y)\,dy, $$
where, for $u \in \mathbb{R}^d$ and $h>0$,
$$ K_\partial^{(\ell)}(u) = \frac{\partial^\ell W}{\partial h^\ell}(u,1), \quad \ell = 0,1,\ldots,\nu-1, $$
and
$$ R(u,h) = \frac{1}{(\nu-1)!} \int_0^1 (1-t)^{\nu-1}\, \frac{\partial^\nu W}{\partial h^\nu}(u, 1+t(h-1))\,dt. $$
(b) Moreover, the functions $K_\partial^{(\ell)}$, for $\ell = 0,1,\ldots,\nu-1$, and $u \mapsto \sup_{|h-1|\le\eta} |R(u,h)|$, where $\eta > 0$ is given in the definition of $K_b^\nu$, are bounded and integrable.
Proof. Since $K$ is $\nu$-times continuously differentiable on $\mathbb{R}^d$, we have, from the Taylor expansion,
$$ W(u,h) = \sum_{\ell=0}^{\nu-1} (h-1)^\ell\, \frac{1}{\ell!}\, \frac{\partial^\ell W}{\partial h^\ell}(u,1) + (h-1)^\nu\, \frac{1}{(\nu-1)!} \int_0^1 (1-t)^{\nu-1}\, \frac{\partial^\nu W}{\partial h^\nu}(u, 1+t(h-1))\,dt, $$
for $u \in \mathbb{R}^d$ and $h>0$. Thus (a) follows from the equalities (4) and (5) and
$$ \tilde E\hat f_n(x) = \frac{1}{h_n^d} \int W\!\left( \frac{x-y}{h_n}, \frac{A_n}{h_n} \right) f(y)\,dy. $$
Since $K \in K_b^\nu(0)$, the boundedness and integrability of the functions given in (b) follow.
In all of the following propositions we assume that assumption (B) is satisfied.
Proposition 2. If $f$ is bounded and $K \in K_b^\nu(0)$ for some $\nu \in \mathbb{N}$, then
$$ J_n^1 = \int \{f_n(x) - Ef_n(x)\}^2\,dx + 2(1-\delta_{1\nu})\,\omega_n\, \frac{1}{n h_n^d} \int K(u) K_\partial(u)\,du + O_p\!\left( \omega_n\, \frac{1}{n h_n^{d/2}} + \omega_n^2\, \frac{1}{n h_n^d} + \omega_n^\nu \right), $$
where, for $u = (u_1,\ldots,u_d) \in \mathbb{R}^d$, $K_\partial(u) = -d\,K(u) - \sum_{i=1}^{d} u_i\, \partial K/\partial u_i(u)$.
Proof. From Proposition 1(a) and for $x \in \mathbb{R}^d$, we have
$$ \begin{aligned} \hat f_n(x) - \tilde E\hat f_n(x) = {} & f_n(x) - Ef_n(x) \\ & + \sum_{\ell=1}^{\nu-1} \omega_n^\ell\, \frac{1}{\ell!}\, \frac{1}{n} \sum_{i=1}^{n} \{K^{(\ell)}_{\partial,h_n}(x - X_i) - EK^{(\ell)}_{\partial,h_n}(x - X_0)\} \\ & + \omega_n^\nu\, \frac{1}{n} \sum_{i=1}^{n} \left\{ R_{h_n}(x - X_i, A_n) - \int R_{h_n}(x - y, A_n) f(y)\,dy \right\}. \end{aligned} \qquad (12) $$
Therefore, from Proposition 1(b) and the convergence in probability of $A_n/h_n$ to 1, when $n \to +\infty$, we get
$$ J_n^1 = \int \{f_n(x) - Ef_n(x)\}^2\,dx + \sum_{\substack{\ell,\ell'=0\\ \ell+\ell'\ge 1}}^{\nu-1} \omega_n^{\ell+\ell'}\, \frac{1}{\ell!\,\ell'!}\, V_n^{\ell,\ell'} + O_p(\omega_n^\nu), $$
where
$$ V_n^{\ell,\ell'} = \frac{1}{n^2} \sum_{i,j=1}^{n} \int \{K^{(\ell)}_{\partial,h_n}(x - X_i) - EK^{(\ell)}_{\partial,h_n}(x - X_0)\}\{K^{(\ell')}_{\partial,h_n}(x - X_j) - EK^{(\ell')}_{\partial,h_n}(x - X_0)\}\,dx. $$
Using degenerate $U$-statistics techniques (see Hall, 1984) and the fact that the $K_\partial^{(\ell)}$, $\ell = 0,1,\ldots,\nu-1$, are bounded and integrable, we obtain
$$ V_n^{\ell,\ell'} = \frac{1}{n h_n^d} \int K_\partial^{(\ell)}(u)\, K_\partial^{(\ell')}(u)\,du + \frac{1}{n h_n^{d/2}}\, U_n^{\ell,\ell'} + o_p\!\left( \frac{1}{n h_n^{d/2}} \right), $$
where $U_n^{\ell,\ell'}$ is asymptotically normal. Then,
$$ J_n^1 = \int \{f_n(x) - Ef_n(x)\}^2\,dx + 2(1-\delta_{1\nu})\,\omega_n\, \frac{1}{n h_n^d} \int K(u) K_\partial(u)\,du + O_p\!\left( \omega_n\, \frac{1}{n h_n^{d/2}} + \omega_n^2\, \frac{1}{n h_n^d} + \omega_n^\nu \right), $$
where $K_\partial = K_\partial^{(1)}$. Finally, it suffices to note that $\partial W/\partial h(u,h) = -d\,K_h(u)/h - \sum_{i=1}^{d} u_i\, \partial K_h/\partial u_i(u)/h$, for $u = (u_1,\ldots,u_d) \in \mathbb{R}^d$ and $h>0$.
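The identity for $\partial W/\partial h(u,1)$ invoked at the end of the proof can be checked by finite differences (an illustration for the standard normal kernel and $d=1$, where $K_\partial(u) = (u^2-1)K(u)$):

```python
import numpy as np

# Check dW/dh(u,1) = -d*K(u) - sum_i u_i dK/du_i(u) =: K_del(u) for
# W(u,h) = K(u/h)/h^d, with K the standard normal kernel and d = 1.
K = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
W = lambda u, h: K(u / h) / h               # d = 1
K_del = lambda u: -K(u) - u * (-u * K(u))   # dK/du = -u*K(u), so K_del = (u^2-1)*K(u)

u = np.linspace(-4.0, 4.0, 81)
eps = 1e-6
fd = (W(u, 1 + eps) - W(u, 1 - eps)) / (2 * eps)   # central difference in h
err = np.max(np.abs(fd - K_del(u)))
```

The central difference agrees with the closed-form derivative kernel to within numerical precision.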
Proposition 3. If $f \in W_b(m)$ and $K \in K_b^\nu(m)$ for some $m \in \mathbb{N}$ and $\nu \in \mathbb{N}$, then
$$ J_n^2 = O_p\!\left( \omega_n\, \frac{1}{\sqrt{n}\,h_n^{-m}} + \omega_n^\nu \right). $$
Proof. From Proposition 1 we deduce
$$ J_n^2 = \sum_{\ell=0}^{\nu-1} \sum_{\ell'=1}^{\nu-1} \omega_n^{\ell+\ell'}\, \frac{1}{\ell!\,\ell'!}\, \frac{1}{n} \sum_{i=1}^{n} \int \{K^{(\ell)}_{\partial,h_n}(x - X_i) - EK^{(\ell)}_{\partial,h_n}(x - X_0)\}\, \psi_{n,\ell'}(x)\,dx + O_p(\omega_n^\nu), $$
where $\psi_{n,\ell}(x) = \int K^{(\ell)}_{\partial,h_n}(x - y) f(y)\,dy$, for $\ell = 1,\ldots,\nu-1$.
On the other hand, as for $f \in W_b(m)$ and $K \in K_b^\nu(m)$ we have
$$ \sup_{n\in\mathbb{N}}\, \sup_{x\in\mathbb{R}^d} |\psi_{n,\ell}(x)| = O(h_n^m), \qquad (13) $$
the conclusion follows.
Proposition 4. Under the conditions of Proposition 3 we have
$$ J_n^3 = \int \{f_n(x) - Ef_n(x)\}\{Ef_n(x) - f(x)\}\,dx + O_p\!\left( \omega_n\, \frac{1}{\sqrt{n}\,h_n^{-m}} + \omega_n^\nu h_n^m \right). $$
Proof. From (12) and from the equality
$$ \sup_{n\in\mathbb{N}}\, \sup_{x\in\mathbb{R}^d} |Ef_n(x) - f(x)| = O(h_n^m), \qquad (14) $$
which is valid for $f \in W_b(m)$ and $K \in K_b(m)$, we get
$$ \begin{aligned} J_n^3 = {} & \int \{f_n(x) - Ef_n(x)\}\{Ef_n(x) - f(x)\}\,dx \\ & + \sum_{\ell=1}^{\nu-1} \omega_n^\ell\, \frac{1}{\ell!}\, \frac{1}{n} \sum_{i=1}^{n} \int \{K^{(\ell)}_{\partial,h_n}(x - X_i) - EK^{(\ell)}_{\partial,h_n}(x - X_0)\}\{Ef_n(x) - f(x)\}\,dx + O_p(\omega_n^\nu h_n^m) \\ = {} & \int \{f_n(x) - Ef_n(x)\}\{Ef_n(x) - f(x)\}\,dx + O_p\!\left( \omega_n\, \frac{1}{\sqrt{n}\,h_n^{-m}} + \omega_n^\nu h_n^m \right). \end{aligned} $$
Proposition 5. Under the conditions of Proposition 3 we have
$$ J_n^4 = O_p(\omega_n^2 h_n^{2m} + \omega_n^{\nu+1}). $$
Proof. Follows from Proposition 1 and (13).
Proposition 6. Under the conditions of Proposition 3, if the $m$-order partial derivatives of $f$ are square integrable, we have
$$ J_n^5 = (1-\delta_{1\nu})\,\omega_n h_n^{2m} \int \Delta_K^m f(x)\, \Delta_{K_\partial}^m f(x)\,dx + o_p(\omega_n h_n^{2m}) + O_p(\omega_n^\nu h_n^m). $$
Proof. From Proposition 1 and (14) we have
$$ J_n^5 = \sum_{\ell=1}^{\nu-1} \omega_n^\ell\, \frac{1}{\ell!}\, V_{n,\ell} + O_p(\omega_n^\nu h_n^m), $$
where, for $\ell = 1,\ldots,\nu-1$,
$$ V_{n,\ell} = \int\!\!\int K^{(\ell)}_{\partial,h_n}(x - y) f(y)\,dy\, \{Ef_n(x) - f(x)\}\,dx = h_n^{2m} \int \Delta_K^m f(x)\, \Delta_{K_\partial^{(\ell)}}^m f(x)\,dx + o_p(h_n^{2m}). $$
Now, the asymptotic expansion for Jn given in Lemma 1 is a trivial consequence of (11) and of the previous
propositions. The proof of Lemma 1 is then completed.
Proof of Theorem 2. From Lemma 1 we have
$$ J_n = I_n + o_p\!\left( \frac{1}{n h_n^d} + h_n^{2m} \right). $$
The result follows now from the convergence in distribution (6) and the usual expansion (cf. Bosq and Lecoutre, 1987, p. 80)
$$ EI_n = \frac{1}{n h_n^d} \int K^2(u)\,du + h_n^{2m} \int (\Delta_K^m f)^2(x)\,dx + O\!\left( \frac{1}{n} \right) + o(h_n^{2m}). $$
Acknowledgements
The author is grateful to a referee for helpful comments.
References
Bickel, P.J., Rosenblatt, M., 1973. On some global measures of the deviations of density function estimates. Ann. Statist. 1, 1071–1095.
Bosq, D., Lecoutre, J.-P., 1987. Théorie de l'Estimation Fonctionnelle. Economica, Paris.
Fan, Y., 1994. Testing the goodness of fit of a parametric density function by kernel method. Econom. Theory 10, 316–356.
Gouriéroux, C., Tenreiro, C., 1996. Local power properties of kernel based goodness of fit tests, Preprint 9617, Departamento de Matemática, Universidade de Coimbra. J. Multivariate Anal., to appear.
Hall, P., 1984. Central limit theorem for integrated square error properties of multivariate nonparametric density estimators. J. Multivariate
Anal. 14, 1–16.
Hall, P., Marron, J.S., 1987. Extent to which least-squares cross-validation minimises integrated square error in nonparametric density
estimation. Probab. Theory Related Fields 74, 567–581.
Hall, P., Marron, J.S., 1991. Lower bound for bandwidth selection in density estimation. Probab. Theory Related Fields 90, 149–173.
Hall, P., Marron, J.S., Park, B.U., 1992. Smoothed cross-validation. Probab. Theory Related Fields 92, 1–20.
Hall, P., Sheather, S.J., Jones, M.C., Marron, J.S., 1991. On optimal data-based bandwidth selection in kernel density estimation.
Biometrika 78, 263–269.
Jones, M.C., Marron, J.S., Park, B.U., 1991. A simple root n bandwidth selector. Ann. Statist. 19, 1919–1932.
Jones, M.C., Marron, J.S., Sheather, S.J., 1996. A brief survey of bandwidth selection for density estimation. J. Amer. Statist. Assoc. 91, 401–407.
Liero, H., 1992. Asymptotic normality of a weighted integrated squared error of kernel regression estimates with data-dependent bandwidth.
J. Statist. Plann. Inference 30, 307–325.
Loader, C.R., 1999. Bandwidth selection: classical or plug-in? Ann. Statist. 27, 415–438.
Park, B.U., Marron, J.S., 1990. Comparison of data-driven bandwidth selectors. J. Amer. Statist. Assoc. 85, 66–72.
Parzen, E., 1962. On estimation of a probability density function and mode. Ann. Math. Statist. 33, 1065–1076.
Rosenblatt, M., 1956. Remarks on some non-parametric estimates of a density function. Ann. Math. Statist. 27, 832–837.
Scott, D.W., Terrel, G.R., 1987. Biased and unbiased cross-validation in density estimation. J. Amer. Statist. Assoc. 82, 1131–1146.
Sheather, S.J., Jones, M.C., 1991. A reliable data-based bandwidth selection method for kernel density estimation. J. Roy. Statist. Soc.
Ser. B 53, 683–690.
Tenreiro, C., 1997. Loi asymptotique des erreurs quadratiques intégrées des estimateurs à noyau de la densité et de la régression sous des conditions de dépendance. Portugaliae Math. 54, 187–213.
Terrel, G.R., 1990. The maximal smoothing principle in density estimation. J. Amer. Statist. Assoc. 85, 470–477.