On the automatic selection of the tuning parameter appearing in certain families of goodness-of-fit tests

ABSTRACT A situation common in the current literature is that a whole family of location-scale/scale invariant test statistics, indexed by a parameter λ ∈ Λ, is available to test the goodness of fit of F, the underlying distribution function of a set of independent real-valued random variables, to a location-scale/scale family of distribution functions. The power properties of the tests associated with the different statistics usually depend on the parameter λ, called the "tuning parameter", which is why its choice is crucial for obtaining a well-performing test procedure. In this paper, we address the automatic selection of the tuning parameter when Λ is finite, as well as the calibration of the associated goodness-of-fit test procedure. Examples of existing and new tuning parameter selectors are discussed, and the presented methodology of combining different test statistics into a single test procedure is applied to well-known families of test statistics for normality and exponentiality. A simulation study is carried out to assess the power of the different tests under consideration and to compare them with the fixed tuning parameter procedures usually recommended in the literature.


Introduction
Given a sample X_1, ..., X_n of independent and identically distributed real-valued random variables from a distribution function F, assume that T_{n,λ} = T_{n,λ}(X_1, ..., X_n), for λ ∈ Λ, is a finite family of statistics for testing the hypothesis

H_0 : F ∈ F, (1)

against a general alternative hypothesis, where F is either a location-scale family,

F = {x ↦ F_0((x − μ)/ν) : μ ∈ R, ν > 0}, (2)

or a scale family of distribution functions,

F = {x ↦ F_0(x/ν) : ν > 0}, (3)

with F_0 a known distribution function on R. If the test statistics T_{n,λ} are location-scale invariant in case (2), that is, T_{n,λ}(νX_1 + μ, ..., νX_n + μ) = T_{n,λ}(X_1, ..., X_n), for each ν > 0 and μ ∈ R, or scale invariant in case (3), that is, T_{n,λ}(νX_1, ..., νX_n) = T_{n,λ}(X_1, ..., X_n), for each ν > 0, then the distribution of T_{n,λ} under H_0 does not depend on F. Therefore, if large values of T_{n,λ} are significant, each of the tests with critical region {T_{n,λ}(X_1, ..., X_n) > c_{n,λ}(α)} has a level of significance at most equal to α, that is, P_F(T_{n,λ}(X_1, ..., X_n) > c_{n,λ}(α)) ≤ α, for all F ∈ F, where α ∈ ]0, 1[ and c_{n,λ}(α) = F_{T_{n,λ}}^{−1}(1 − α) denotes the quantile of order 1 − α of T_{n,λ} under H_0. Of course, if the distribution functions of all the T_{n,λ} under H_0 are continuous on R, the test procedures associated with the previous critical regions have a level of significance exactly equal to α. The power properties of these test procedures usually depend on the parameter λ, which is why its choice is crucial for obtaining a well-performing test procedure.
The previous situation, where a finite family of test statistics is available for testing the hypothesis H_0, is now common in the literature, as evidenced by the works of Epps and Pulley [1], Baringhaus and Henze [2], Henze [3], Gürtler and Henze [4], Klar [5], Henze and Meintanis [6], Meintanis [7,8], and Meintanis et al. [9], where goodness-of-fit tests for the normal, exponential, Cauchy, Laplace, or logistic distributions, based on the empirical characteristic function, the probability weighted characteristic function, the integrated empirical distribution function or the Laplace transform, are proposed (for related work, see also [10-13]). In all these situations the test statistics T_{n,λ}, λ ∈ Λ, are either location-scale invariant in case (2) or scale invariant in case (3). More precisely, they can be written in the form T_{n,λ}(X_1, ..., X_n) = T̃_{n,λ}(Y_1, ..., Y_n) with

(Y_1, ..., Y_n) = g(X_1, ..., X_n), (4)

where g is a known function given by g(X_1, ..., X_n) = ((X_1 − b̂_n)/â_n, ..., (X_n − b̂_n)/â_n) in case (2), and by g(X_1, ..., X_n) = (X_1/â_n, ..., X_n/â_n) in case (3), where â_n = â_n(X_1, ..., X_n) is a location invariant and scale equivariant estimator of the scale parameter a, i.e. â_n(νX_1 + μ, ..., νX_n + μ) = ν â_n(X_1, ..., X_n), for each ν > 0 and μ ∈ R, and b̂_n = b̂_n(X_1, ..., X_n) is a location-scale equivariant estimator of the location parameter b, i.e. b̂_n(νX_1 + μ, ..., νX_n + μ) = ν b̂_n(X_1, ..., X_n) + μ, for each ν > 0 and μ ∈ R. Focusing our attention on the normality test introduced by Epps and Pulley [1], the considered test statistic is defined as a weighted L_2-distance between the empirical characteristic function of the scaled residuals Y_j = (X_j − X̄_n)/S_n, j = 1, . . .
, n, given by ϕ_n(t) = (1/n) Σ_{j=1}^n exp(itY_j), t ∈ R, and the characteristic function ϕ(t) = exp(−t²/2) of the standard Gaussian density φ(x) = (2π)^{−1/2} exp(−x²/2), x ∈ R, where X̄_n = n^{−1} Σ_{j=1}^n X_j and S_n² = n^{−1} Σ_{j=1}^n (X_j − X̄_n)² are the sample mean and the sample variance, respectively. The weight function is given by t ↦ exp(−λ²t²), where λ is a strictly positive real number that needs to be chosen by the user. Therefore, the Epps-Pulley test statistic is given by

T_{n,λ} = n ∫_R |ϕ_n(t) − ϕ(t)|² exp(−λ²t²) dt, (5)

with the closed-form expression

T_{n,λ} = (√π/(nλ)) Σ_{j,k=1}^n exp(−(Y_j − Y_k)²/(4λ²)) − 2√(π/(λ² + 1/2)) Σ_{j=1}^n exp(−Y_j²/(4λ² + 2)) + n√(π/(λ² + 1)).

The simplicity of the previous expression shows the attractive feature of the considered weight function (see [10,11] for the relation between the Epps-Pulley test statistic and the Bickel-Rosenblatt test statistic). From a practical point of view, it is well known that the finite sample performance of the Epps-Pulley test is very sensitive to the choice of λ. Choosing a small value of λ, which means letting the weight function decay slowly, will produce a test sensitive to short-tailed or high-moment alternatives, whereas large values of λ, which means putting most of the mass of the weight function near zero, are adequate for detecting alternative distributions with long tails, symmetric or asymmetric (cf. [14]). Note that for large values of λ the considered weight function puts most of its mass near the origin, so the previous behaviour can be seen as a consequence of the fact that the tail behaviour of a probability distribution is reflected by the behaviour of its characteristic function at the origin (cf. [15], pp. 419-420). The exponentiality tests introduced in Henze and Meintanis [6] provide another example where a whole family of test statistics is available to the user. In this case the test statistics are based on a weighted L_2-distance between the empirical Laplace transform of the scaled data Y_j = X_j/X̄_n, j = 1, . . .
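To make the role of the tuning parameter concrete, the weighted L_2-distance behind the Epps-Pulley statistic can be evaluated numerically. The following sketch is in Python/NumPy rather than the R software used for the paper's simulations; the truncation point and grid size of the quadrature are illustrative assumptions, not values from the paper.

```python
import numpy as np

def epps_pulley(x, lam, t_max=10.0, m=4001):
    """Quadrature approximation of the weighted L2-distance
    n * integral |phi_n(t) - exp(-t^2/2)|^2 * exp(-lam^2 t^2) dt,
    where phi_n is the empirical characteristic function of the
    scaled residuals Y_j = (X_j - mean) / S_n."""
    x = np.asarray(x, dtype=float)
    n = x.size
    y = (x - x.mean()) / x.std()          # np.std uses 1/n, matching S_n
    t = np.linspace(-t_max, t_max, m)
    ty = np.outer(t, y)
    phi_re = np.cos(ty).mean(axis=1)      # real part of phi_n(t)
    phi_im = np.sin(ty).mean(axis=1)      # imaginary part of phi_n(t)
    phi0 = np.exp(-t**2 / 2)              # N(0,1) characteristic function
    integrand = ((phi_re - phi0)**2 + phi_im**2) * np.exp(-lam**2 * t**2)
    dt = t[1] - t[0]
    # trapezoidal rule on the truncated grid
    return n * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1])) * dt
```

By construction the value is location-scale invariant: replacing the sample x by νx + μ leaves the scaled residuals, and hence the returned statistic, unchanged.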
, n, defined by ψ_n(t) = (1/n) Σ_{j=1}^n exp(−tY_j), t ≥ 0, and the Laplace transform of the unit exponential distribution ψ(t) = 1/(1 + t), t ≥ 0, with weight function t ↦ (1 + t)² exp(−λt), for λ > 0. Thus, the Henze-Meintanis test statistic is given by

T_{n,λ} = n ∫_0^{+∞} (ψ_n(t) − 1/(1 + t))² (1 + t)² exp(−λt) dt, (6)

for λ ∈ ]0, +∞[. Like the Epps-Pulley test for normality, the Henze-Meintanis test for exponentiality is very sensitive to the choice of λ. As remarked by Baringhaus and Henze [2, p. 552] (see also [6], p. 148), it is known from Tauberian theorems on the Laplace transform (cf. [16], Chapter XIII.5) that the tail behaviour of a probability distribution concentrated on [0, ∞[ is reflected by the behaviour of its Laplace transform at zero, and vice versa. Therefore, choosing a small value of λ, which means letting the weight function decay slowly, gives high power against alternative distributions having a point mass or infinite density at zero, whereas a large value of λ means putting most of the mass of the weight function near zero, which should give high power against alternatives that greatly differ in tail behaviour from the exponential distribution. As illustrated by the previous examples, the parameter λ acts as a tuning parameter through which the user can increase the power of the test toward some particular direction in the set of alternative distributions. However, as the formulation of a specific alternative hypothesis is in general impossible in a real situation, the usual practice is to evaluate the power performance of the test for λ varying in a finite set Λ, and then to suggest a choice of λ that produces a test with a reasonable power against a wide range of alternative distributions. However, this strategy of taking a fixed tuning parameter does not prevent the user from obtaining a test that achieves a very low power against some of the considered alternative distributions (cf. [6]).
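The same kind of quadrature sketch applies to the exponentiality statistic (again Python/NumPy, with an illustrative truncation of the integration range; for very small λ a larger truncation point would be needed):

```python
import numpy as np

def henze_meintanis(x, lam, t_max=60.0, m=6001):
    """Quadrature approximation of
    n * integral_0^inf (psi_n(t) - 1/(1+t))^2 (1+t)^2 exp(-lam t) dt,
    where psi_n is the empirical Laplace transform of Y_j = X_j / mean(X)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    y = x / x.mean()                                # scale-invariant transformed data
    t = np.linspace(0.0, t_max, m)
    psi_n = np.exp(-np.outer(t, y)).mean(axis=1)    # empirical Laplace transform
    psi0 = 1.0 / (1.0 + t)                          # unit-exponential Laplace transform
    integrand = (psi_n - psi0)**2 * (1.0 + t)**2 * np.exp(-lam * t)
    dt = t[1] - t[0]
    # trapezoidal rule on the truncated grid
    return n * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1])) * dt
```

Here the returned value is scale invariant: rescaling the sample leaves Y_j = X_j/X̄_n, and hence the statistic, unchanged.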
Some efforts have been made to combine, based on the available data, the test procedures associated with different values of the tuning parameter λ into a single test procedure that could show a good power performance against a wide range of alternative distributions. This is the case of the multiple test approach considered in Klar [5], Fromont and Laurent [17] and Tenreiro [18,19], which can be viewed as an improvement of the classical Bonferroni multiple test procedure. The proposed test leads to the rejection of the null hypothesis if one of the statistics T_{n,λ}, for λ ∈ Λ, is larger than its quantile of order 1 − u under the null hypothesis, the level u being calibrated so that the resulting multiple test has a level of significance at most equal to α. Thus, the associated critical region is given by

∪_{λ∈Λ} {T_{n,λ} > c_{n,λ}(u)}, (7)

for some u ∈ ]0, 1[. This testing procedure is closely related to the single-step minP multiple testing procedure based on minima of unadjusted p-values (cf. [20], pp. 117-121). Unlike the classical Bonferroni multiple test, which can be obtained by taking u = α/|Λ|, where |Λ| denotes the cardinality of Λ, the previous rejection region takes into account the dependence structure among the test statistics T_{n,λ}, for λ ∈ Λ. As the previous critical region can be written as {T_{n,λ̄_u} > c_{n,λ̄_u}(u)}, where

λ̄_u ∈ argmax_{λ∈Λ} (T_{n,λ} − c_{n,λ}(u)), (8)

the previous multiple test procedure can be seen as a test based on a data-dependent procedure for selecting the tuning parameter λ: for a given sample of size n, one selects the value λ in Λ for which the test statistic T_{n,λ} shows stronger evidence, at level u, against the null hypothesis. More recently, Allison and Santana [21] proposed an alternative data-dependent method, based on the bootstrap, for choosing the tuning parameter. They considered the test with critical region {T_{n,λ*} > c_{n,λ*}(α)}, where λ* = λ*(X_1, . . .
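In code, the selector λ̄_u just described is a one-liner. The sketch below uses the exceedance T_{n,λ} − c_{n,λ}(u) as one concrete way to formalize "strongest evidence at level u" (an assumption for illustration); the dictionaries of statistics and critical values are taken as precomputed.

```python
def lambda_bar(stats, crit_u):
    """Return the tuning parameter showing the strongest evidence against
    H0 at level u, i.e. the lambda maximizing T_{n,lambda} - c_{n,lambda}(u).
    `stats` and `crit_u` map each lambda in the finite set Lambda to
    T_{n,lambda} and c_{n,lambda}(u), respectively."""
    return max(stats, key=lambda l: stats[l] - crit_u[l])

def multiple_test_rejects(stats, crit_u):
    """Union critical region: reject H0 if some T exceeds its critical value."""
    lb = lambda_bar(stats, crit_u)
    return stats[lb] > crit_u[lb]
```

Rejecting when T_{n,λ̄_u} > c_{n,λ̄_u}(u) is equivalent to rejecting on the union region, since the maximal exceedance is positive exactly when at least one exceedance is.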
, X_n) is obtained by maximizing the bootstrap power, that is,

λ* ∈ argmax_{λ∈Λ} B^{−1} Σ_{k=1}^B I(T̃_{n,λ}(Y*_{k1}, ..., Y*_{kn}) > c_{n,λ}(α)),

where (Y*_{k1}, ..., Y*_{kn}) is a bootstrap random sample of size n drawn with replacement from the empirical distribution function of the transformed sample (Y_1, ..., Y_n) defined by (4), B ∈ N is the considered number of bootstrap samples, and I(A) denotes the indicator function of the set A. Unfortunately, the proposed method presents two important drawbacks. Firstly, the suggested bootstrap procedure, being based on bootstrap random samples drawn from the empirical distribution function of the transformed sample (4), and not from the empirical distribution function of the original sample, does not always produce a good approximation for the power associated with the distribution of the observations. Secondly, by using the quantiles of order 1 − α of each of the test statistics T_{n,λ} to define the critical region, the proposed test is not correctly calibrated and may reach a level of significance much bigger than α (see Section 4, Figure 1). In order to overcome these problems, we consider in this paper the test with critical region {T_{n,λ̃_u} > c_{n,λ̃_u}(u)}, for u ∈ ]0, 1[, where the modified tuning parameter selector λ̃_u is defined by

λ̃_u ∈ argmax_{λ∈Λ} B^{−1} Σ_{k=1}^B I(T_{n,λ}(X*_{k1}, ..., X*_{kn}) > c_{n,λ}(u)), (9)

with X*_{kj} = X_{U_{(k−1)n+j}}, for k = 1, ..., B and j = 1, ..., n, where U_l, for l = 1, ..., nB, are independent copies of the discrete uniform distribution on {1, ..., n}, and u is calibrated so that the test has a level of significance at most equal to α. Although not assumed or discussed in this paper, if, for B and n large enough, the mean in (9) gives a good approximation for the probability P_F(T_{n,λ}(X_1, ..., X_n) > c_{n,λ}(u)), we might expect that λ̃_u mimics the behaviour of the ideal tuning parameter

λ_u(F) ∈ argmax_{λ∈Λ} P_F(T_{n,λ}(X_1, ..., X_n) > c_{n,λ}(u)), (10)

this being the main motivation for the previous definition of λ̃_u. As λ̃_u depends on U = (U_l, l = 1, ..., nB) ∈ {1, ..., n}^{nB}, any statement on this tuning parameter selector should always be interpreted conditionally on U.
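A minimal sketch of the modified selector, again in Python/NumPy; the function name and the generic `stat_fn` interface are illustrative assumptions. Note the two points stressed above: resampling is done from the original observations, and the resampling indices U are drawn once and then held fixed.

```python
import numpy as np

def lambda_tilde(x, lambdas, crit_u, stat_fn, B=100, seed=None):
    """Modified bootstrap selector in the spirit of (9): return the lambda
    maximizing the bootstrap rejection frequency at inner level u.
    crit_u maps lambda -> c_{n,lambda}(u); stat_fn(sample, lam) -> T_{n,lambda}."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    idx = rng.integers(0, n, size=(B, n))  # fixed indices U, one row per bootstrap sample
    best_lam, best_freq = None, -1.0
    for lam in lambdas:
        freq = sum(stat_fn(x[row], lam) > crit_u[lam] for row in idx) / B
        if freq > best_freq:
            best_lam, best_freq = lam, freq
    return best_lam
```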
However, by the law of large numbers, different choices of U essentially lead to tuning parameter selectors with similar behaviours. The paper is organized as follows. Sections 2 and 3 deal with the calibration and the consistency of the tests with critical region {T_{n,λ̂_u} > c_{n,λ̂_u}(u)}, where λ̂_u = λ̂_u(X_1, ..., X_n) is a general family of measurable functions, indexed by u ∈ ]0, 1[, taking values in Λ, which are either location-scale invariant in case (2) or scale invariant in case (3). The cases of the tuning parameter selectors λ̄_u and λ̃_u, defined by (8) and (9), respectively, are analysed in detail. In Sections 4 and 5 we restrict our attention to the cases where T_{n,λ} is either the test statistic for normality of Epps and Pulley [1] or the test statistic for exponentiality of Henze and Meintanis [6]. We conclude that the proposed calibration procedure is effective, and, as a result of a simulation study, we deduce that the tests based on λ̄_u and λ̃_u are serious competitors for the tests based on a fixed tuning parameter usually recommended in the literature, and perhaps should be employed in practice in the absence of any information about the type of deviation from the null model. All the proofs and some auxiliary results are deferred to Section 8. The simulations and plots in this article were carried out using the R software [22].

The calibration procedure
In this section we denote by T_{n,λ}, for λ ∈ Λ, a finite family of test statistics for testing the hypothesis (1), whose large values are considered significant. We also assume that these test statistics are either location-scale invariant in case (2) or scale invariant in case (3). This assumption enables us to treat the quantiles of order 1 − u of T_{n,λ} under H_0, denoted, as before, by c_{n,λ}(u) = F_{T_{n,λ}}^{−1}(1 − u), as known quantities, as they can be approximated by performing Monte Carlo experiments under the null hypothesis.
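Such a Monte Carlo approximation of the null quantiles can be sketched as follows (Python/NumPy; `sampler` and `stat_fn` are placeholder interfaces, and the number of replications is illustrative):

```python
import numpy as np

def mc_null_quantiles(stat_fn, sampler, n, lambdas, u, reps=10000, seed=None):
    """Approximate c_{n,lambda}(u), the 1-u null quantile of T_{n,lambda},
    by simulating `reps` samples of size n under H0.  By invariance the
    result is the same for every member of the null family, so any
    convenient representative (e.g. N(0,1) or Exp(1)) can be simulated."""
    rng = np.random.default_rng(seed)
    sims = {lam: np.empty(reps) for lam in lambdas}
    for r in range(reps):
        s = sampler(rng, n)                 # one sample under H0
        for lam in lambdas:
            sims[lam][r] = stat_fn(s, lam)
    return {lam: np.quantile(sims[lam], 1.0 - u) for lam in lambdas}
```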
Given a general invariant tuning parameter selector λ̂_u = λ̂_u(X_1, ..., X_n), that is, a family of measurable functions indexed by u ∈ ]0, 1[, taking values in Λ, which are either location-scale invariant in case (2) or scale invariant in case (3), we have

{T_{n,λ̂_u} > c_{n,λ̂_u}(u)} ⊂ ∪_{λ∈Λ} {T_{n,λ} > c_{n,λ}(u)}, (11)

from which we conclude that it is always possible to choose u ∈ ]0, 1[ such that the test with critical region {T_{n,λ̂_u} > c_{n,λ̂_u}(u)} has a level of significance at most equal to the nominal level α:

P_F(T_{n,λ̂_u} > c_{n,λ̂_u}(u)) ≤ |Λ| u ≤ α, whenever u ≤ α/|Λ|,

where the probability P_F(T_{n,λ̂_u} > c_{n,λ̂_u}(u)), which we denote by ψ_λ̂(u), is independent of F, for F ∈ F.

A first calibration stage
Although important, as it ensures that the Type I error of the test with critical region {T_{n,λ̂_u} > c_{n,λ̂_u}(u)} may be kept under a preassigned level of significance α through an appropriate choice of u, Theorem 2.1 does not provide a criterion for such a choice. Taking into account that the test should have a level of significance not only less than or equal to α but also as close to α as possible, the practical selection of u will be performed by considering a regular grid

G_p = {kp : k ∈ I_p},

on the interval ]0, 1[, for some 0 < p ≤ α/|Λ| and I_p = {k ∈ N : kp < 1}, and taking u = u^λ̂_{n,α,p}, where u^λ̂_{n,α,p} is the largest value of G_p satisfying ψ_λ̂(u) := P_{F_0}(T_{n,λ̂_u} > c_{n,λ̂_u}(u)) ≤ α, that is,

u^λ̂_{n,α,p} = max{u ∈ G_p : ψ_λ̂(u) ≤ α}. (12)

We present in the next theorem a set of sufficient conditions on ψ_λ̂ assuring that the test with critical region {T_{n,λ̂_u} > c_{n,λ̂_u}(u)}, for u = u^λ̂_{n,α,p}, has a level of significance as close to α as possible, when p tends to zero.
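Once the function ψ has been estimated by Monte Carlo, this first calibration stage reduces to a grid search; a minimal sketch, assuming ψ is available as a callable:

```python
def calibrate_u(psi, alpha, p):
    """First calibration stage: over the grid G_p = {p, 2p, ...} contained
    in ]0,1[, return the largest u with psi(u) <= alpha, or None if no
    grid point qualifies.  psi(u) is the (estimated) level of significance
    of the combined test when the inner level u is used."""
    feasible = None
    k = 1
    while k * p < 1.0:
        u = k * p
        if psi(u) <= alpha:
            feasible = u       # grid is ascending, so this tracks the maximum
        k += 1
    return feasible
```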
The previous general result allows us to present a set of sufficient conditions on the null distribution of the statistics T_{n,λ}, weaker than those considered in Tenreiro [18, Theorem 1], assuring that the test associated with the critical region {T_{n,λ̄_u} > c_{n,λ̄_u}(u)}, with λ̄_u given by (8) and u = u^λ̄_{n,α,p} given by (12), has a level of significance not only at most α, but also as close to α as possible, when p tends to zero.

A second calibration stage
Under the assumptions of Theorem 2.3 on the null distribution of the statistics T_{n,λ}, it can be proved that lim_{u↑1} ψ_λ̂(u) = 1, for any invariant tuning parameter selector λ̂_u (see Section 8, Proposition 8.1). However, the same set of assumptions does not necessarily assure that ψ_λ̂ is increasing and continuous on ]0, 1[. Therefore, under the assumptions of Theorem 2.3 the test with critical region {T_{n,λ̂_u} > c_{n,λ̂_u}(u)}, with u = u^λ̂_{n,α,p} given by (12), has a level of significance no bigger than α, but we cannot affirm that its level of significance becomes as close to α as possible, when p tends to zero. Next, we will see that this goal can be achieved if a second calibration stage is performed.
This general result gives us a set of sufficient conditions on the null distribution of the statistics T_{n,λ}, assuring that, conditionally on U = (U_l, l = 1, ..., nB), the test with critical region {T_{n,λ̃_u} > c_{n,λ̃_u}(v_u)}, with u = u^λ̃_{n,α,p} and v_u = v^{λ̃,u}_{n,α,q} given by (12) and (14), respectively, has a level of significance not only less than or equal to α but also as close to α as possible, when q tends to zero.

Consistency against fixed alternatives
Under some general assumptions, the test procedures considered in the previous section detect an alternative F ∉ F if such an alternative is detected by all the test statistics T_{n,λ}, for λ ∈ Λ. Next we restrict our attention to the tests that use a single calibration stage. However, similar results can be derived for the test procedures considered in Subsection 2.2, where a second calibration stage is used.
In the particular case of the tuning parameter selector λ̄_u given by (8), it is interesting to note that the previous consistency result may be obtained under weaker assumptions. In fact, the test with critical region {T_{n,λ̄_u} > c_{n,λ̄_u}(u)}, with u = u^λ̄_{n,α}, detects any alternative F ∉ F that is detected by at least one of the tests based on T_{n,λ}, for λ ∈ Λ. This attractive property of the tuning parameter selector λ̄_u is stated in the following result.

Combining the EP and HM test statistics
In this and the following section we restrict our attention to the families of test statistics of Epps and Pulley [1] and of Henze and Meintanis [6], given by (5) and (6), respectively (henceforth denoted by EP and HM). In the former case, the parametric family F is given by (2), with F_0 the distribution function of the standard Gaussian distribution, whereas in the latter it is given by (3), with F_0 the distribution function of the unit exponential distribution. We start by showing that both test statistic families satisfy the assumptions of Theorems 2.3 and 2.4 stated in the previous section. Taking into account Theorems 2.3 and 2.4, the previous result enables us to conclude that, either for the EP test statistic family (n ≥ 3) or for the HM test statistic family (n ≥ 2), the test with critical region {T_{n,λ̄_u} > c_{n,λ̄_u}(u)}, with λ̄_u given by (8) and u = u^λ̄_{n,α} given by (13), has an exact α level of significance. The same is true for the test with critical region {T_{n,λ̃_{u_p}} > c_{n,λ̃_{u_p}}(v_{u_p})}, with λ̃_u, u_p = u^λ̃_{n,α,p}, and v_{u_p} = v^{λ̃,u_p}_{n,α}, given by (9), (12), and (14), respectively. Moreover, from the results presented in Section 3, and those of Baringhaus and Henze [23, Theorems 3 and 4] and Henze and Meintanis [6, Theorems 2.3 and 2.7], we deduce that the previous tests are consistent against each nondegenerate non-normal distribution with finite variance, when T_{n,λ}, λ ∈ Λ, is the EP test statistic family, and against each nonnegative non-exponential distribution not degenerate at zero, when T_{n,λ}, λ ∈ Λ, is the HM test statistic family.
In order to implement the previous test procedures in practice, where the values u^λ̄_{n,α} and v^{λ̃,u_p}_{n,α} are replaced by the approximations u^λ̄_{n,α,p} and v^{λ̃,u_p}_{n,α,q}, respectively, the values ψ_λ̄(u), ψ_λ̃(u) and ψ_{λ̃,u_p}(v), with u_p = u^λ̃_{n,α,p}, need to be approximated by Monte Carlo experiments under the null hypothesis, for u and v varying on the regular grid G_p = {w_k, k ∈ I_p} on the interval ]0, 1[, where w_1 = p and w_{k+1} = w_k + p, for some 0 < p ≤ α/|Λ|. For that, we use 100,000 simulations of the involved test statistics T_{n,λ}, λ ∈ Λ, under the null hypothesis, and the R function quantile(·, type = 7) for estimating the 1 − u quantiles c_{n,λ}(u), for u varying on G_p with p = 0.0001. A further 100,000 simulations are used for estimating the probabilities ψ_λ̄(u), ψ_λ̃(u) and ψ_{λ̃,u_p}(v), for u and v varying on G_p. In the evaluation of λ̃_u, B = 100 bootstrap samples are used. We always take Λ = {0.1, 0.25, 0.5, 0.75, 1, 1.5, 2, 2.5, 3.5, 5}, a set of tuning parameters that includes the range of values for λ considered by Epps and Pulley [1] and Henze and Meintanis [6]. Although the choice of the set Λ of relevant values for the tuning parameter λ may be based on some preliminary information, the previous set is meant for the most common situation in practice where no relevant information about the alternative hypothesis is available. For n = 50 we show in Figure 1 the graphs of the functions ψ_λ̄(u) and ψ_λ̃(u), for u ∈ ]0, 0.2[, which describe the estimated levels of significance of the test procedures based on the tuning parameter selectors λ̄_u and λ̃_u, respectively, as a function of u. As observed in Section 1, from these graphs we clearly see that choosing u = α for λ̃_u, as suggested by Allison and Santana [21], leads to a badly calibrated test procedure with a level of significance bigger than α. Similar graphs have been observed for other sample sizes.
The suggested continuity of ψ_λ̃(u) explains the similar results observed in practice for the test with critical region {T_{n,λ̃_{u_p}} > c_{n,λ̃_{u_p}}(v_{u_p})}, which includes two calibration stages, and the test with critical region {T_{n,λ̃_{u_p}} > c_{n,λ̃_{u_p}}(u_p)}, which includes a single calibration stage. For this reason, and because it is less time-consuming than the test with two calibration stages, only the test with a single calibration stage is henceforth considered.
For α = 0.01, 0.05, and sample sizes n = 20, 50, 100, we present in Table 1 estimates of the levels u^λ̄_{n,α,p} and u^λ̃_{n,α,p}, for the preassigned level of significance α, based on regular grids of size 0.0001 on the interval ]0, 1[, and estimates of the empirical levels of significance of the tests based on the EP and HM families of test statistics. The estimation of the empirical levels was based on 100,000 simulations under the null hypotheses. With a few exceptions, the preassigned level α is inside its approximate 95% confidence interval, revealing the effectiveness of the calibration procedures.

Finite sample power analysis
In order to study the performance of the tests based on the data-dependent tuning parameter selectors λ̄_u and λ̃_u, a simulation study is conducted for each of the Epps-Pulley and Henze-Meintanis families of test statistics. Besides assessing their empirical power, the simulation study is also meant to compare the previous tests with the fixed tuning parameter procedures usually recommended in the literature. For the Epps-Pulley test of normality, we take λ = λ_EP := 1/√2, which is one of the two values for λ recommended in the pioneering work of Epps and Pulley [1], and also considered in other studies like those of Baringhaus et al. [24] and Arcones and Wang [25] (see also [14]). For the Henze-Meintanis test of exponentiality, we take λ = λ_HM := 1, which is one of the two values for λ recommended in Henze and Meintanis [6]. As before, the nominal levels α = 0.01, 0.05 and the sample sizes n = 20, 50, 100 are considered. All the power estimates are based on 10,000 samples from the considered alternatives.
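The empirical power figures reported below are rejection frequencies of exactly this kind; a generic sketch (placeholder interfaces again, with a deliberately toy statistic in the check below):

```python
import numpy as np

def empirical_power(stat_fn, alt_sampler, n, lam, crit, reps=10000, seed=None):
    """Estimate the power of the test {T_{n,lam} > crit} by the rejection
    frequency over `reps` independent samples of size n drawn from a
    given alternative distribution."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        if stat_fn(alt_sampler(rng, n), lam) > crit:
            hits += 1
    return hits / reps
```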
Although the following conclusions are based on a simulation study carried out for large sets of alternative distributions usually considered in power studies for testing normality (see [1,26,27]) and exponentiality (see [3,6,28]), we limit ourselves to presenting in Tables 2 and 3 the empirical power results for some of the considered alternatives. In these tables: LN(θ) denotes the lognormal distribution with density (θx)^{−1}(2π)^{−1/2} exp(−(log x)²/(2θ²))I(x > 0); W(θ) denotes the Weibull distribution with density θx^{θ−1} exp(−x^θ)I(x > 0); LF(θ) denotes the linear increasing failure rate distribution with density (1 + θx) exp(−x − θx²/2)I(x ≥ 0); and PW(θ) denotes the power distribution with density θ^{−1}x^{1/θ−1}I(0 < x ≤ 1), where x ∈ R. For all the considered alternatives, the tests based on the data-dependent tuning parameter selectors performed similarly. Being clearly less time-consuming than the bootstrap-based method for choosing λ, our preference goes to λ̄_u. For most of the considered alternatives, these data-dependent tuning parameter selectors compare well with the fixed tuning parameters λ_EP or λ_HM, none of them being the best over the considered set of alternative distributions. This is illustrated by the alternative distributions #1, #2 and #3 shown in Tables 2 and 3.
For the alternative distributions #4 in both tables, the tests based on the fixed tuning parameters λ_EP or λ_HM are slightly more powerful than those based on the data-dependent tuning parameter selectors. For the normality test this is a consequence of small and large values of λ, such as λ = 0.1 or λ = 5, being included in the set Λ. In fact, for the considered normal mixture alternative the power of the Epps-Pulley tests seems to behave like an inverted U-shaped function of λ. In the case of the exponentiality test, the power of the Henze-Meintanis tests for the considered alternative is very low for small values of λ. Therefore, the inclusion of the value λ = 0.1 in the set Λ may explain the inferior power attained for this alternative by the tests based on the data-dependent tuning parameter selectors. This is a price to pay for having tests with a reasonable power against a wide range of alternative distributions.
The power results reported for alternatives #5 in Tables 2 and 3 illustrate the already mentioned weak point of the strategy of taking a fixed tuning parameter in the absence of any prior information on the underlying alternative distribution. The usually recommended tuning parameters λ_EP or λ_HM lead to tests that achieve a very low power against these alternatives, for which a smaller tuning parameter value would be a better choice. Since values as small as λ = 0.1 and λ = 0.25 have been included in the set Λ, the tests based on the considered data-dependent tuning parameter selectors perform much better for these alternatives than the tests based on the recommended fixed tuning parameters.

A practical example
In this section we illustrate the use of the tests with critical region {T_{n,λ̂_u} > c_{n,λ̂_u}(u)}, with u = u^λ̂_{n,α,p} given by (12), for each of the data-based tuning parameter selectors λ̂_u = λ̄_u and λ̂_u = λ̃_u, defined by (8) and (9), respectively. To this end, we take the data considered in Allison and Santana [21, Table 9, p. 3287], which concern the survival times of 43 patients diagnosed with a certain type of leukaemia. With the intention of testing the appropriateness of the exponential distribution as the underlying distribution from which the leukaemia data set was obtained, we consider the family of test statistics T_{n,λ} given by (6), with λ ∈ Λ, where we take for Λ the same set of tuning parameter values used in the previous sections. As before, in the implementation of λ̃_u we use B = 100 bootstrap samples. Approximations of the p-values of the previous tests for this data set are reported in Table 4. At level α = 0.05 the null hypothesis of exponentiality is not rejected by any of the considered test procedures. These results are compatible with the one obtained by the test based on the fixed tuning parameter λ = λ_HM = 1, which is one of the values for λ recommended in Henze and Meintanis [6]. The associated p-value is also reported in Table 4.

Conclusions
Like any goodness-of-fit test, the tests considered in this paper have a preference for some finite-dimensional space of alternatives, and cannot pay equal attention to an infinite number of orthogonal alternatives (see [29]). However, the considered tests based on the data-dependent tuning parameter selectors λ̄_u and λ̃_u have shown themselves to be serious competitors for the recommended tests based on a fixed tuning parameter, and perhaps should be used in practice (especially λ̄_u, as it is less time-consuming than λ̃_u) in the absence of any information regarding the type of deviation from the null model.

Proofs
Proof of Theorem 2.1: The first part of Theorem 2.1 follows from (11), as it implies that

P_F(T_{n,λ̂_u} > c_{n,λ̂_u}(u)) ≤ Σ_{λ∈Λ} P_F(T_{n,λ} > c_{n,λ}(u)) ≤ |Λ| u,

for all u ∈ ]0, 1[, where |Λ| denotes the cardinality of Λ. In order to prove that the probability P_F(T_{n,λ̂_u} > c_{n,λ̂_u}(u)) is independent of F, for F ∈ F, it is enough to use the invariance properties of T_{n,λ} and λ̂_u.

Proof of Theorem 3.2.:
Let F ∉ F, and take λ ∈ Λ such that T_{n,λ} → +∞ in probability under F. Proceeding as in the proof of Theorem 3.1, for u = u^λ̄_{n,α} we conclude that P_F(T_{n,λ} > c_{n,λ}(u)) → 1, as n → ∞, and the stated result follows from the fact that P_F(T_{n,λ̄_u} > c_{n,λ̄_u}(u)) ≥ P_F(T_{n,λ} > c_{n,λ}(u)).

Proof of Theorem 4.1.:
The Epps and Pulley test statistics given by (5) are nonconstant whenever n ≥ 3, and well defined on D_n, the complement of D_n^c = {x ∈ R^n : x_1 = · · · = x_n}. Therefore, they are nonconstant and well defined with probability one whenever n ≥ 3 and F is such that μ_F^n(D_n) = 1, where μ_F denotes the probability distribution of F and μ_F^n is the corresponding product measure. This condition is fulfilled whenever F is absolutely continuous on R, which occurs under the null hypothesis of normality. Regarding the Henze and Meintanis test statistics given by (6), they are nonconstant whenever n ≥ 2, and well defined on the set D_n = {x ∈ R^n : x_1, ..., x_n > 0}. Therefore, they are nonconstant and well defined with probability one whenever n ≥ 2 and F is such that μ_F^n(D_n) = 1. This condition is satisfied whenever F is such that F(0) = 0, which is true under the null hypothesis of exponentiality.