RFS Advance Access originally published online on February 10, 2005
Review of Financial Studies 2005 18(2):351-416; doi:10.1093/rfs/hhi016
The Review of Financial Studies Vol. 18, No. 2 © 2005 The Society for Financial Studies; all rights reserved.
How Often to Sample a Continuous-Time Process in the Presence of Market Microstructure Noise
Yacine Aït-Sahalia
Princeton University and NBER
Per A. Mykland
The University of Chicago
Lan Zhang
Carnegie Mellon University
Address correspondence to: Yacine Aït-Sahalia, Bendheim Center for Finance, Princeton University, Princeton, NJ 08540, (609) 258-4015 or email: yacine{at}princeton.edu
Abstract
In theory, the sum of squares of log returns sampled at high
frequency estimates their variance. When market microstructure
noise is present but unaccounted for, however, we show that
the optimal sampling frequency is finite and derive its closed-form
expression. But even with optimal sampling, using say 5-min
returns when transactions are recorded every second, a vast
amount of data is discarded, in contradiction to basic statistical
principles. We demonstrate that modeling the noise and using
all the data is a better solution, even if one misspecifies
the noise distribution. So the answer is: sample as often as
possible.
Over the past few years, price data sampled at very high frequency
have become increasingly available in the form of the Olsen
dataset of currency exchange rates or the TAQ database of NYSE
stocks. If such data were not affected by market microstructure
noise, the realized volatility of the process (i.e., the average
sum of squares of log-returns sampled at high frequency) would
estimate the returns' variance, as is well known. In fact, sampling
as often as possible would theoretically produce in the limit
a perfect estimate of that variance.
We start by asking whether it remains optimal to sample the price process at very high frequency in the presence of market microstructure noise, consistently with the basic statistical principle that, ceteris paribus, more data are preferred to less. We first show that, if noise is present but unaccounted for, then the optimal sampling frequency is finite, and we derive a closed-form formula for it. The intuition for this result is as follows. The volatility of the underlying efficient price process and the market microstructure noise tend to behave differently at different frequencies. Thinking in terms of signal-to-noise ratio, a log-return observed from transaction prices over a tiny time interval is mostly composed of market microstructure noise and brings little information regarding the volatility of the price process since the latter is (at least in the Brownian case) proportional to the time interval separating successive observations. As the time interval separating the two prices in the log-return increases, the amount of market microstructure noise remains constant, since each price is measured with error, while the informational content of volatility increases. Hence very high frequency data are mostly composed of market microstructure noise, while the volatility of the price process is more apparent in longer horizon returns. Running counter to this effect is the basic statistical principle mentioned above: in an idealized setting where the data are observed without error, sampling more frequently can be useful. What is the right balance to strike? What we show is that these two effects compensate each other and result in a finite optimal sampling frequency (in the root mean squared error sense) so that some time aggregation of the returns data is advisable.
By providing a quantitative answer to the question of how often one should sample, we hope to reduce the arbitrariness of the choices that have been made in the empirical literature using high frequency data: for example, using essentially the same Olsen exchange rate series, these somewhat ad hoc choices range from 5-min intervals [e.g., Andersen et al. (2001), Barndorff-Nielsen and Shephard (2002), Gençay et al. (2002)] to as long as 30 min [e.g., Andersen et al. (2003)]. When calibrating our analysis to the amount of microstructure noise that has been reported in the literature, we demonstrate how the optimal sampling interval should be determined: for instance, depending upon the amount of microstructure noise relative to the variance of the underlying returns, the optimal sampling interval ranges from 4 min to 3 h, if 1 day's worth of data are used at a time. If a longer time period is used in the analysis, then the optimal sampling interval can be considerably longer than these values.
But even if one determines the sampling frequency optimally, the fact remains that the empirical researcher is not making full use of the data at his disposal. For instance, suppose that we have available transaction records on a liquid stock, traded once every second. Over a typical 6.5 h day, we therefore start with 23,400 observations. If one decides to sample once every 5 minutes, then whether or not this is the optimal sampling frequency it amounts to retaining only 78 observations. Stated differently, one is throwing away 299 out of every 300 transactions. From a statistical perspective, this is unlikely to be the optimal solution, even though it is undoubtedly better than computing a volatility estimate using noisy squared log-returns sampled every second. Somehow, an optimal solution should make use of all the data, and this is where our analysis is headed next.
So, if one decides to account for the presence of the noise, how should one proceed? We show that modeling the noise term explicitly restores the first order statistical effect that sampling as often as possible is optimal. This will involve an estimator different from the simple sum of squared log-returns. Since we work within a fully parametric framework, likelihood is the key word. Hence we construct the likelihood function for the observed log-returns, which include microstructure noise. To do so, we must postulate a model for the noise term. We assume that the noise is Gaussian. In light of what we know from the sophisticated theoretical microstructure literature, this is likely to be overly simplistic and one may well be concerned about the effect(s) of this assumption. Could it be more harmful than useful? Surprisingly, we demonstrate that our likelihood correction, based on the Gaussianity of the noise, works even if one misspecifies the assumed distribution of the noise term. Specifically, if the econometrician assumes that the noise terms are normally distributed when in fact they are not, not only is it still optimal to sample as often as possible (unlike the result when no allowance is made for the presence of noise), but the estimator has the same variance as if the noise distribution had been correctly specified. This robustness result is, we think, a major argument in favor of incorporating the presence of the noise when estimating continuous time models with high frequency financial data, even if one is unsure about the true distribution of the noise term.
In other words, the answer to the question we pose in our title is "as often as possible," provided one accounts for the presence of the noise when designing the estimator (and we suggest maximum likelihood as a means of doing so). If one is unwilling to account for the noise, then one has to rely on the finite optimal sampling frequency we start our analysis with. However, we stress that while it is optimal if one insists upon using sums of squares of log-returns, this is not the best possible approach to estimate volatility given the complete high frequency dataset at hand.
In a companion paper [Zhang, Mykland, and Aït-Sahalia (2003)], we study the corresponding nonparametric problem, where the volatility of the underlying price is a stochastic process, and nothing else is known about it, in particular no parametric structure. In that case, the object of interest is the integrated volatility of the process over a fixed time interval, such as a day, and we show how to estimate it using again all the data available (instead of sparse sampling at an arbitrarily lower frequency of, say, 5 min). Since the model is nonparametric, we no longer use a likelihood approach but instead propose a solution based on subsampling and averaging, which involves estimators constructed on two different time scales, and demonstrate that this again dominates sampling at a lower frequency, whether arbitrarily or optimally determined.
This article is organized as follows. We start by describing in Section 1 our reduced form setup and the underlying structural models that support it. We then review in Section 2 the base case where no noise is present, before analyzing in Section 3 the situation where the noise is ignored. In Section 4, we examine the concrete implications of this result for empirical work with high frequency data. Next, we show in Section 5 that accounting for the presence of the noise through the likelihood restores the optimality of high frequency sampling. Our robustness results are presented in Section 6 and interpreted in Section 7. In Section 8, we study the same questions when the observations are sampled at random time intervals, which are an essential feature of transaction-level data. We then turn to various extensions and relaxations of our assumptions in Section 9: we add a drift term, then serially correlated and cross-correlated noise, respectively. Section 10 concludes. All proofs are in the appendix.
1. Setup
Our basic setup is as follows. We assume that the underlying process of interest, typically the log-price of a security, is a time-homogeneous diffusion on the real line

$$dX_t = \mu(X_t; \theta)\,dt + \sigma\,dW_t \tag{1}$$

where $X_0 = 0$, $W_t$ is a Brownian motion, $\mu(\cdot,\cdot)$ is the drift function, $\sigma^2$ the diffusion coefficient, and $\theta$ the drift parameters, where $\theta \in \Theta$ and $\sigma > 0$. The parameter space is an open and bounded set. As usual, the restriction that $\sigma$ is constant is without loss of generality since in the univariate case a one-to-one transformation can always reduce a known specification $\sigma(X_t)$ to that case. Also, as discussed in Aït-Sahalia and Mykland (2003), the properties of parametric estimators in this model are quite different depending upon whether we estimate $\theta$ alone, $\sigma^2$ alone, or both parameters together. When the data are noisy, the main effects that we describe are already present in the simpler of these three cases, where $\sigma^2$ alone is estimated, and so we focus on that case. Moreover, in the high frequency context we have in mind, the diffusive component of (1) is of order $(dt)^{1/2}$ while the drift component is of order $dt$ only, so the drift component is mathematically negligible at high frequencies. This is validated empirically: including a drift actually deteriorates the performance of variance estimates from high frequency data since the drift is estimated with a large standard error. Not centering the log returns for the purpose of variance estimation produces more accurate results [see Merton (1980)]. So we simplify the analysis one step further by setting $\mu = 0$, which we do until Section 9.1, where we then show that adding a drift term does not alter our results. In Section 9.4, we discuss the situation where the instantaneous volatility $\sigma$ is stochastic.
But for now,

$$X_t = \sigma W_t \tag{2}$$

Until Section 8, we treat the case where our observations occur at equidistant time intervals $\Delta$, in which case the parameter $\sigma^2$ is estimated at time $T$ on the basis of $N + 1$ discrete observations recorded at times $\tau_0 = 0$, $\tau_1 = \Delta$, ..., $\tau_N = N\Delta = T$. In Section 8, we let the sampling intervals themselves be random variables, since this feature is an essential characteristic of high frequency transaction data.
The notion that the observed transaction price in high frequency financial data is the unobservable efficient price plus some noise component due to the imperfections of the trading process is a well established concept in the market microstructure literature [see, for instance, Black (1986)]. So, we depart from the inference setup previously studied [Aït-Sahalia and Mykland (2003)] and we now assume that, instead of observing the process $X$ at dates $\tau_i$, we observe $X$ with error:

$$\tilde{X}_{\tau_i} = X_{\tau_i} + U_{\tau_i} \tag{3}$$

where the $U_{\tau_i}$s are i.i.d. noise with mean zero and variance $a^2$ and are independent of the $W$ process. In this context, we view $X$ as the efficient log-price, while the observed $\tilde{X}$ is the transaction log-price. In an efficient market, $X_t$ is the log of the expectation of the final value of the security conditional on all publicly available information at time $t$. It corresponds to the log-price that would be, in effect, in a perfect market with no trading imperfections, frictions, or informational effects. The Brownian motion $W$ is the process representing the arrival of new information, which in this idealized setting is immediately impounded in $X$.
By contrast, $U_t$ summarizes the noise generated by the mechanics of the trading process. We view the source of noise as a diverse array of market microstructure effects, either information or non-information related, such as the presence of a bid-ask spread and the corresponding bounces, the differences in trade sizes and the corresponding differences in representativeness of the prices, the different informational content of price changes owing to informational asymmetries of traders, the gradual response of prices to a block trade, the strategic component of the order flow, inventory control effects, the discreteness of price changes in markets that are not decimalized, etc., all summarized into the term $U$. That these phenomena are real and important is an accepted fact in the market microstructure literature, both theoretical and empirical. One can in fact argue that these phenomena justify this literature.
We view Equation (3) as the simplest possible reduced form of structural market microstructure models. The efficient price process X is typically modeled as a random walk, that is, the discrete time equivalent of Equation (2). Our specification coincides with that of Hasbrouck (1993), who discusses the theoretical market microstructure underpinnings of such a model and argues that the parameter a is a summary measure of market quality. Structural market microstructure models do generate Equation (3). For instance, Roll (1984) proposes a model where U is due entirely to the bid-ask spread. Harris (1990b) notes that in practice there are sources of noise other than just the bid-ask spread, and studies their effect on the Roll model and its estimators.
Indeed, a disturbance U can also be generated by adverse selection effects as in Glosten (1987) and Glosten and Harris (1988), where the spread has two components: one that is owing to monopoly power, clearing costs, inventory carrying costs, etc., as previously, and a second one that arises because of adverse selection whereby the specialist is concerned that the investor on the other side of the transaction has superior information. When asymmetric information is involved, the disturbance U would typically no longer be uncorrelated with the W process and would exhibit autocorrelation at the first order, which would complicate our analysis without fundamentally altering it: see Sections 9.2 and 9.3 where we relax the assumptions that the Us are serially uncorrelated and independent of the W process.
The situation where the measurement error is primarily due to the fact that transaction prices are multiples of a tick size (i.e., $\tilde{X}_{\tau_i} = m_i \kappa$ where $\kappa$ is the tick size and $m_i$ is the integer closest to $X_{\tau_i}/\kappa$) can be modeled as a rounding off problem [see Gottlieb and Kalay (1985), Jacod (1996), Delattre and Jacod (1997)]. The specification of the model in Harris (1990a) combines both the rounding and bid-ask effects as the dual sources of the noise term U. Finally, structural models, such as that of Madhavan, Richardson, and Roomans (1997), also give rise to reduced forms where the observed transaction price $\tilde{X}$ takes the form of an unobserved fundamental value plus error.
With Equation (3) as our basic data generating process, we now turn to the questions we address in this article: how often should one sample a continuous-time process when the data are subject to market microstructure noise, what are the implications of the noise for the estimation of the parameters of the X process, and how should one correct for the presence of the noise, allowing for the possibility that the econometrician misspecifies the assumed distribution of the noise term, and finally allowing for the sampling to occur at random points in time? We proceed from the simplest to the most complex situation by adding one extra layer of complexity at a time: Figure 1 shows the three sampling schemes we consider, starting with fixed sampling without market microstructure noise, then moving to fixed sampling with noise and concluding with an analysis of the situation where transaction prices are not only subject to microstructure noise but are also recorded at random time intervals.

Figure 1. Various discrete sampling modes: no noise (Section 2), with noise (Sections 3–7), and randomly spaced with noise (Section 8)
2. The Baseline Case: No Microstructure Noise
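We start by briefly reviewing what would happen in the absence of market microstructure noise, that is when $a = 0$. With $X$ denoting the log-price, the first differences of the observations are the log-returns $Y_i = \tilde{X}_{\tau_i} - \tilde{X}_{\tau_{i-1}}$, $i = 1, \ldots, N$. The observations $Y_i = \sigma(W_{\tau_i} - W_{\tau_{i-1}})$ are then i.i.d. $N(0, \sigma^2\Delta)$ so the log-likelihood function is

$$l(\sigma^2) = -\frac{N}{2}\ln(2\pi\sigma^2\Delta) - \frac{1}{2\sigma^2\Delta}\,Y'Y \tag{4}$$

where $Y = (Y_1, \ldots, Y_N)'$. The maximum-likelihood estimator of $\sigma^2$ coincides with the discrete approximation to the quadratic variation of the process

$$\hat\sigma^2 = \frac{1}{T}\sum_{i=1}^{N} Y_i^2 \tag{5}$$

which has the following exact small sample moments:

$$E[\hat\sigma^2] = \sigma^2, \qquad \mathrm{Var}[\hat\sigma^2] = \frac{2\sigma^4\Delta}{T}$$

and the following asymptotic distribution

$$T^{1/2}(\hat\sigma^2 - \sigma^2) \xrightarrow[T \to \infty]{} N(0, \omega) \tag{6}$$

where

$$\omega = \mathrm{AVAR}(\hat\sigma^2) = 2\sigma^4\Delta \tag{7}$$

Thus selecting $\Delta$ as small as possible is optimal for the purpose of estimating $\sigma^2$.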
3. When the Observations are Noisy but the Noise is Ignored
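Suppose now that market microstructure noise is present but the presence of the $U$s is ignored when estimating $\sigma^2$. In other words, we use the log-likelihood function (4) even though the true structure of the observed log-returns $Y_i$s is given by an MA(1) process since

$$Y_i = \sigma(W_{\tau_i} - W_{\tau_{i-1}}) + U_{\tau_i} - U_{\tau_{i-1}} \equiv \varepsilon_i + \eta\,\varepsilon_{i-1} \tag{8}$$

where the $\varepsilon_i$s are uncorrelated with mean zero and variance $\gamma^2$ (if the $U$s are normally distributed, then the $\varepsilon_i$s are i.i.d.). The relationship to the original parametrization $(\sigma^2, a^2)$ is given by

$$\gamma^2(1 + \eta^2) = \mathrm{Var}[Y_i] = \sigma^2\Delta + 2a^2 \tag{9}$$

$$\gamma^2\eta = \mathrm{cov}(Y_i, Y_{i-1}) = -a^2 \tag{10}$$

Equivalently, the inverse change of variable is given by

$$\gamma^2 = \frac{1}{2}\left\{2a^2 + \sigma^2\Delta + \sqrt{\sigma^2\Delta\,(4a^2 + \sigma^2\Delta)}\right\} \tag{11}$$

$$\eta = \frac{1}{2a^2}\left\{-2a^2 - \sigma^2\Delta + \sqrt{\sigma^2\Delta\,(4a^2 + \sigma^2\Delta)}\right\} \tag{12}$$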
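The change of variable can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes the MA(1) relations $\gamma^2(1+\eta^2) = \sigma^2\Delta + 2a^2$ and $\gamma^2\eta = -a^2$, solves them for $(\gamma^2, \eta)$, and verifies the round trip. The function name `ma1_parameters`, the parameter values, and the convention of 252 trading days of 6.5 hours per year are illustrative assumptions.

```python
import math


def ma1_parameters(sigma2, a2, delta):
    """Map (sigma^2, a^2) to the MA(1) parameters (gamma^2, eta) at interval delta."""
    s2d = sigma2 * delta                         # variance of the efficient-price increment
    root = math.sqrt(s2d * (4.0 * a2 + s2d))
    gamma2 = 0.5 * (2.0 * a2 + s2d + root)       # innovation variance
    eta = (-2.0 * a2 - s2d + root) / (2.0 * a2)  # MA coefficient, lies in (-1, 0)
    return gamma2, eta


# Illustrative inputs (assumptions): sigma = 30% per year, a = 0.15%,
# 5-minute sampling, one year = 252 days of 6.5 trading hours.
sigma2, a2 = 0.30 ** 2, 0.0015 ** 2
delta = 5.0 / (252 * 6.5 * 60)
gamma2, eta = ma1_parameters(sigma2, a2, delta)

var_y = gamma2 * (1.0 + eta ** 2)   # should reproduce sigma^2 * delta + 2 a^2
cov_y = gamma2 * eta                # should reproduce -a^2
print(var_y, sigma2 * delta + 2.0 * a2)
print(cov_y, -a2)
```

The root with the plus sign is the one used here because it gives $|\eta| < 1$, that is, the invertible MA(1) representation.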
Two important properties of the log-returns $Y_i$s emerge from Equations (9) and (10). First, it is clear from Equation (9) that microstructure noise leads to spurious variance in observed log-returns, $\sigma^2\Delta + 2a^2$ versus $\sigma^2\Delta$. This is consistent with the predictions of theoretical microstructure models. For instance, Easley and O'Hara (1992) develop a model linking the arrival of information, the timing of trades, and the resulting price process. In their model, the transaction price will be a biased representation of the efficient price process, with a variance that is both overstated and heteroskedastic due to the fact that transactions (hence the recording of an observation on the process $\tilde{X}$) occur at intervals that are time-varying. While our specification is too simple to capture the rich joint dynamics of price and sampling times predicted by their model, heteroskedasticity of the observed variance will also appear in our case once we allow for time variation of the sampling intervals (see Section 8).
In our model, the proportion of the total return variance that is market microstructure-induced is

$$\pi = \frac{2a^2}{\sigma^2\Delta + 2a^2} \tag{13}$$

at observation interval $\Delta$. As $\Delta$ gets smaller, $\pi$ gets closer to 1, so that a larger proportion of the variance in the observed log-return is driven by market microstructure frictions, and correspondingly a lesser fraction reflects the volatility of the underlying price process $X$.
Second, Equation (10) implies that $-1 < \eta < 0$, so that log-returns are (negatively) autocorrelated with first order autocorrelation $-a^2/(\sigma^2\Delta + 2a^2) = -\pi/2$. It has been noted that market microstructure noise has the potential to explain the empirical autocorrelation of returns. For instance, in the simple Roll model, $U_t = (s/2)Q_t$ where $s$ is the bid/ask spread and $Q_t$, the order flow indicator, is a binomial variable that takes the values +1 and −1 with equal probability. Therefore, $\mathrm{var}[U_t] = a^2 = s^2/4$. Since $\mathrm{cov}(Y_i, Y_{i-1}) = -a^2$, the bid/ask spread can be recovered in this model as $s = 2\sqrt{-\rho}$ where $\rho = \gamma^2\eta$ is the first-order autocovariance of returns. French and Roll (1986) proposed to adjust variance estimates to control for such autocorrelation and Harris (1990b) studied the resulting estimators. In Sias and Starks (1997), U arises because of the strategic trading of institutional investors which is then put forward as an explanation for the observed serial correlation of returns. Lo and MacKinlay (1990) show that infrequent trading has implications for the variance and autocorrelations of returns. Other empirical patterns in high frequency financial data have been documented: leptokurtosis, deterministic patterns, and volatility clustering.
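As an illustration of the Roll-type recovery of the spread, the sketch below simulates an efficient log-price observed with a bid/ask bounce of ±s/2 and backs out the spread from the first-order sample autocovariance of the observed returns, using $s = 2\sqrt{-\mathrm{cov}(Y_i, Y_{i-1})}$. All numerical values (a 0.1% spread, one-minute sampling, 252 days of 6.5 hours per year, the sample size and the seed) are assumptions chosen for the example, not values from the paper.

```python
import math
import random

random.seed(0)

sigma = 0.30                      # annualized volatility of the efficient price (assumed)
spread = 0.001                    # bid/ask spread s = 0.1% (assumed)
delta = 1.0 / (252 * 6.5 * 60)    # one minute, expressed in years (assumed convention)
n = 100_000                       # number of simulated returns

# Observed log-price = efficient log-price + (s/2) * Q, with Q = +1 or -1.
x = 0.0
prev_obs = x + 0.5 * spread * random.choice((-1, 1))
returns = []
for _ in range(n):
    x += sigma * math.sqrt(delta) * random.gauss(0.0, 1.0)
    obs = x + 0.5 * spread * random.choice((-1, 1))
    returns.append(obs - prev_obs)
    prev_obs = obs

# First-order sample autocovariance of returns (the mean return is taken as zero).
autocov = sum(r0 * r1 for r0, r1 in zip(returns, returns[1:])) / (n - 1)
implied_spread = 2.0 * math.sqrt(-autocov)
print(implied_spread)
```

In small samples the sample autocovariance can turn out positive, in which case the square root is undefined; the sample size above is chosen large enough that this does not occur in this example.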
Our first result shows that the optimal sampling frequency is finite when noise is present but unaccounted for. The estimator $\hat\sigma^2$ obtained from maximizing the misspecified log-likelihood function (4) is quadratic in the $Y_i$s [see Equation (5)]. In order to obtain its exact (i.e., small sample) variance, we need to calculate the fourth order cumulants of the $Y_i$s since

$$\mathrm{cov}(Y_i^2, Y_j^2) = 2\,\mathrm{cov}(Y_i, Y_j)^2 + \mathrm{cum}(Y_i, Y_i, Y_j, Y_j) \tag{14}$$

(see, e.g., Section 2.3 of McCullagh (1987) for definitions and properties of the cumulants). We have the following lemma.
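Lemma 1. The fourth cumulants of the log-returns are given by

$$\mathrm{cum}(Y_i, Y_j, Y_k, Y_l) = \begin{cases} 2\,\mathrm{cum}_4[U] & \text{if } i = j = k = l, \\ (-1)^{s(i,j,k,l)}\,\mathrm{cum}_4[U] & \text{if } \max(i,j,k,l) = \min(i,j,k,l) + 1, \\ 0 & \text{otherwise,} \end{cases} \tag{15}$$

where $s(i, j, k, l)$ denotes the number of indices among $(i, j, k, l)$ that are equal to $\min(i, j, k, l)$ and $U$ denotes a generic random variable with the common distribution of the $U_{\tau_i}$s. Its fourth cumulant is denoted $\mathrm{cum}_4[U]$.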
Now $U$ has mean zero, so in terms of its moments

$$\mathrm{cum}_4[U] = E[U^4] - 3\,(E[U^2])^2 \tag{16}$$

In the special case where $U$ is normally distributed, $\mathrm{cum}_4[U] = 0$ and as a result of Equation (15) the fourth cumulants of the log-returns are all 0 (since $W$ is normal, the log-returns are also normal in that case). If the distribution of $U$ is binomial as in the simple bid/ask model described above, then $\mathrm{cum}_4[U] = -s^4/8$; since in general $s$ will be a tiny percentage of the asset price, say $s$ = 0.05%, the resulting $\mathrm{cum}_4[U]$ will be very small.
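We can now characterize the root mean squared error

$$\mathrm{RMSE}[\hat\sigma^2] = \left( (E[\hat\sigma^2] - \sigma^2)^2 + \mathrm{Var}[\hat\sigma^2] \right)^{1/2}$$

of the estimator by the following theorem.

Theorem 1. In small samples (finite T), the bias and variance of the estimator $\hat\sigma^2$ are given by

$$E[\hat\sigma^2 - \sigma^2] = \frac{2a^2}{\Delta} \tag{17}$$

$$\mathrm{Var}[\hat\sigma^2] = \frac{2\left(\sigma^4\Delta^2 + 4\sigma^2\Delta a^2 + 6a^4 + 2\,\mathrm{cum}_4[U]\right)}{T\Delta} - \frac{2\left(2a^4 + \mathrm{cum}_4[U]\right)}{T^2} \tag{18}$$

Its RMSE has a unique minimum in $\Delta$ which is reached at the optimal sampling interval

$$\Delta^* = \left(\frac{2a^4 T}{\sigma^4}\right)^{1/3} \left( \left(1 - \sqrt{1 - \frac{2\,(3a^4 + \mathrm{cum}_4[U])^3}{27\,\sigma^4 a^8 T^2}}\right)^{1/3} + \left(1 + \sqrt{1 - \frac{2\,(3a^4 + \mathrm{cum}_4[U])^3}{27\,\sigma^4 a^8 T^2}}\right)^{1/3} \right) \tag{19}$$

As T grows, we have

$$\Delta^* = \frac{2^{2/3} a^{4/3}}{\sigma^{4/3}}\,T^{1/3} + O\!\left(\frac{1}{T^{1/3}}\right) \tag{20}$$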
The trade-off between bias and variance made explicit in Equations (17) and (18) is not unlike the situation in nonparametric estimation with $\Delta^{-1}$ playing the role of the bandwidth $h$. A lower $h$ reduces the bias but increases the variance, and the optimal choice of $h$ balances the two effects.
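The bias-variance trade-off can be made concrete with a short numerical sketch. It is not the authors' code: it assumes the bias $2a^2/\Delta$ and the exact variance $2(\sigma^4\Delta^2 + 4\sigma^2\Delta a^2 + 6a^4 + 2\,\mathrm{cum}_4[U])/(T\Delta) - 2(2a^4 + \mathrm{cum}_4[U])/T^2$ implied by the MA(1) structure of the returns, and it obtains the minimizer from the first-order condition $\sigma^4\Delta^3 - (6a^4 + 2\,\mathrm{cum}_4[U])\Delta - 4a^4T = 0$ solved by Cardano's formula. The function names, the example inputs, and the 252-day, 6.5-hour year are assumptions.

```python
import math


def bias(a, delta):
    """Bias of the sum-of-squares estimator when noise is ignored."""
    return 2.0 * a ** 2 / delta


def variance(sigma, a, cum4, delta, T):
    """Exact finite-sample variance of the estimator."""
    lead = 2.0 * (sigma ** 4 * delta ** 2 + 4.0 * sigma ** 2 * delta * a ** 2
                  + 6.0 * a ** 4 + 2.0 * cum4) / (T * delta)
    return lead - 2.0 * (2.0 * a ** 4 + cum4) / T ** 2


def rmse(sigma, a, cum4, delta, T):
    return math.sqrt(bias(a, delta) ** 2 + variance(sigma, a, cum4, delta, T))


def optimal_interval(sigma, a, cum4, T):
    """Positive root of the first-order condition, via Cardano's formula."""
    x = 2.0 * (3.0 * a ** 4 + cum4) ** 3 / (27.0 * sigma ** 4 * a ** 8 * T ** 2)
    if x > 1.0:
        raise ValueError("this closed form assumes T is large enough for a real square root")
    root = math.sqrt(1.0 - x)
    scale = (2.0 * a ** 4 * T / sigma ** 4) ** (1.0 / 3.0)
    return scale * ((1.0 - root) ** (1.0 / 3.0) + (1.0 + root) ** (1.0 / 3.0))


# Illustrative inputs (assumptions): Gaussian noise, sigma = 30%, a = 0.15%, T = 1 day.
MINUTES_PER_YEAR = 252 * 6.5 * 60
sigma, a, cum4, T = 0.30, 0.0015, 0.0, 1.0 / 252
delta_star = optimal_interval(sigma, a, cum4, T)
delta_approx = 2 ** (2 / 3) * a ** (4 / 3) / sigma ** (4 / 3) * T ** (1 / 3)
print(delta_star * MINUTES_PER_YEAR, delta_approx * MINUTES_PER_YEAR)
```

Under these assumptions the exact and leading-order intervals differ by only a few percent, and moving away from `delta_star` in either direction raises the RMSE.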
Note that these are exact small sample expressions, valid for all T. Asymptotically in T, $\mathrm{Var}[\hat\sigma^2] \to 0$, and hence the RMSE of the estimator is dominated by the bias term which is independent of T. And given the form of the bias (17), one would in fact want to select the largest $\Delta$ possible to minimize the bias (as opposed to the smallest one as in the no-noise case of Section 2). The rate at which $\Delta^*$ should increase with T is given by Equation (20). Also, in the limit where the noise disappears ($a \to 0$ and $\mathrm{cum}_4[U] \to 0$), the optimal sampling interval $\Delta^*$ tends to 0.
How does a small departure from a normal distribution of the microstructure noise affect the optimal sampling frequency? The answer is that a small positive (resp. negative) departure of $\mathrm{cum}_4[U]$ starting from the normal value of 0 leads to an increase (resp. decrease) in $\Delta^*$, since
 | (21) |
where $\Delta^*_{\mathrm{normal}}$ is the value of $\Delta^*$ corresponding to $\mathrm{cum}_4[U] = 0$. And, of course, the full formula (19) can be used to get the exact answer for any departure from normality instead of the comparative static one.
Another interesting asymptotic situation occurs if one attempts to use higher and higher frequency data ($\Delta \to 0$, say sampled every minute) over a fixed time period (T fixed, say a day). Since the expressions in Theorem 1 are exact small sample ones, they can in particular be specialized to analyze this situation. With $n = T/\Delta$, it follows from Equations (17) and (18) that

$$E[\hat\sigma^2] = \frac{2na^2}{T} + o(n) = \frac{2nE[U^2]}{T} + o(n) \tag{22}$$

$$\mathrm{Var}[\hat\sigma^2] = \frac{2n\left(6a^4 + 2\,\mathrm{cum}_4[U]\right)}{T^2} + o(n) = \frac{4nE[U^4]}{T^2} + o(n) \tag{23}$$

so $(T/2n)\,\hat\sigma^2$ becomes an estimator of $E[U^2] = a^2$ whose asymptotic variance is $E[U^4]$. Note in particular that $\hat\sigma^2$ estimates the variance of the noise, which is essentially unrelated to the object of interest $\sigma^2$. This type of asymptotics is relevant in the stochastic volatility case we analyze in our companion paper [Zhang, Mykland, and Aït-Sahalia (2003)].
Our results also have implications for the two parallel tracks that have developed in the recent financial econometrics literature dealing with discretely observed continuous-time processes. One strand of the literature has argued that estimation methods should be robust to the potential issues arising in the presence of high frequency data and, consequently, be asymptotically valid without requiring that the sampling interval $\Delta$ separating successive observations tend to zero [see, e.g., Hansen and Scheinkman (1995), Aït-Sahalia (1996), Aït-Sahalia (2002)]. Another strand of the literature has dispensed with that constraint, and the asymptotic validity of these methods requires that $\Delta$ tend to zero instead of, or in addition to, an increasing length of time T over which these observations are recorded [see, e.g., Andersen et al. (2003), Bandi and Phillips (2003), Barndorff-Nielsen and Shephard (2002)].
The first strand of literature has been informally warning about the potential dangers of using high frequency financial data without accounting for their inherent noise [see, e.g., Aït-Sahalia (1996, p. 529)], and we propose a formal modelization of that phenomenon. The implications of our analysis are most important for the second strand of the literature, which is predicated on the use of high frequency data but does not account for the presence of market microstructure noise. Our results show that the properties of estimators based on the local sample path properties of the process (such as the quadratic variation to estimate $\sigma^2$) change dramatically in the presence of noise. Complementary to this are the results of Gloter and Jacod (2000) which show that the presence of even increasingly negligible noise is sufficient to adversely affect the identification of $\sigma^2$.
4. Concrete Implications for Empirical Work with High Frequency Data
The clear message of Theorem 1 for empirical researchers working with high frequency financial data is that it may be optimal to sample less frequently. As discussed in the Introduction, authors have reduced their sampling frequency below that of the actual record of observations in a somewhat ad hoc fashion, with typical choices 5 min and up. Our analysis provides not only a theoretical rationale for sampling less frequently, but also gives a precise answer to the question of "how often one should sample?" For that purpose, we need to calibrate the parameters appearing in Theorem 1, namely $\sigma$, $a$, $\mathrm{cum}_4[U]$, $\Delta$ and $T$. We assume in this calibration exercise that the noise is Gaussian, in which case $\mathrm{cum}_4[U] = 0$.
4.1 Stocks
We use existing studies in empirical market microstructure to calibrate the parameters. One such study is Madhavan, Richardson, and Roomans (1997), who estimated on the basis of a sample of 274 NYSE stocks that approximately 60% of the total variance of price changes is attributable to market microstructure effects (they report a range of values for $\pi$ from 54% in the first half hour of trading to 65% in the last half hour, see their Table 4; they also decompose this total variance into components due to discreteness, asymmetric information, transaction costs and the interaction between these effects). Given that their sample contains an average of 15 transactions per hour (their Table 1), we have in our framework

$$\pi = 60\%, \qquad \Delta = 1/15 \text{ hour} \tag{24}$$
These values imply from Equation (13) that $a$ = 0.16% if we assume a realistic value of $\sigma$ = 30% per year. (We do not use their reported volatility number since they apparently averaged the variance of price changes over the 274 stocks instead of the variance of the returns. Since different stocks have different price levels, the price variances across stocks are not directly comparable. This does not affect the estimated fraction $\pi$, however, since the price level scaling factor cancels out between the numerator and the denominator.)
The magnitude of the effect is bound to vary by type of security, market and time period. Hasbrouck (1993) estimates the value of $a$ to be 0.33%. Some authors have reported even larger effects. Using a sample of NASDAQ stocks, Kaul and Nimalendran (1990) estimate that about 50% of the daily variance of returns is due to the bid-ask effect. With $\sigma$ = 40% (NASDAQ stocks have higher volatility), the values $\pi$ = 50%, $\Delta$ = 1 day yield the value $a$ = 1.8%. Also on NASDAQ, Conrad, Kaul and Nimalendran (1991) estimate that 11% of the variance of weekly returns (see their Table 4, middle portfolio) is due to bid-ask effects. The values $\pi$ = 11%, $\Delta$ = 1 week imply that $a$ = 1.4%.
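The calibration step can be reproduced by inverting the noise-share relation $\pi = 2a^2/(\sigma^2\Delta + 2a^2)$ for $a$. The sketch below is illustrative only: the helper name `noise_sd`, the year of 252 trading days of 6.5 hours (52 weeks), and the use of $\sigma$ = 40% for the weekly NASDAQ case are assumptions made here to recover the quoted magnitudes.

```python
import math


def noise_sd(pi, sigma, delta):
    """Solve pi = 2 a^2 / (sigma^2 delta + 2 a^2) for the noise standard deviation a."""
    return math.sqrt(pi * sigma ** 2 * delta / (2.0 * (1.0 - pi)))


# Time units in years (assumed convention: 252 trading days of 6.5 hours, 52 weeks).
FOUR_MINUTES = 1.0 / (15 * 6.5 * 252)   # 15 transactions per hour
ONE_DAY = 1.0 / 252
ONE_WEEK = 1.0 / 52

a_nyse = noise_sd(0.60, 0.30, FOUR_MINUTES)    # Madhavan, Richardson, and Roomans (1997)
a_nasdaq_daily = noise_sd(0.50, 0.40, ONE_DAY)     # Kaul and Nimalendran (1990)
a_nasdaq_weekly = noise_sd(0.11, 0.40, ONE_WEEK)   # Conrad, Kaul and Nimalendran (1991)

for label, value in [("NYSE, 4 min", a_nyse),
                     ("NASDAQ, daily", a_nasdaq_daily),
                     ("NASDAQ, weekly", a_nasdaq_weekly)]:
    print(f"{label}: a = {100 * value:.2f}%")
```

Under these assumptions the three values come out close to the 0.16%, 1.8% and 1.4% figures quoted in the text.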
In Table 1, we compute the value of the optimal sampling interval $\Delta^*$ implied by different combinations of sample length (T) and noise magnitude (a). The volatility of the efficient price process is held fixed at $\sigma$ = 30% in Panel (A), which is a realistic value for stocks. The numbers in the table show that the optimal sampling frequency can be substantially affected by even relatively small quantities of microstructure noise. For instance, using the value a = 0.15% calibrated from Madhavan, Richardson, and Roomans (1997), we find an optimal sampling interval of 22 minutes if the sampling length is 1 day; longer sample lengths lead to higher optimal sampling intervals. With the higher value of a = 0.3%, approximating the estimate from Hasbrouck (1993), the optimal sampling interval is 57 min. A lower value of the magnitude of the noise translates into a higher frequency: for instance, $\Delta^*$ = 5 min if a = 0.05% and T = 1 day. Figure 2 displays the RMSE of the estimator as a function of $\Delta$ and T, using parameter values $\sigma$ = 30% and a = 0.15%. The figure illustrates the fact that deviations from the optimal choice of $\Delta$ lead to a substantial increase in the RMSE: for example, with T = 1 month, the RMSE more than doubles if, instead of the optimal $\Delta^*$ = 1 h, one uses $\Delta$ = 15 min.
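The figures quoted above can be cross-checked by brute force, without any closed form, by evaluating the RMSE on a fine grid of sampling intervals. This is a hedged sketch rather than the procedure used to build Table 1: it assumes Gaussian noise, a bias of $2a^2/\Delta$, a variance of $2(\sigma^4\Delta^2 + 4\sigma^2\Delta a^2 + 6a^4)/(T\Delta) - 4a^4/T^2$, a year of 252 days of 6.5 trading hours, and one month taken as 1/12 of a year. The function names are hypothetical.

```python
import math

MINUTES_PER_YEAR = 252 * 6.5 * 60   # assumed trading-time convention


def rmse(sigma, a, delta, T):
    """RMSE of the sum-of-squares estimator under Gaussian noise (cum4[U] = 0)."""
    bias = 2.0 * a ** 2 / delta
    var = (2.0 * (sigma ** 4 * delta ** 2 + 4.0 * sigma ** 2 * delta * a ** 2
                  + 6.0 * a ** 4) / (T * delta)
           - 4.0 * a ** 4 / T ** 2)
    return math.sqrt(bias ** 2 + var)


def best_interval_minutes(sigma, a, T, step=0.1, longest=600.0):
    """Grid search for the RMSE-minimizing sampling interval, in minutes."""
    grid = [0.5 + step * k for k in range(int((longest - 0.5) / step) + 1)]
    return min(grid, key=lambda m: rmse(sigma, a, m / MINUTES_PER_YEAR, T))


sigma = 0.30
one_day, one_month = 1.0 / 252, 1.0 / 12
for a in (0.0005, 0.0015, 0.003):
    print(f"a = {100 * a:.2f}%, T = 1 day: {best_interval_minutes(sigma, a, one_day):.1f} min")

# Cost of sampling every 15 minutes instead of every hour when T = 1 month.
ratio = (rmse(sigma, 0.0015, 15.0 / MINUTES_PER_YEAR, one_month)
         / rmse(sigma, 0.0015, 60.0 / MINUTES_PER_YEAR, one_month))
print(f"RMSE(15 min) / RMSE(1 h) at T = 1 month: {ratio:.2f}")
```

A grid search is slower than solving the first-order condition but makes no use of the cubic's structure, so it serves as an independent check.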
4.2 Currencies
Looking now at foreign exchange markets, empirical market microstructure studies have quantified the magnitude of the bid-ask spread. For example, Bessembinder (1994) computes the average bid/ask spread $s$ in the wholesale market for different currencies and reports values of $s$ = 0.05% for the German mark, and 0.06% for the Japanese yen (see Panel B of his Table 2). We calculated the corresponding numbers for the 1996–2002 period to be 0.04% for the mark (followed by the euro) and 0.06% for the yen. Emerging market currencies have higher spreads: for instance, $s$ = 0.12% for Korea and 0.10% for Brazil. During the same period, the volatility of the exchange rate was $\sigma$ = 10% for the German mark, 12% for the Japanese yen, 17% for Brazil and 18% for Korea.
In Panel B of Table 1, we compute $\Delta^*$ with $\sigma$ = 10%, a realistic value for the euro and yen. As we noted above, if the sole source of the noise were a bid/ask spread of size $s$, then $a$ should be set to $s/2$. Therefore, Panel B reports the values of $\Delta^*$ for values of $a$ ranging from 0.02% to 0.1%. For example, the dollar/euro or dollar/yen exchange rates (calibrated to $\sigma$ = 10%, $a$ = 0.02%) should be sampled every $\Delta^*$ = 23 min if the overall sample length is $T$ = 1 day, and every 1.1 h if $T$ = 1 year.
Furthermore, using the bid/ask spread alone as a proxy for all
microstructure frictions will lead, except in unusual circumstances,
to an understatement of the parameter
a, since variances are
additive. Thus, since
* is increasing in
a, one should interpret
the value of
* read off 1 on the row corresponding to
a =
s/2
as a lower bound for the optimal sampling interval.
4.3 Monte Carlo Evidence
To validate these results empirically, we perform Monte Carlo simulations. We simulate M = 10,000 samples of length T = 1 year of the process X, add microstructure noise U to generate the observations X̃, and then the log returns Y. We sample the log-returns at various intervals Δ ranging from 5 min to 1 week, and calculate the bias and variance of the estimator σ̂² over the M simulated paths. We then compare the results to the theoretical values given in Equations (17) and (18) of Theorem 1. The noise distribution is Gaussian, with σ = 30% and a = 0.15%, the values we calibrated to stock returns data above. Table 2 shows that the theoretical values are in close agreement with the results of the Monte Carlo simulations.
Table 2 also illustrates the magnitude of the bias inherent in sampling at too high a frequency. While the value of σ² used to generate the data is 0.09, the expected value of the estimator when sampling every 5 min is 0.18, so on average the estimated quadratic variation is twice as big as it should be in this case.
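The size of this bias can be checked with a small simulation. The sketch below is not the authors' code: it uses far fewer paths than the M = 10,000 reported above, and it assumes a trading-time convention of 252 days of 6.5 hours per year (the function name `mean_realized_variance` and the constant `MINUTES_PER_YEAR` are illustrative). Under i.i.d. noise each log-return has variance σ²Δ + 2a², so the expected value of the sum of squared returns divided by T is σ² + 2a²/Δ.

```python
import math
import random

def mean_realized_variance(sigma, a, delta, T=1.0, paths=20, seed=0):
    """Average of (1/T) * sum(Y_i^2) over simulated noisy price paths.

    X is a driftless Brownian motion with volatility sigma; each observation
    adds i.i.d. Gaussian noise with standard deviation a.
    """
    rng = random.Random(seed)
    n = int(round(T / delta))            # number of log-returns per path
    step_sd = sigma * math.sqrt(delta)   # std. dev. of an efficient-price increment
    total = 0.0
    for _ in range(paths):
        x = 0.0
        prev_obs = x + rng.gauss(0.0, a)
        rv = 0.0
        for _ in range(n):
            x += rng.gauss(0.0, step_sd)
            obs = x + rng.gauss(0.0, a)  # observed price = efficient price + noise
            rv += (obs - prev_obs) ** 2
            prev_obs = obs
        total += rv / T
    return total / paths

# Assumption: one year = 252 trading days of 6.5 hours each.
MINUTES_PER_YEAR = 252 * 6.5 * 60
delta = 5 / MINUTES_PER_YEAR             # 5-minute sampling, in years
sigma, a = 0.30, 0.0015                  # values calibrated in the text

simulated = mean_realized_variance(sigma, a, delta)
expected = sigma ** 2 + 2 * a ** 2 / delta
print(f"simulated mean: {simulated:.4f}, sigma^2 + 2a^2/Delta: {expected:.4f}")
```

With these inputs the theoretical mean is roughly double the true σ² = 0.09, in line with the figure quoted from Table 2.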
5. Incorporating Market Microstructure Noise Explicitly
So far we have stuck to the sum of squares of log-returns as our estimator of volatility. We then showed that, for this estimator, the optimal sampling frequency is finite. However, this implies that one is discarding a large proportion of the high frequency sample (299 out of every 300 observations in the example described in the introduction), in order to mitigate the bias induced by market microstructure noise. Next, we show that if we explicitly incorporate the Us into the likelihood function, then we are back in a situation where the optimal sampling scheme consists in sampling as often as possible, that is, using all the data available.
Specifying the likelihood function of the log-returns, while recognizing that they incorporate noise, requires that we take a stand on the distribution of the noise term. Suppose for now that the microstructure noise is normally distributed, an assumption whose effect we will investigate below in Section 6. Under this assumption, the likelihood function for the Ys is given by
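$$l(\eta,\gamma^2) = -\tfrac{1}{2}\ln\det(V) - \tfrac{N}{2}\ln(2\pi\gamma^2) - \tfrac{1}{2\gamma^2}\,Y'V^{-1}Y \tag{25}$$
where the covariance matrix for the vector Y = (Y1,...,YN)' is given by γ²V, where
$$V = [v_{ij}]_{i,j=1,\dots,N} = \begin{pmatrix} 1+\eta^2 & \eta & & 0 \\ \eta & 1+\eta^2 & \ddots & \\ & \ddots & \ddots & \eta \\ 0 & & \eta & 1+\eta^2 \end{pmatrix} \tag{26}$$
Further,
$$\det(V) = \frac{1-\eta^{2N+2}}{1-\eta^{2}} \tag{27}$$
and, neglecting the end effects, an approximate inverse of V is the matrix Ω = [ω_ij], i,j = 1,...,N, where ω_ij = (1 − η²)⁻¹(−η)^|i−j| [see Durbin (1959)]. The product VΩ differs from the identity matrix only on the first and last rows. The exact inverse is V⁻¹ = [v^ij], i,j = 1,...,N, where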
 | (28) |
[see Shaman (1969), Haddad (1995)].
From the perspective of practical implementation, this estimator is nothing else than the MLE of an MA(1) process with Gaussian errors: any existing computer routines for the MA(1) situation can, therefore, be applied [see e.g., Hamilton (1995, Section 5.4)]. In particular, the likelihood function can be expressed in a computationally efficient form by triangularizing the matrix V, yielding the equivalent expression:
$$l(\eta,\gamma^2) = -\frac{1}{2}\sum_{i=1}^{N}\ln(2\pi d_i) - \frac{1}{2}\sum_{i=1}^{N}\frac{\tilde Y_i^{2}}{d_i} \tag{29}$$
where
$$d_i = \gamma^2\,\frac{1+\eta^2+\cdots+\eta^{2i}}{1+\eta^2+\cdots+\eta^{2(i-1)}}$$
and the Ỹ_i are obtained recursively as Ỹ_1 = Y_1 and for i = 2,...,N:
$$\tilde Y_i = Y_i - \frac{\eta\,\bigl(1+\eta^2+\cdots+\eta^{2(i-2)}\bigr)}{1+\eta^2+\cdots+\eta^{2(i-1)}}\,\tilde Y_{i-1}$$
This latter form of the log-likelihood function involves only single sums as opposed to double sums if one were to compute Y'V⁻¹Y by brute force using the expression of V⁻¹ given above.
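As an illustration of the single-sum form, the following sketch evaluates the Gaussian MA(1) log-likelihood through the standard LDL' (innovations) recursion for a tridiagonal covariance matrix. It is a minimal example under stated assumptions, not the authors' routine: the function names are hypothetical, and the mapping from (σ², a², Δ) to (η, γ²) assumes the usual moment conditions γ²(1 + η²) = σ²Δ + 2a² and γ²η = −a².

```python
import math

def ma1_params(sigma2, a2, delta):
    """Map (sigma^2, a^2, Delta) to MA(1) parameters (eta, gamma^2).

    Assumes var[Y_i] = sigma2*delta + 2*a2 = gamma2*(1 + eta^2)
    and cov(Y_i, Y_{i-1}) = -a2 = gamma2*eta.
    """
    if a2 == 0.0:
        return 0.0, sigma2 * delta
    rho = -a2 / (sigma2 * delta + 2.0 * a2)   # first-order autocorrelation
    # Root of rho*eta^2 - eta + rho = 0 with |eta| < 1 (invertible MA(1)).
    eta = (1.0 - math.sqrt(1.0 - 4.0 * rho * rho)) / (2.0 * rho)
    return eta, -a2 / eta

def ma1_loglik(y, eta, gamma2):
    """Gaussian log-likelihood of Y ~ N(0, gamma2 * V), V tridiagonal MA(1).

    Uses one pass over the data: d holds the innovation variance and
    y_tilde the innovation, both updated recursively.
    """
    d = gamma2 * (1.0 + eta * eta)
    y_tilde = y[0]
    loglik = -0.5 * (math.log(2.0 * math.pi * d) + y_tilde ** 2 / d)
    for yi in y[1:]:
        coef = gamma2 * eta / d               # uses the previous d
        y_tilde = yi - coef * y_tilde
        d = gamma2 * (1.0 + eta * eta) - coef * coef * d
        loglik -= 0.5 * (math.log(2.0 * math.pi * d) + y_tilde ** 2 / d)
    return loglik

# Example: parameters calibrated in the text, 5-minute sampling
# (assuming 98,280 trading minutes per year).
eta, gamma2 = ma1_params(0.09, 0.0015 ** 2, 5.0 / 98280.0)
print(eta, gamma2, ma1_loglik([0.001, -0.002, 0.0005], eta, gamma2))
```

The cost is linear in the number of observations, which is what makes using all the data practical.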
We now compute the distribution of the MLE estimators of σ² and a², which follows by the delta method from the classical result for the MA(1) estimators of γ and η by the following proposition.
Proposition 1. When U is normally distributed, the MLE (σ̂², â²) is consistent and its asymptotic variance is given by
with
 | (30) |
Since the asymptotic variance of σ̂² is increasing in Δ, it is optimal to sample as often as possible. Further, since
 | (31) |
the loss of efficiency relative to the case where no market microstructure noise is present (whether a² = 0 is imposed rather than estimated, as in Equation (7), or a² = 0 is estimated) is at order Δ^{1/2}.
Figure 3 plots the asymptotic variances of σ̂² as functions of Δ with and without noise (the parameter values are again σ = 30% and a = 0.15%).
Figure 4 reports histograms of the distributions of σ̂² and â² from 10,000 Monte Carlo simulations with the solid curve plotting the asymptotic distribution of the estimator from Proposition 1. The sample path is of length T = 1 year, the parameter values are the same as above, and the process is sampled every 5 min; since we are now accounting explicitly for the presence of noise, there is no longer a reason to sample at lower frequencies. Indeed, the figure documents the absence of bias and the good agreement of the asymptotic distribution with the small sample one.
6. The Effect of Misspecifying the Distribution of the Microstructure Noise
We now study a situation where one attempts to incorporate the presence of the Us into the analysis, as in Section 5, but mistakenly assumes a misspecified model for them. Specifically, we consider the case where the Us are assumed to be normally distributed when in reality they have a different distribution. We still suppose that the Us are i.i.d. with mean zero and variance a².
Since the econometrician assumes the Us to have a normal distribution, inference is still done with the log-likelihood l(σ², a²), or equivalently l(η, γ²) given in Equation (25), using Equations (9) and (10). This means that the scores with respect to σ² and a², or equivalently Equations (C.1) and (C.2), are used as moment functions (or "estimating equations"). Since the first order moments of the moment functions only depend on the second order moment structure of the log-returns (Y1,...,YN), which is unchanged by the absence of normality, the moment functions are unbiased under the true distribution of the Us:
 | (32) |
and similarly for the remaining score components. Hence the estimator (σ̂², â²) based on these moment functions is consistent and asymptotically unbiased (even though the likelihood function is misspecified).
The effect of misspecification, therefore, lies in the asymptotic variance matrix. By using the cumulants of the distribution of U, we express the asymptotic variance of these estimators in terms of deviations from normality. But as far as computing the actual estimator, nothing has changed relative to Section 5: we are still calculating the MLE for an MA(1) process with Gaussian errors and can apply exactly the same computational routine.
However, since the error distribution is potentially misspecified, one could expect the asymptotic distribution of the estimator to be altered. This does not happen, as far as σ̂² is concerned: see the following theorem.
Theorem 2. The estimators (σ̂², â²) obtained by maximizing the possibly misspecified log-likelihood function (25) are consistent and their asymptotic variance is given by
 | (33) |
where AVAR_normal(σ̂², â²) is the asymptotic variance in the case where the distribution of U is normal, that is, the expression given in Proposition 1.
In other words, the asymptotic variance of σ̂² is identical to its expression if the Us had been normal. Therefore, the correction we proposed for the presence of market microstructure noise, relying on the assumption that the noise is Gaussian, is robust to misspecification of the error distribution.
Documenting the presence of the correction term through simulations presents a challenge. At the parameter values calibrated to be realistic, the order of magnitude of a is a few basis points, say a = 0.10% = 10⁻³. But if U is of order 10⁻³, cum4[U], which is of the same order as U⁴, is of order 10⁻¹². In other words, with a typical noise distribution, the correction term in Equation (33) will not be visible.
Nevertheless, to make it discernible, we use a distribution for U with the same calibrated standard deviation a as before, but a disproportionately large fourth cumulant. Such a distribution can be constructed by letting U = ωT_ν where ω > 0 is constant and T_ν follows a Student t distribution with ν degrees of freedom. T_ν has mean zero, finite variance as long as ν > 2 and finite fourth moment (hence finite fourth cumulant) as long as ν > 4. But as ν approaches 4 from above, E[T_ν⁴] tends to infinity. This allows us to produce an arbitrarily high value of cum4[U] while controlling for the magnitude of the variance. The specific expressions of a² and cum4[U] for this choice of U are given by
$$a^2 = \mathrm{var}[U] = \frac{\omega^2\,\nu}{\nu-2} \tag{34}$$
$$\mathrm{cum}_4[U] = \frac{6\,\omega^4\,\nu^2}{(\nu-4)(\nu-2)^2} \tag{35}$$
Thus, we can select the two parameters (ω, ν) to produce desired values of (a², cum4[U]). As before, we set a = 0.15%. Then, given the form of the asymptotic variance matrix in Equation (33), we set cum4[U] so that the correction term equals half of the asymptotic variance of â² under Gaussian noise. This makes the true asymptotic variance of â² by construction 50% larger than its Gaussian counterpart. The resulting values of (ω, ν) from solving Equations (34) and (35) are ω = 0.00115 and ν = 4.854. As above, we set the other parameters to σ = 30%, T = 1 year, and Δ = 5 minutes.
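The reported calibration can be checked numerically. The sketch below relies only on standard moments of the Student t distribution (variance ν/(ν − 2) and excess kurtosis 6/(ν − 4)); the function name `scaled_t_moments` and the argument name `scale`, standing for the constant multiplying the t variable, are illustrative.

```python
import math

def scaled_t_moments(scale, nu):
    """Variance, fourth cumulant and kurtosis of U = scale * T_nu.

    T_nu is Student t with nu degrees of freedom; nu > 4 is required
    for the fourth cumulant to be finite.
    """
    if nu <= 4.0:
        raise ValueError("the fourth cumulant is finite only for nu > 4")
    variance = scale ** 2 * nu / (nu - 2.0)
    excess_kurtosis = 6.0 / (nu - 4.0)
    cum4 = excess_kurtosis * variance ** 2   # cum4 = excess kurtosis * variance^2
    return variance, cum4, 3.0 + excess_kurtosis

# Values reported in the text for the scale constant and the degrees of freedom.
variance, cum4, kurtosis = scaled_t_moments(0.00115, 4.854)
noise_sd = math.sqrt(variance)
print(f"a = {noise_sd:.5f}, cum4[U] = {cum4:.3e}, kurtosis = {kurtosis:.2f}")
```

The implied standard deviation is very close to the calibrated a = 0.15%, and the implied kurtosis is close to 10, far above the Gaussian value of 3.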
Figure 5 reports histograms of the distributions of σ̂² and â² from 10,000 Monte Carlo simulations. The solid curve plots the asymptotic distribution of the estimator, given now by Equation (33). There is again good agreement between the asymptotic and small sample distributions. In particular, we note that as predicted by Theorem 2, the asymptotic variance of σ̂² is unchanged relative to Figure 4 while that of â² is 50% larger. The small sample distribution of σ̂² appears unaffected by the non-Gaussianity of the noise; with a skewness of 0.07 and a kurtosis of 2.95, it is closely approximated by its asymptotic Gaussian limit. The small sample distribution of â² does exhibit some kurtosis (4.83), although not large relative to that of the underlying noise distribution (the values of ω and ν imply a kurtosis for U of 3 + 6/(ν − 4) = 10). Similar simulations but with a longer time span of T = 5 years are even closer to the Gaussian asymptotic limit: the kurtosis of the small sample distribution of â² goes down to 2.99.
7. Robustness to Misspecification of the Noise Distribution
Going back to the theoretical aspects, the above Theorem 2 has
implications for the use of the Gaussian likelihood
l that go
beyond consistency, namely that this likelihood can also be
used to estimate the distribution of

under misspecification. With
l denoting the log-likelihood assuming
that the
Us are Gaussian, given in
Equation (25),

denote the observed information matrix in the original parameters
2 and
a2. Then
is the usual estimate
of asymptotic variance when the distribution is correctly specified
as Gaussian. Also note, however, that otherwise, so long as

is consistent,

is also a consistent estimate of the matrix

. Since this matrix coincides with

for all but the (
a2,
a2) term (see
Equation (33)), the asymptotic variance
of

is consistently estimated by

. The similar statement is true for the covariances,
but not, obviously, for the asymptotic variance of

.
In the likelihood context, the possibility of estimating the asymptotic variance by the observed information is due to the second Bartlett identity. For a general log likelihood l, if
and
(differentiation refers to the original parameters (
2, a2), not the transformed parameters (
2,
)) this identity says that
 | (36) |
It
implies that the asymptotic variance takes the form
 | (37) |
It is clear that
Equation (37) remains valid
if the second Bartlett identity holds only to first order, that
is,
 | (38) |
as
N

, for a general criterion
function
l which satisfies

.
However, in view of Theorem 2, Equation (38) cannot be satisfied. In fact, we show in Appendix E that
 | (39) |
where
 | (40) |
From
Equation (40), we see that
g 
0 whenever
2 > 0. This is consistent with the result in Theorem 2 that
the true asymptotic variance matrix,

does not coincide with the one for Gaussian noise,

. On the other hand, the 2
x 2 matrix
gg' is of rank 1, signaling
that there exist linear combinations that will cancel out the
first column of
S
D. From what we already know of the
form of the correction matrix,
D1 gives such a combination
that the asymptotic variance of the original parameters (
2,
a2) will have the property that its first column is not subject
to correction in the absence of normality.
A curious consequence of Equation (39) is that while the observed information can be used to estimate the asymptotic variance of σ̂² when a² is not known, this is not the case when a² is known. This is because the second Bartlett identity also fails to first order when considering a² to be known, that is, when differentiating with respect to σ² only. Indeed, in that case we have from the upper left component in the matrix Equation (39):
which is
not
o(1) unless cum
4 [
U] = 0.
To make the connection between Theorem 2 and the second Bartlett identity, one needs to go to the log profile likelihood
 | (41) |
Obviously, maximizing the likelihood
l(
2,
a2)
is the same as maximizing

(
2). Thus one can think of
2 as being
estimated (when
2 is unknown) by maximizing the criterion function

(
2), or by solving

. Also, the observed profile information is related to the original observed information
by
 | (42) |
that is, the first (upper
left hand corner) component of the inverse observed information
in the original problem. We explain this in Appendix E, where
we also show that

. In view of Theorem 2,

can be used to estimate the asymptotic variance of

under the true (possibly non-Gaussian) distribution of the
Us, and so it must be that the criterion
function

satisfies
Equation (38), that is
 | (43) |
This
is indeed the case, as shown in Appendix E.
This phenomenon is related, although not identical, to what occurs in the context of quasi-likelihood [for comprehensive treatments of quasi-likelihood theory, see the books by McCullagh and Nelder (1989) and Heyde (1997), and the references therein, and for early econometrics examples, see Macurdy (1982) and White (1982)]. In quasi-likelihood situations, one uses a possibly incorrectly specified score vector which is nevertheless required to satisfy the second Bartlett identity. What makes our situation unusual relative to quasi-likelihood is that the interest parameter σ² and the nuisance parameter a² are entangled in the same estimating equations (the two scores from the Gaussian likelihood) in such a way that the estimate of σ² depends, to first order, on whether a² is known or not. This is unlike the typical development of quasi-likelihood, where the nuisance parameter separates out [see, e.g., McCullagh and Nelder (1989, Table 9.1, p. 326)]. Thus only by going to the profile likelihood can one make the usual comparison to quasi-likelihood.
8. Randomly Spaced Sampling Intervals
One essential feature of transaction data in finance is that the time that separates successive observations is random, or at least time-varying. So, as in Aït-Sahalia and Mykland (2003), we are led to consider the case where Δ_i = τ_i − τ_{i−1} are either deterministic and time-varying, or random, in which case we assume for simplicity that they are i.i.d., independent of the W process. This assumption, while not completely realistic [see Engle and Russell (1998) for a discrete time analysis of the autoregressive dependence of the times between trades], allows us to make explicit calculations at the interface between the continuous and discrete time scales. We denote by N_T the number of observations recorded by time T. N_T is random if the Δs are. We also suppose that U_{τ_i} can be written U_i, where the U_i are i.i.d. and independent of the W process and the Δ_i s. Thus, the observation noise is the same at all observation times, whether random or nonrandom. We define the Y_i s as before, in the first two lines of Equation (8), though the MA(1) representation is not valid in the same form.
We can do inference conditionally on the observed sampling times, in light of the fact that the likelihood function using all the available information is
where
β are the parameters of the state process, that is (σ², a²), and ψ are the parameters of the sampling process, if any (the density of the sampling intervals L(Δ_{N_T},...,Δ_1; ψ) may have its own nuisance parameters ψ, such as an unknown arrival rate, but we assume that it does not depend on the parameters β of the state process). The corresponding log-likelihood function is
 | (44) |
and since we only care about β, we only need to maximize the first term in that sum.
We operate on the covariance matrix Σ of the log-returns Ys, now given by
$$\Sigma = \begin{pmatrix} \sigma^2\Delta_1 + 2a^2 & -a^2 & & 0 \\ -a^2 & \sigma^2\Delta_2 + 2a^2 & \ddots & \\ & \ddots & \ddots & -a^2 \\ 0 & & -a^2 & \sigma^2\Delta_N + 2a^2 \end{pmatrix} \tag{45}$$
Note that in the equally spaced case, Σ = γ²V. But now Y no longer follows an MA(1) process in general. Furthermore, the time variation in the Δ_i s gives rise to heteroskedasticity, as is clear from the diagonal elements of Σ. This is consistent with the predictions of the model of Easley and O'Hara (1992), where the variance of the transaction price process X̃ is heteroskedastic as a result of the influence of the sampling times. In their model, the sampling times are autocorrelated and correlated with the evolution of the price process, factors we have assumed away here. However, Aït-Sahalia and Mykland (2003) show how to conduct likelihood inference in such a situation.
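The log-likelihood function is given by
$$l(\sigma^2,a^2) = -\tfrac{1}{2}\ln\det(\Sigma) - \tfrac{N}{2}\ln(2\pi) - \tfrac{1}{2}\,Y'\Sigma^{-1}Y \tag{46}$$
In order to calculate this log-likelihood function in a computationally efficient manner, it is desirable to avoid the "brute force" inversion of the N x N matrix Σ. We extend the method used in the MA(1) case (see Equation (29)) as follows. By Theorem 5.3.1 in Dahlquist and Björck (1974), and the development in the proof of their Theorem 5.4.3, we can decompose Σ in the form Σ = LDL^T, where L is a lower triangular matrix whose diagonals are all 1 and D is diagonal. To compute the relevant quantities, their Example 5.4.3 shows that if one writes D = diag(g1,...,gN) and
$$L = \begin{pmatrix} 1 & & & 0 \\ \kappa_2 & 1 & & \\ & \ddots & \ddots & \\ 0 & & \kappa_N & 1 \end{pmatrix} \tag{47}$$
then the g_k s and κ_k s follow the recursion equation g1 = σ²Δ1 + 2a² and for i = 2,...,N:
$$\kappa_i = -\frac{a^2}{g_{i-1}}, \qquad g_i = \sigma^2\Delta_i + 2a^2 + \kappa_i a^2 \tag{48}$$
Then, define Ỹ = L⁻¹Y so that Y'Σ⁻¹Y = Ỹ'D⁻¹Ỹ. From Y = LỸ, it follows that Ỹ1 = Y1 and, for i = 2,...,N:
$$\tilde Y_i = Y_i - \kappa_i\,\tilde Y_{i-1}$$
And det(Σ) = det(D) since det(L) = 1. Thus we have obtained a computationally simple form for Equation (46) that generalizes the MA(1) form in Equation (29) to the case of non-identical sampling intervals:
$$l(\sigma^2,a^2) = -\frac{1}{2}\sum_{i=1}^{N}\ln(2\pi g_i) - \frac{1}{2}\sum_{i=1}^{N}\frac{\tilde Y_i^{2}}{g_i} \tag{49}$$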
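A minimal sketch of this computation follows, under the assumption that the covariance matrix of the log-returns is tridiagonal with σ²Δ_i + 2a² on the diagonal and −a² on the first off-diagonals (which is what i.i.d. noise and independent Brownian increments imply). The function name `irregular_loglik` is illustrative, and the recursion is the generic LDL' factorization of a symmetric tridiagonal matrix rather than a transcription of the authors' code.

```python
import math

def irregular_loglik(y, deltas, sigma2, a2):
    """Gaussian log-likelihood of log-returns y observed over intervals deltas.

    Assumes cov(Y) is tridiagonal: sigma2*delta_i + 2*a2 on the diagonal and
    -a2 next to it. g is the current diagonal entry of D and kappa the current
    sub-diagonal entry of L in the factorization cov(Y) = L D L'.
    """
    if len(y) != len(deltas):
        raise ValueError("y and deltas must have the same length")
    g = sigma2 * deltas[0] + 2.0 * a2
    y_tilde = y[0]
    loglik = -0.5 * (math.log(2.0 * math.pi * g) + y_tilde ** 2 / g)
    for yi, di in zip(y[1:], deltas[1:]):
        kappa = -a2 / g                       # uses the previous g
        y_tilde = yi - kappa * y_tilde
        g = sigma2 * di + 2.0 * a2 + kappa * a2
        loglik -= 0.5 * (math.log(2.0 * math.pi * g) + y_tilde ** 2 / g)
    return loglik

# Example with three unequal intervals (in years) and illustrative returns.
example = irregular_loglik([0.001, -0.002, 0.0015],
                           [4e-5, 6e-5, 5e-5],
                           sigma2=0.09, a2=0.0015 ** 2)
print(example)
```

Only one pass over the data is needed, so the cost stays linear in N even though the sampling intervals differ.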
We can now turn to statistical inference using this likelihood function. As usual, the asymptotic variance of (σ̂², â²) is of the form
 | (50) |
To compute this quantity, suppose in the following that β1 and β2 can represent either σ² or a². We start with:
Lemma 2. Fisher's Conditional Information is given by
 | (51) |
To compute the asymptotic distribution of the
MLE of (ß
1, ß
2), one would then need to
compute the inverse of

where
E
denotes
expectation taken over the law of the sampling intervals. From
Equation (51), and since the order of
E
and
2/

ß
2ß
1 can be interchanged, this requires the computation of
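where from Equation (48) the g_i s are given by the continued fraction g1 = σ²Δ1 + 2a², g2 = σ²Δ2 + 2a² − a⁴/(σ²Δ1 + 2a²), and in general g_i = σ²Δ_i + 2a² − a⁴/(σ²Δ_{i−1} + 2a² − a⁴/(σ²Δ_{i−2} + 2a² − ···)). It, therefore, appears that computing the expected value of ln(g_i) over the law of (Δ1, Δ2,...,Δi) will be impractical.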
8.1 Expansion around a fixed value of Δ
To continue further with the calculations, we propose to expand around a fixed value of Δ, namely Δ0 = E[Δ]. Specifically, suppose now that
$$\Delta_i = \Delta_0\,(1 + \varepsilon\,\xi_i) \tag{52}$$
where ε and Δ0 are nonrandom, and the ξ_i s are i.i.d. random variables with mean zero and finite distribution. We will Taylor-expand the expressions above around ε = 0, that is, around the non-random sampling case we have just finished dealing with. Our expansion is one that is valid when the randomness of the sampling intervals remains small, that is, when var[Δ_i] is small, or o(1). Then we have Δ0 = E[Δ] = O(1) and var[Δ_i] = Δ0² ε² var[ξ_i]. The natural scaling is to make the distribution of ξ_i finite, that is, var[ξ_i] = O(1), so that ε² = O(var[Δ_i]) = o(1). But any other choice would have no impact on the result since var[Δ_i] = o(1) implies that the product ε² var[ξ_i] is o(1) and whenever we write remainder terms below they can be expressed as O_p(ε³ξ³) instead of just O(ε³). We keep the latter notation for clarity given that we set ξ_i = O_p(1). Furthermore, for simplicity, we take the ξ_i s to be bounded.
We emphasize that the time increments or durations Δ_i do not tend to zero length as ε → 0. It is only the variability of the Δ_i s that goes to zero.
Denote by Σ0 the value of Σ when Δ is replaced by Δ0, and let Ξ denote the matrix whose diagonal elements are the terms Δ0 ξ_i, and whose off-diagonal elements are zero. We obtain the following theorem.
Theorem 3. The MLE (σ̂², â²) is again consistent, this time with asymptotic variance
 | (53) |
where
and
with
In connection with the preceding result, we underline that the asymptotic variance of (σ̂², â²) is a limit as T → ∞, as in Equation (50). Equation (53), therefore, is an expansion in ε after T → ∞.
Note that A(0) is the asymptotic variance matrix already present in Proposition 1, except that it is evaluated at Δ0 = E[Δ]. Note also that the second order correction term is proportional to var[ξ], and is therefore zero in the absence of sampling randomness. When that happens, Δ = Δ0 with probability one and the asymptotic variance of the estimator reduces to the leading term A(0), that is, to the result in the fixed sampling case given in Proposition 1.
8.2 Randomly spaced sampling intervals and misspecified microstructure noise
Suppose now, as in Section 6, that the Us are i.i.d., have mean zero and variance a2, but are otherwise not necessarily Gaussian. We adopt the same approach as in Section 6, namely to express the estimator's properties in terms of deviations from the deterministic and Gaussian case. The additional correction terms in the asymptotic variance are given in the following result.
Theorem 4. The asymptotic variance is given by
 | (54) |
where A(0) and
A(2) are given in the statement of Theorem 3 and
while
The term A(0) is the base asymptotic variance of the estimator, already present with fixed sampling and Gaussian noise. The term cum4[U] B(0) is the correction due to the misspecification of the error distribution. These two terms are identical to those present in Theorem 2. The terms proportional to ε² are the further correction terms introduced by the randomness of the sampling. A(2) is the base correction term present even with Gaussian noise in Theorem 3, and cum4[U] B(2) is the further correction due to the sampling randomness. Both A(2) and B(2) are proportional to var[ξ] and hence vanish in the absence of sampling randomness.
9. Extensions
In this section, we briefly sketch four extensions of our basic
model. First, we show that the introduction of a drift term
does not alter our conclusions. Then we examine the situation
where market microstructure noise is serially correlated; there,
we show that the insight of Theorem 1 remains valid, namely
that the optimal sampling frequency is finite. Third, we turn
to the case where the noise is correlated with the efficient
price signal. Fourth, we discuss what happens if volatility
is stochastic.
In a nutshell, each one of these assumptions can be relaxed without affecting our main conclusion, namely that the presence of the noise gives rise to a finite optimal sampling frequency. The second part of our analysis, dealing with likelihood corrections for microstructure noise, will not necessarily carry through unchanged if the assumptions are relaxed (for instance, there is not even a known likelihood function if volatility is stochastic, and the likelihood must be modified if the assumed variance-covariance structure of the noise is modified).
9.1 Presence of a drift coefficient
What happens to our conclusions when the underlying X process has a drift? We shall see in this case that the presence of the drift does not alter our earlier conclusions. As a simple example, consider linear drift, that is, replace Equation (2) with
 | (55) |
The contamination by market
microstructure noise is as before: the observed process is given
by
Equation (3).
As before, we first-difference to get the log-returns
. The likelihood function is now
where the covariance matrix is given in
Equation (45),
and where

= (
1,...,
N)'. If ß denotes either
2 or
a2, one obtains
so that

no matter whether the
Us are normally distributed
or have another distribution with mean 0 and variance
a2. In
particular,
 | (56) |
Now let

be the 3
x 3 matrix of expected second likelihood
derivatives. Let

. Similarly define

. As before, when the
Us have a normal distribution,
S =
D, and otherwise that is not the case. The asymptotic variance
matrix of the estimators is of the form avar =
E[

]
D1SD1.
Let D_{σ²,a²} be the corresponding 2 x 2 matrix when estimation is carried out on σ² and a² for known µ, and let Dµ be the asymptotic information on µ for known σ² and a². Similarly define S_{σ²,a²} and avar_{σ²,a²}. Since D is block diagonal by Equation (56),
it follows that
Hence
 | (57) |
The asymptotic variance of (σ̂², â²) is thus the same as if µ were known, in other words, as if µ = 0, which is the case that we focused on in all the previous sections.
9.2 Serially correlated noise
We now examine what happens if we relax the assumption that the market microstructure noise is serially independent. Suppose that, instead of being i.i.d. with mean 0 and variance a2, the market microstructure noise follows
$$dU_t = -b\,U_t\,dt + c\,dZ_t \tag{58}$$
where b > 0, c > 0 and Z is a Brownian motion independent of W. U_Δ | U_0 has a Gaussian distribution with mean e^{−bΔ}U_0 and variance (c²/2b)(1 − e^{−2bΔ}). The unconditional mean and variance of U are 0 and a² = c²/2b. The main consequence of this model is that the variance contributed by the noise to a log-return observed over an interval of time Δ is now of order O(Δ), that is of the same order as the variance of the efficient price process σ²Δ, instead of being of order O(1) as previously. In other words, log-prices observed close together have very highly correlated noise terms. Because of this feature, this model for the microstructure noise would be less appropriate if the primary source of the noise consists of bid-ask bounces. In such a situation, the fact that a transaction is on the bid or ask side has little predictive power for the next transaction, or at least not enough to predict that two successive transactions are on the same side with very high probability [although Choi, Salandro, and Shastri (1988) have argued that serial correlation in the transaction type can be a component of the bid-ask spread, and extended the model of Roll (1984) to allow for it]. On the other hand, the model (58) can better capture effects such as the gradual adjustment of prices in response to a shock such as a large trade. In practice, the noise term probably encompasses both of these examples, resulting in a situation where the variance contributed by the noise has both types of components, some of order O(1), some of lower orders in Δ.
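The observed log-returns take the form
$$Y_i = \sigma\,(W_{\tau_i} - W_{\tau_{i-1}}) + U_{\tau_i} - U_{\tau_{i-1}} \equiv w_i + u_i$$
where the w_i s are i.i.d. N(0, σ²Δ), the u_i s are independent of the w_i s, so we have var[Y_i] = σ²Δ + E[u_i²], and they are Gaussian with mean zero and variance
$$E[u_i^2] = \frac{c^2\,(1 - e^{-b\Delta})}{b} \tag{59}$$
instead of 2a².
In addition, the u_i s are now serially correlated at all lags since
$$E[u_i u_k] = -\frac{c^2\,(1 - e^{-b\Delta})^2}{2b}\,e^{-b\Delta\,(i-k-1)}$$
for i > k. The first-order correlation of the log-returns is now
$$\rho = -\frac{c^2\,(1 - e^{-b\Delta})^2}{2\,\bigl[\,b\,\sigma^2\Delta + c^2\,(1 - e^{-b\Delta})\,\bigr]}$$
instead of its value under i.i.d. noise.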
The result analogous to Theorem 1 is as follows, for the case where one ignores the presence of this type of serially correlated noise when estimating σ².
Theorem 5. In small samples (finite T), the RMSE of the estimator σ̂² is given by
 | (60) |
so that for large T, starting from a value of c² in the limit where Δ → 0, increasing Δ first reduces RMSE[σ̂²]. Hence the optimal sampling frequency is finite.
One would expect this type of noise to be not nearly as bad as i.i.d. noise for the purpose of inferring σ² from high frequency data. Indeed, the variance of the noise is of the same order O(Δ) as the variance of the efficient price process. Thus log returns computed from transaction prices sampled close together are not subject to as much noise as previously (O(Δ) versus O(1)) and the squared bias β² of the estimator σ̂² no longer diverges to infinity as Δ → 0: it has the finite limit c⁴. Nevertheless, β² first decreases as Δ increases from 0, since
$$\beta^2 = \frac{c^4\,(1 - e^{-b\Delta})^2}{b^2\,\Delta^2}$$
and ∂β²/∂Δ → −bc⁴ < 0 as Δ → 0. For large enough T, this is sufficient to generate a finite optimal sampling frequency.
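The behaviour of the bias under this noise model can be tabulated directly. The sketch below assumes the noise follows a stationary Ornstein-Uhlenbeck process with mean-reversion b and diffusion coefficient c, so that the noise increment over an interval Δ has variance c²(1 − e^{−bΔ})/b, and that the estimator is the sum of squared log-returns divided by T; the function names are illustrative. The values b = 3 × 10⁴ and c = 0.5 are those quoted in the calibration of this section.

```python
import math

def ou_noise_return_variance(b, c, delta):
    """Variance of U_{t+delta} - U_t for a stationary OU noise process.

    Equals 2*a^2*(1 - exp(-b*delta)) with a^2 = c^2 / (2*b); expm1 keeps
    the computation accurate for very small b*delta.
    """
    return c * c * (-math.expm1(-b * delta)) / b

def rv_bias(b, c, delta):
    """Bias of (1/T) * sum(Y_i^2) as an estimator of sigma^2 under OU noise."""
    return ou_noise_return_variance(b, c, delta) / delta

b, c = 3e4, 0.5
for delta in (1e-9, 1e-5, 1e-4, 1e-3):
    bias = rv_bias(b, c, delta)
    print(f"Delta = {delta:.0e}: bias = {bias:.6f}, squared bias = {bias ** 2:.6f}")
```

The bias stays bounded by c² as Δ shrinks, so the squared bias approaches c⁴, and it falls as Δ grows, which is the mechanism behind the finite optimal sampling frequency described above.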
To calibrate the parameter values b and c, we refer to the same empirical microstructure studies we mentioned in Section 4. We now have π = E[u_i²]/var[Y_i] as the proportion of total variance that is microstructure-induced; we match it to the numbers in Equation (24) from Madhavan, Richardson, and Roomans (1997). In their Table 5, they report the first-order correlation of price changes (hence returns) to be approximately ρ = −0.2 at their frequency of observation. Here ρ = cov(Y_i, Y_{i−1})/var[Y_i]. If we match π = 0.6 and ρ = −0.2, with σ = 30% as before, we obtain (after rounding) c = 0.5 and b = 3 x 10⁴. Figure 6 displays the resulting RMSE of the estimator as a function of Δ and T. The overall picture is comparable to Figure 2.
As for the rest of the analysis of the article, dealing with likelihood corrections for microstructure noise, the covariance matrix of the log-returns, γ²V in Equation (26), should be replaced by the matrix whose diagonal elements are
$$\mathrm{var}[Y_i] = \sigma^2\Delta + \frac{c^2\,(1 - e^{-b\Delta})}{b}$$
and off-diagonal elements i > j are:
$$\mathrm{cov}(Y_i, Y_j) = -\frac{c^2\,(1 - e^{-b\Delta})^2}{2b}\,e^{-b\Delta\,(i-j-1)}$$
Having modified the matrix γ²V, the artificial "normal" distribution that assumes i.i.d. Us that are N(0, a²) would no longer use the correct second moment structure of the data. Thus we cannot relate a priori the asymptotic variance of the estimator σ̂² to that of the i.i.d. normal case, as we did in Theorem 2.
9.3 Noise correlated with the price process
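We have assumed so far that the U process was uncorrelated with the W process. Microstructure noise attributable to informational effects is likely to be correlated with the efficient price process, since it is generated by the response of market participants to information signals (i.e., to the efficient price process). This would be the case for instance in the bid-ask model with adverse selection of Glosten (1987). When the U process is no longer uncorrelated with the W process, the form of the variance matrix of the observed log-returns Y must be altered, replacing γ²v_ij in Equation (26) with
$$\mathrm{cov}(Y_i, Y_j) = \delta_{ij}\,\sigma^2\Delta + \mathrm{cov}\bigl(\sigma(W_{\tau_i} - W_{\tau_{i-1}}),\,U_{\tau_j} - U_{\tau_{j-1}}\bigr) + \mathrm{cov}\bigl(\sigma(W_{\tau_j} - W_{\tau_{j-1}}),\,U_{\tau_i} - U_{\tau_{i-1}}\bigr) + \mathrm{cov}\bigl(U_{\tau_i} - U_{\tau_{i-1}},\,U_{\tau_j} - U_{\tau_{j-1}}\bigr)$$
where δ_ij is the Kronecker symbol.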
The small sample properties of the misspecified MLE for σ² analogous to those computed in the independent case, including its RMSE, can be obtained from
Specific expressions for all these quantities depend upon the assumptions of the particular structural model under consideration: for instance, in the Glosten (1987) model (see his Proposition 6), the Us remain stationary, the transaction noise U_{τ_i} is uncorrelated with the return noise during the previous observation period, that is, U_{τ_{i−1}} − U_{τ_{i−2}}, and the efficient return σ(W_{τ_i} − W_{τ_{i−1}}) is also uncorrelated with the transaction noises U_{τ_{i+1}} and U_{τ_{i−2}}. With these in hand, the analysis of the RMSE and its minimum can then proceed as above. As for the likelihood corrections for microstructure noise, the same caveat as in the serially correlated U case applies: having modified the matrix γ²V, the artificial "normal" distribution would no longer use the correct second moment structure of the data and the likelihood must be modified accordingly.
9.4 Stochastic volatility
One important departure from our basic model is the case where volatility is stochastic. The observed log-returns are still generated by Equation (3). Now, however, the constant volatility assumption (2) is replaced by
$$dX_t = \sigma_t\,dW_t \tag{61}$$
The object of interest in much of the literature on high frequency volatility estimation [see, e.g., Barndorff-Nielsen and Shephard (2002), Andersen et al. (2003)] is then the integral
$$\int_0^T \sigma_t^2\,dt \tag{62}$$
over a fixed time period [0, T], or possibly several such time periods. The estimation is based on observations 0 = t0 < t1 < ··· < tn = T, and asymptotic results are obtained when max Δt_i → 0. The usual estimator for Equation (62) is the "realized variance"
$$\sum_{i=1}^{n}\bigl(\tilde X_{t_i} - \tilde X_{t_{i-1}}\bigr)^2 \tag{63}$$
In the context of stochastic volatility, ignoring market microstructure noise leads to an even more dangerous situation than when $\sigma$ is constant and $T \to \infty$. We show in the companion paper Zhang, Mykland, and Aït-Sahalia (2003) that, after suitable scaling, the realized variance is a consistent and asymptotically normal estimator but of the quantity $2a^2$. This quantity has, in general, nothing to do with the object of interest Equation (62). Stated differently, market microstructure noise totally swamps the variance of the price signal at the level of the realized variance. To obtain a finite optimal sampling interval, one needs that $a^2 \to 0$ as $n \to \infty$, that is, the amount of noise must disappear asymptotically. For further developments on this topic, we refer to Zhang, Mykland, and Aït-Sahalia (2003).
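The swamping effect can be illustrated with a small simulation. The sketch below is not taken from the companion paper: it assumes, for simplicity, a constant volatility path, i.i.d. Gaussian noise with variance $a^2$, and hypothetical parameter values. Under these assumptions the realized variance computed from all $n$ noisy observations is dominated by the term $2na^2$, so that dividing it by $2n$ recovers approximately $a^2$ rather than anything related to the integrated variance.

```python
import math
import random


def simulate_noisy_prices(n, T, sigma, a, seed=0):
    """Observed log-prices: efficient price plus i.i.d. N(0, a^2) noise.

    Constant sigma is a simplifying assumption made for this illustration.
    """
    rng = random.Random(seed)
    dt = T / n
    x = 0.0
    observed = [x + rng.gauss(0.0, a)]
    for _ in range(n):
        x += sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        observed.append(x + rng.gauss(0.0, a))
    return observed


def realized_variance(prices):
    """Sum of squared increments of the observed log-price."""
    return sum((b - c) ** 2 for c, b in zip(prices, prices[1:]))


# Hypothetical values chosen only for illustration.
n, T, sigma, a = 200_000, 1.0, 0.3, 0.005
rv = realized_variance(simulate_noisy_prices(n, T, sigma, a))
integrated_variance = sigma ** 2 * T   # what one would like to estimate
noise_estimate = rv / (2 * n)          # what the realized variance delivers

print("realized variance   :", rv)
print("integrated variance :", integrated_variance)
print("rv / (2n)           :", noise_estimate, "versus a^2 =", a ** 2)
```

With these values the realized variance is roughly two orders of magnitude larger than the integrated variance, which is the sense in which the noise swamps the signal when all observations are used without correction.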
 |
10. Conclusions
|
|---|
We showed that the presence of market microstructure noise makes it optimal to sample less often than would otherwise be the case in the absence of noise, and we determined accordingly the optimal sampling frequency in closed form.
We then addressed the issue of what to do about it, and showed that modeling the noise term explicitly restores the first order statistical effect that sampling as often as possible is optimal. We also demonstrated that this remains the case if one misspecifies the assumed distribution of the noise term. If the econometrician assumes that the noise terms are normally distributed when in fact they are not, not only is it still optimal to sample as often as possible, but the estimator has the same asymptotic variance as if the noise distribution had been correctly specified. This robustness result is, we think, a major argument in favor of incorporating the presence of the noise when estimating continuous time models with high frequency financial data, even if one is unsure about what is the true distribution of the noise term. Hence, the answer to the question we pose in our title is "as often as possible," provided one accounts for the presence of the noise when designing the estimator.
 |
Appendix A. Proof of Lemma 1
|
|---|
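To calculate the fourth cumulant $\mathrm{cum}(Y_i, Y_j, Y_k, Y_l)$, recall from Equation (8) that the observed log-returns are
$$Y_i = \sigma (W_{\tau_i} - W_{\tau_{i-1}}) + U_{\tau_i} - U_{\tau_{i-1}}.$$
First, note that the $\tau_i$ are nonrandom, and $W$ is independent of the $U$s, and has Gaussian increments. Second, the cumulants are multilinear, so, writing for brevity $\Delta W_i = W_{\tau_i} - W_{\tau_{i-1}}$ and $\Delta U_i = U_{\tau_i} - U_{\tau_{i-1}}$,
$$\begin{aligned}
\mathrm{cum}(Y_i, Y_j, Y_k, Y_l) ={}& \sigma^4\, \mathrm{cum}(\Delta W_i, \Delta W_j, \Delta W_k, \Delta W_l) + \sigma^3\, \mathrm{cum}(\Delta W_i, \Delta W_j, \Delta W_k, \Delta U_l)[4] \\
&+ \sigma^2\, \mathrm{cum}(\Delta W_i, \Delta W_j, \Delta U_k, \Delta U_l)[6] + \sigma\, \mathrm{cum}(\Delta W_i, \Delta U_j, \Delta U_k, \Delta U_l)[4] \\
&+ \mathrm{cum}(\Delta U_i, \Delta U_j, \Delta U_k, \Delta U_l),
\end{aligned}$$
where "[4]" and "[6]" denote sums over the distinct terms obtained by permuting the indices.
Out of these terms, only the last is nonzero because $W$ has Gaussian increments (so all cumulants of its increments of order greater than two are zero), and is independent of the $U$s (so all cumulants involving increments of both $W$ and $U$ are also zero). Therefore,
$$\mathrm{cum}(Y_i, Y_j, Y_k, Y_l) = \mathrm{cum}(U_{\tau_i} - U_{\tau_{i-1}},\, U_{\tau_j} - U_{\tau_{j-1}},\, U_{\tau_k} - U_{\tau_{k-1}},\, U_{\tau_l} - U_{\tau_{l-1}}).$$
If i = j = k = l, we have:
$$\mathrm{cum}(Y_i, Y_i, Y_i, Y_i) = \mathrm{cum}_4(U_{\tau_i} - U_{\tau_{i-1}}) = \mathrm{cum}_4(U_{\tau_i}) + \mathrm{cum}_4(-U_{\tau_{i-1}}) = 2\, \mathrm{cum}_4[U]$$
with the second equality following from the independence of $U_{\tau_i}$ and $U_{\tau_{i-1}}$, and the third from the fact that the cumulant is of even order.
If max(i, j, k, l) = min(i, j, k, l) + 1, two situations arise. Set m = min(i, j, k, l) and M = max(i, j, k, l). Also set s = s(i, j, k, l) = #{i, j, k, l = m}. If s is odd, say s = 1 with i = m, and j,k,l = M = m + 1, we get a term of the form
$$\mathrm{cum}(U_{\tau_m} - U_{\tau_{m-1}},\, U_{\tau_{m+1}} - U_{\tau_m},\, U_{\tau_{m+1}} - U_{\tau_m},\, U_{\tau_{m+1}} - U_{\tau_m}) = -\mathrm{cum}_4[U].$$
By permutation, the same situation arises if s = 3. If instead s is even, that is, s = 2, then we have terms of the form
$$\mathrm{cum}(U_{\tau_m} - U_{\tau_{m-1}},\, U_{\tau_m} - U_{\tau_{m-1}},\, U_{\tau_{m+1}} - U_{\tau_m},\, U_{\tau_{m+1}} - U_{\tau_m}) = \mathrm{cum}_4[U].$$
Finally, if at least one pair of indices in the quadruple (i, j, k, l) is more than one integer apart, then
$$\mathrm{cum}(Y_i, Y_j, Y_k, Y_l) = 0$$
by independence of the $U$s.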
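The case analysis above can be checked exactly on a toy example. The sketch below is only an illustration: it uses a hypothetical three-point, mean-zero noise distribution, sets $\sigma = 0$ (the Brownian part contributes nothing to the fourth cumulant), and computes joint fourth cumulants by exhaustive enumeration with exact rational arithmetic. The helper names are ours.

```python
from fractions import Fraction
from itertools import product

# Hypothetical mean-zero noise law used only for this check: (value, probability).
SUPPORT = [(-1, Fraction(1, 4)), (0, Fraction(1, 2)), (1, Fraction(1, 4))]
N_VARS = 4  # independent draws U_0, U_1, U_2, U_3


def expectation(f):
    """Exact expectation of f(U_0, ..., U_3) by enumerating all outcomes."""
    total = Fraction(0)
    for combo in product(SUPPORT, repeat=N_VARS):
        prob = Fraction(1)
        for _, p in combo:
            prob *= p
        total += prob * f([v for v, _ in combo])
    return total


def joint_cum4(a, b, c, d):
    """Fourth joint cumulant of mean-zero random variables a, b, c, d."""
    E = expectation
    return (E(lambda u: a(u) * b(u) * c(u) * d(u))
            - E(lambda u: a(u) * b(u)) * E(lambda u: c(u) * d(u))
            - E(lambda u: a(u) * c(u)) * E(lambda u: b(u) * d(u))
            - E(lambda u: a(u) * d(u)) * E(lambda u: b(u) * c(u)))


def Y(i):
    """Noise part of the i-th log-return: U_i - U_{i-1}."""
    return lambda u: u[i] - u[i - 1]


def U0(u):
    return u[0]


cum4_U = joint_cum4(U0, U0, U0, U0)

print("cum4[U]                 :", cum4_U)
print("i = j = k = l           :", joint_cum4(Y(1), Y(1), Y(1), Y(1)))
print("s odd  (adjacent)       :", joint_cum4(Y(1), Y(2), Y(2), Y(2)))
print("s even (adjacent)       :", joint_cum4(Y(1), Y(1), Y(2), Y(2)))
print("more than one apart     :", joint_cum4(Y(1), Y(1), Y(3), Y(3)))
```

The four printed cumulants should equal $2\,\mathrm{cum}_4[U]$, $-\mathrm{cum}_4[U]$, $\mathrm{cum}_4[U]$ and $0$, respectively, in line with the four cases of the proof.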
 |
Appendix B. Proof of Theorem 1
|
|---|
Given the estimator (5) has the following expected value
The estimator's variance is
Applying Lemma 1 in the special case where the first two indices and the last two respectively are identical yields
$\mathrm{cum}(Y_i, Y_i, Y_j, Y_j) = \begin{cases} 2\, \mathrm{cum}_4[U] & \text{if } j = i, \\ \mathrm{cum}_4[U] & \text{if } j = i + 1 \text{ or } j = i - 1, \\ 0 & \text{otherwise.} \end{cases}$ | (B.1) |
In the middle case, that is, whenever $j = i + 1$ or $j = i - 1$, the number $s$ of indices that are equal to the minimum index is always 2. Combining Equation (B.1) with Equation (14), we have
with $\mathrm{var}[Y_i]$ and $\mathrm{cov}(Y_i, Y_{i-1}) = \mathrm{cov}(Y_i, Y_{i+1})$ given in Equations (9) and (10), so that
since $N = T/\Delta$. The expression for the RMSE follows from those for the expected value and variance given in Equations (17) and (18):
 | (B.2) |
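The bias component of the RMSE can be checked by simulation. The sketch below is an illustration under assumptions of our own: it takes the estimator to be $\hat\sigma^2 = T^{-1}\sum_{i=1}^{N} Y_i^2$, Gaussian noise with variance $a^2$, and hypothetical parameter values. Since $\mathrm{var}[Y_i] = \sigma^2\Delta + 2a^2$ in this setting and $N = T/\Delta$, the expected value of the estimator should be close to $\sigma^2 + 2a^2/\Delta$.

```python
import math
import random


def sum_of_squares_estimator(T, delta, sigma, a, rng):
    """(1/T) * sum of squared noisy log-returns sampled every `delta`."""
    n = int(round(T / delta))
    u_prev = rng.gauss(0.0, a)
    total = 0.0
    for _ in range(n):
        u = rng.gauss(0.0, a)
        y = sigma * math.sqrt(delta) * rng.gauss(0.0, 1.0) + u - u_prev
        total += y * y
        u_prev = u
    return total / T


def mean_estimate(T, delta, sigma, a, reps=2000, seed=1):
    """Monte Carlo average of the estimator over `reps` simulated samples."""
    rng = random.Random(seed)
    draws = (sum_of_squares_estimator(T, delta, sigma, a, rng) for _ in range(reps))
    return sum(draws) / reps


# Hypothetical values chosen only for illustration.
T, delta, sigma, a = 1.0, 0.01, 0.3, 0.01
simulated_mean = mean_estimate(T, delta, sigma, a)
predicted_mean = sigma ** 2 + 2 * a ** 2 / delta

print("simulated mean :", simulated_mean)
print("sigma^2 + 2a^2/Delta :", predicted_mean)
print("true sigma^2   :", sigma ** 2)
```

With these values the noise adds about 0.02 to a true variance of 0.09, and the bias grows without bound as $\Delta$ shrinks, which is why the RMSE has an interior minimum.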
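The optimal value $\Delta^*$ of the sampling interval given in Equation (19) is obtained by minimizing $\mathrm{RMSE}[\hat\sigma^2]$ over $\Delta$. The first order condition that arises from setting $\partial\, \mathrm{RMSE}[\hat\sigma^2]/\partial\Delta$ to 0 is the cubic equation in $\Delta$:
 | (B.3) |
We now show that Equation (B.3) has a unique positive root, and that it corresponds to a minimum of $\mathrm{RMSE}[\hat\sigma^2]$. We are, therefore, looking for a real positive root in $\Delta = z$ to the cubic equation
$z^3 + pz - q = 0$ | (B.4) |
where $q > 0$ and $p < 0$ since from Equation (16):
Using Vièta's change of variable from $z$ to $w$ given by $z = w - p/(3w)$ reduces, after multiplication by $w^3$, the cubic to the quadratic equation
$y^2 - qy - p^3/27 = 0$ | (B.5) |
in the variable $y \equiv w^3$.
Define the discriminant
$$D = \left(\frac{p}{3}\right)^3 + \left(\frac{q}{2}\right)^2.$$
The two roots of Equation (B.5),
$$y_1 = \frac{q}{2} + D^{1/2}, \qquad y_2 = \frac{q}{2} - D^{1/2},$$
are real if $D \ge 0$ (and distinct if $D > 0$) and complex conjugates if $D < 0$. Then the three roots of Equation (B.4) are
$z_1 = y_1^{1/3} + y_2^{1/3}, \quad z_2 = -\tfrac{1}{2}\,(y_1^{1/3} + y_2^{1/3}) - i\,\tfrac{\sqrt{3}}{2}\,(y_1^{1/3} - y_2^{1/3}), \quad z_3 = -\tfrac{1}{2}\,(y_1^{1/3} + y_2^{1/3}) + i\,\tfrac{\sqrt{3}}{2}\,(y_1^{1/3} - y_2^{1/3})$ | (B.6) |
[see, e.g., Abramowitz and Stegun (1972, Section 3.8.2)]. If $D > 0$, the two roots in $y$ are both real and positive because $p < 0$ and $q > 0$ imply
$$D^{1/2} < \frac{q}{2},$$
and hence of the three roots given in Equation (B.6), $z_1$ is real and positive and $z_2$ and $z_3$ are complex conjugates. If $D = 0$, then $y_1 = y_2 = q/2 > 0$ and the three roots are real (two of which are identical) and given by
$$z_1 = 2\,(q/2)^{1/3}, \qquad z_2 = z_3 = -(q/2)^{1/3}.$$
Of these, $z_1 > 0$ and $z_2 = z_3 < 0$. If $D < 0$, the three roots are distinct and real because
$$y_1 = \frac{q}{2} + i\,(-D)^{1/2} = r e^{i\theta}, \qquad y_2 = \frac{q}{2} - i\,(-D)^{1/2} = r e^{-i\theta},$$
so
$$y_1^{1/3} = r^{1/3} e^{i\theta/3}, \qquad y_2^{1/3} = r^{1/3} e^{-i\theta/3},$$
and therefore
$$y_1^{1/3} + y_2^{1/3} = 2 r^{1/3} \cos(\theta/3), \qquad y_1^{1/3} - y_2^{1/3} = 2 i\, r^{1/3} \sin(\theta/3),$$
so that
$$z_1 = 2 r^{1/3} \cos(\theta/3), \qquad z_2 = -r^{1/3} \cos(\theta/3) + \sqrt{3}\, r^{1/3} \sin(\theta/3), \qquad z_3 = -r^{1/3} \cos(\theta/3) - \sqrt{3}\, r^{1/3} \sin(\theta/3).$$
Only $z_1$ is positive because $q > 0$ and $(-D)^{1/2} > 0$ imply that $0 < \theta < \pi/2$. Therefore $\cos(\theta/3) > 0$, so $z_1 > 0$; $\sin(\theta/3) > 0$, so $z_3 < 0$; and $\tan(\theta/3) < \tan(\pi/6) = 1/\sqrt{3}$, so $z_2 < 0$.
Thus Equation (B.4) has exactly one root that is positive, and it is given by $z_1$ in Equation (B.6). Since $\mathrm{RMSE}[\hat\sigma^2]$ is of the form
with $a_3 > 0$, it tends to $+\infty$ when $\Delta$ tends to $+\infty$. Therefore, that single positive root corresponds to a minimum of $\mathrm{RMSE}[\hat\sigma^2]$ which is reached at
$$\Delta^* = z_1 = \left(\frac{q}{2} + D^{1/2}\right)^{1/3} + \left(\frac{q}{2} - D^{1/2}\right)^{1/3}.$$
Replacing $q$ and $p$ by their values in the expression above yields Equation (19).
As shown above, if the expression inside the square root in formula (19) is negative, the resulting $\Delta^*$ is still a positive real number.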
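The root-selection argument can be sketched numerically. The function below is a minimal illustration, not the closed form of Equation (19): it takes generic coefficients $p < 0$ and $q > 0$ of a cubic $z^3 + pz - q = 0$ (the values used are hypothetical, not those implied by Equation (16)) and returns the unique positive root through Vièta's substitution, using complex arithmetic so that the case $D < 0$ needs no special treatment.

```python
import cmath


def positive_root(p, q):
    """Unique positive root of z**3 + p*z - q = 0 when p < 0 and q > 0."""
    if not (p < 0 and q > 0):
        raise ValueError("the argument in the text assumes p < 0 and q > 0")
    D = (p / 3.0) ** 3 + (q / 2.0) ** 2       # discriminant
    y1 = q / 2.0 + cmath.sqrt(D)              # one root of the quadratic in y = w**3
    w = y1 ** (1.0 / 3.0)                     # principal cube root
    z1 = w - p / (3.0 * w)                    # Vieta: z = w - p / (3 w)
    return z1.real                            # imaginary part vanishes up to rounding


def residual(z, p, q):
    """Value of the cubic at z, for checking."""
    return z ** 3 + p * z - q


# Three hypothetical cases, one for each sign of the discriminant.
for p, q in [(-1.0, 1.0), (-3.0, 2.0), (-7.0, 6.0)]:
    z = positive_root(p, q)
    print(f"p={p}, q={q}: positive root {z:.6f}, residual {residual(z, p, q):.2e}")
```

The three cases have $D > 0$, $D = 0$ and $D < 0$, respectively. In the last one the cubic factors as $(z - 3)(z + 1)(z + 2)$, so the positive root is 3 even though the square root of $D$ is imaginary.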
 |
Appendix C. Proof of Proposition 1
|
|---|
The result follows from an application of the delta method to the known properties of the MLE of an MA(1) process [Hamilton (1995, Section 5.4)], as follows. Because we re-use these calculations below in the proof of Theorem 2 (whose result cannot be inferred from known MA(1) properties), we recall some of the expressions of the score vector of the MA(1) likelihood. The partial derivatives of the log-likelihood function (25) have the form
 | (C.1) |
and
 | (C.2) |
so that the MLE for $\gamma^2$ is
 | (C.3) |
At the true parameters, the expected value of the score vector is zero: $E[\dot{l}_{\eta}] = E[\dot{l}_{\gamma^2}] = 0$. Hence it follows from Equation (C.1) that
thus as $N \to \infty$
Similarly, it follows from Equation (C.2) that
Turning now to Fisher's information, we have
 | (C.4) |
whence the asymptotic variance of $\hat\gamma^2$ is $2\gamma^4$. We also have that
 | (C.5) |
whence the asymptotic covariance of $\hat\gamma^2$ and $\hat\eta$ is zero.
To evaluate $E[\ddot{l}_{\eta\eta}]$, we compute
 | (C.6) |
and evaluate both terms. For the first term in Equation (C.6), we have from Equation (27):
 | (C.7) |
For the second term, we have for any non-random $N \times N$ matrix $Q$:
where Tr denotes the matrix trace, which satisfies $\mathrm{Tr}[AB] = \mathrm{Tr}[BA]$. Therefore
 | (C.8) |
Combining Equations (C.7) and (C.8) with Equation (C.6), it follows that
 | (C.9) |
In light of that and Equation (C.5), the asymptotic variance of $\hat\eta$ is the same as in the $\gamma^2$ known case, that is, $(1 - \eta^2)$ (which of course confirms the result of Durbin (1959) for this parameter).
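We can now retrieve the asymptotic covariance matrix for the original parameters $(\sigma^2, a^2)$ from that of the parameters $(\gamma^2, \eta)$. This follows from the delta method applied to the change of variable [Equations (9) and (10)]:
$\begin{pmatrix} \sigma^2 \\ a^2 \end{pmatrix} = f(\gamma^2, \eta) = \begin{pmatrix} \Delta^{-1} \gamma^2 (1 + \eta)^2 \\ -\gamma^2 \eta \end{pmatrix}$ | (C.10) |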
Hence
where
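The delta-method step can be sketched in a few lines. The code below is an illustration under stated assumptions: it uses the moment relations $\gamma^2(1+\eta^2) = \sigma^2\Delta + 2a^2$ and $\gamma^2\eta = -a^2$ for the change of variable, takes the asymptotic variances of $\hat\gamma^2$ and $\hat\eta$ to be $2\gamma^4$ and $1 - \eta^2$ with zero covariance, and propagates them through the Jacobian. Function names and parameter values are ours, and the resulting matrix is per observation (multiply by $\Delta$ to express it per unit of time, since $N = T/\Delta$).

```python
import math


def to_original(gamma2, eta, delta):
    """Map the MA(1) parameters (gamma^2, eta) to (sigma^2, a^2)."""
    sigma2 = gamma2 * (1.0 + eta) ** 2 / delta
    a2 = -gamma2 * eta
    return sigma2, a2


def to_ma1(sigma2, a2, delta):
    """Inverse map, selecting the invertible root -1 < eta <= 0."""
    if a2 == 0.0:
        return sigma2 * delta, 0.0
    b = (sigma2 * delta + 2.0 * a2) / a2          # eta^2 + b*eta + 1 = 0
    eta = -2.0 / (b + math.sqrt(b * b - 4.0))     # numerically stable form
    return -a2 / eta, eta


def delta_method_cov(gamma2, eta, delta):
    """J diag(2 gamma^4, 1 - eta^2) J' for the map (gamma^2, eta) -> (sigma^2, a^2)."""
    jac = [[(1.0 + eta) ** 2 / delta, 2.0 * gamma2 * (1.0 + eta) / delta],
           [-eta, -gamma2]]
    avar = [2.0 * gamma2 ** 2, 1.0 - eta ** 2]    # diagonal: the covariance is zero
    return [[sum(jac[r][k] * avar[k] * jac[c][k] for k in range(2))
             for c in range(2)] for r in range(2)]


# Hypothetical values chosen only for illustration.
sigma2, a2, delta = 0.09, 2.5e-5, 1e-3
gamma2, eta = to_ma1(sigma2, a2, delta)
cov = delta_method_cov(gamma2, eta, delta)

print("gamma^2, eta        :", gamma2, eta)
print("round trip          :", to_original(gamma2, eta, delta))
print("avar(sigma^2) * Delta:", delta * cov[0][0])
```

The round trip recovers the starting values of $(\sigma^2, a^2)$, which confirms that the selected root of the quadratic in $\eta$ is the one consistent with the change of variable.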
 |
Appendix D. Proof of Theorem 2
|
|---|
We have that
 | (D.1) |
where "true" denotes the true distribution of the $Y$s, not the incorrectly specified normal distribution, and Cum denotes the cumulants given in Lemma 1. The last transition is because
since $Y$ has mean zero [see, e.g., McCullagh (1987, Section 2.3)]. The need for permutation goes away due to the summing over all indices $(i, j, k, l)$, and since $V^{-1} = [v^{ij}]$ is symmetric.
When looking at Equation (D.1), note that $\mathrm{cum}_{\mathrm{normal}}(Y_i, Y_j, Y_k, Y_l) = 0$, where "normal" denotes a Normal distribution with the same first and second order moments as the true distribution. That is, if the $Y$s were normal we would have
Also, since the covariance structure does not depend on Gaussianity, $\mathrm{cov}_{\mathrm{true}}(Y_i, Y_j) = \mathrm{cov}_{\mathrm{normal}}(Y_i, Y_j)$. Next, we have
 | (D.2) |
with the last equality following from the fact
that

depends only on the second moments of the
Ys. (Note that in general

because the likelihood may be misspecified.) Thus, it follows from
Equation (D.1) that
 | (D.3) |
It follows similarly that
 | (D.4) |
and
 | (D.5) |
We now need to evaluate the sums that appear on the right-hand sides of Equations (D.3)–(D.5). Consider two generic symmetric $N \times N$ matrices $[\nu^{i,j}]$ and $[\omega^{i,j}]$. We are interested in expressions of the form
 | (D.6) |
It follows that if we set
 | (D.7) |
then $\Psi(\nu, \omega) = \mathrm{cum}_4[U]\, \psi(\nu, \omega)$ where
 | (D.8) |
If the two matrices $[\nu^{i,j}]$ and $[\omega^{i,j}]$ satisfy the following reversibility property: $\nu^{N+1-i,N+1-j} = \nu^{i,j}$ and $\omega^{N+1-i,N+1-j} = \omega^{i,j}$ (so long as one is within the index set), then Equation (D.8) simplifies to:
This is the case for $V^{-1}$ and its derivative $\partial V^{-1}/\partial\eta$, as can be seen from the expression for $v^{i,j}$ given in Equation (28), and consequently for $\partial v^{i,j}/\partial\eta$.
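The reversibility property can also be verified directly on a small example, without using the closed-form expression of Equation (28). The sketch below assumes the MA(1) structure in which $V$ has $1 + \eta^2$ on the diagonal and $\eta$ on the two adjacent diagonals; it inverts $V$ exactly with rational arithmetic and obtains the derivative of the inverse from the standard identity $\partial V^{-1}/\partial\eta = -V^{-1} (\partial V/\partial\eta) V^{-1}$. The matrix size and the value of $\eta$ are arbitrary choices made for illustration.

```python
from fractions import Fraction


def ma1_matrix(n, eta):
    """Tridiagonal V: 1 + eta^2 on the diagonal, eta next to it (assumed form)."""
    return [[1 + eta * eta if i == j else (eta if abs(i - j) == 1 else Fraction(0))
             for j in range(n)] for i in range(n)]


def d_ma1_matrix(n, eta):
    """Derivative of V with respect to eta: 2*eta on the diagonal, 1 next to it."""
    return [[2 * eta if i == j else (Fraction(1) if abs(i - j) == 1 else Fraction(0))
             for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def invert(m):
    """Exact Gauss-Jordan inversion for a nonsingular matrix of Fractions."""
    n = len(m)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def is_reversible(m):
    """Check m[N+1-i][N+1-j] == m[i][j] (written with 0-based indices)."""
    n = len(m)
    return all(m[n - 1 - i][n - 1 - j] == m[i][j] for i in range(n) for j in range(n))


n, eta = 6, Fraction(-1, 3)
V = ma1_matrix(n, eta)
V_inv = invert(V)
dV_inv = [[-x for x in row] for row in matmul(matmul(V_inv, d_ma1_matrix(n, eta)), V_inv)]

print("V^{-1} reversible       :", is_reversible(V_inv))
print("dV^{-1}/d eta reversible:", is_reversible(dV_inv))
```

Both checks hold exactly here. The property reflects the fact that $V$ is unchanged when the order of the observations is reversed, and the same is then true of its inverse and of the derivative of the inverse.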
Therefore, if we wish to compute the sums in Equations (D.3)–(D.5) we need to find the three quantities $\psi(\partial v/\partial\eta, v)$, $\psi(\partial v/\partial\eta, \partial v/\partial\eta)$, and $\psi(v, v)$, respectively. All are of order $O(N)$, and only the first term is needed. Replacing the terms $v^{i,j}$ and $\partial v^{i,j}/\partial\eta$ by their expressions from Equation (28), we obtain:
 | (D.9) |
 | (D.10) |
 | (D.11) |
The asymptotic variance of the estimator $(\hat\gamma^2, \hat\eta)$ obtained by maximizing the (incorrectly specified) log-likelihood function (25) that assumes Gaussianity of the $U$s is given by
where, from Equations (C.4), (C.5), and (C.9) we have
 | (D.12) |
and, in light of Equations (D.3)–(D.5),
 | (D.13) |
where
 | (D.14) |
from the expressions just computed.
It follows that
where Id denotes the identity matrix and
so that
By applying the delta method to change the parametrization, we now recover the asymptotic variance of the estimates of the original parameters:
 |
Appendix E. Derivations for Section 7
|
|---|
To see Equation (39), let "orig" denote parametrization in (and differentiation with respect to) the original parameters $\sigma^2$ and $a^2$, while "transf" denotes parametrization and differentiation in $\gamma^2$ and $\eta$, and $f_{\mathrm{inv}}$ denotes the inverse of the change of variable function defined in (C.10), namely
 | (E.1) |
and $\nabla f_{\mathrm{inv}}$ its Jacobian matrix. Then, from

, we have
where

is a 2
x 2 matrix whose terms are linear in

and the second partial derivatives of
finv. Now

, and so

from which it follows that
 | (E.2) |
with

given in
Equation (D.12). Similarly,

and so
 | (E.3) |
with the second equality following
from the expression for
Stransf given in
Equation (D.13).
To complete the calculation, note from Equation (D.14) that
where
Thus
 | (E.4) |
where
 | (E.5) |
which
is the result (40). Inserting Equation (E.4) into Equation (E.3)
yields the result (39).
For the profile likelihood
, let
denote the maximizer of l(
2, a2) for given
2. Thus by definition
. From now on, all differentiation takes place with respect to the original parameters, and we will omit the subscript "orig" in what follows. Since
, it follows that
so that
 | (E.6) |
The profile score then follows
 | (E.7) |
so that at the true value of (
2,
a2),
 | (E.8) |
since

and
as sums of random variables
with expected value zero, so that
while
also as a sum of random variables with expected
value zero.
Therefore
since

. In particular,

as claimed.
Further differentiating Equation (E.7), one obtains
from
Equation (E.6). Evaluated at

, one gets

and

, and so
 | (E.9) |
where

is the upper left element of the matrix

. Thus
Equation (42) is valid.
Alternatively, we can see that the profile likelihood
satisfies the Bartlett identity to first order, that is, Equation (43). Note that by Equation (E.8),
so
that
by invoking
Equation (39).
Continuing the calculation,
 | (E.10) |
since
from the expressions for
Dorig and
gorig in
Equations (E.2) and
(E.5) we have
 | (E.11) |
Then by
Equation (E.9) and the law of large numbers, we have
 | (E.12) |
and
Equation (43) follows from combining
Equation (E.10) with
Equation (E.12).
 |
Appendix F. Proof of Lemma 2
|
|---|

$\Omega\Omega^{-1} = \mathrm{Id}$ implies that
 | (F.1) |
and, since $\Omega$ is linear in the parameters $\sigma^2$ and $a^2$ [see Equation (45)], we have
 | (F.2) |
so that
 | (F.3) |
In the rest of this lemma, let expectations be conditional on the $\Delta$s. We use the notation $E[\,\cdot\,|\Delta]$ as a shortcut for $E[\,\cdot\,|\Delta_N, \ldots, \Delta_1]$. At the true value of the parameter vector, we have,
 | (F.4) |
with the second equality following from Equation (46). Then, for any nonrandom $Q$, we have
 | (F.5) |
This can be applied to $Q$ that depends on the $\Delta$s, even when they are random, because the expected value is conditional on the $\Delta$s. Therefore, it follows from Equation (F.4) that
 | (F.6) |
with the last equality following from Equation (F.1) and so
 | (F.7) |
again because of Equation (F.2).
In light of Equation (46), the expected information (conditional on the $\Delta$s) is given by
Then,
with the first equality following from Equation (F.5) applied to $Q = \partial^2\Omega^{-1}/\partial\beta_2\partial\beta_1$, the second from Equation (F.3), and the third from the fact that $\mathrm{Tr}[AB] = \mathrm{Tr}[BA]$. It follows that
 |
Appendix G. Proof of Theorem 3
|
|---|
In light of
Equations (45) and
(52),
 | (G.1) |
from
which it follows that
 | (G.2) |
since
Also,
Therefore, recalling
Equation (F.6),
we have
 | (G.3) |
We
now consider the behavior as
N

of the terms up to order
2.
The remainder term is handled similarly. Two things can be determined
from the above expansion. Since the
is are i.i.d. with mean
0,
E[

] = 0, and so, taking unconditional expectations with respect
to the law of the
is, we obtain that the coefficient of order

is
Similarly, the coefficient of order
2 is
The matrix

has the following terms
and since
E[
i
j] =
ij var[

] (where
ij denotes the Kronecker symbol), it follows that
 | (G.4) |
where

is the diagonal matrix formed with the diagonal elements of

. From this, we obtain that
 | (G.5) |
To calculate
, in light of Equation (51), we need to differentiate E[
ln det
/
ß1] with respect to ß2. Indeed
where we can interchange the unconditional expectation and the differentiation with respect to $\beta_2$ because the unconditional expectation is taken with respect to the law of the $\Delta_i$s, which is independent of the $\beta$ parameters (i.e., $\sigma^2$ and $a^2$). Therefore, differentiating Equation (G.5) with respect to $\beta_2$ will produce the result we need. (The reader may wonder why we take the expected value before differentiating, rather than the other way around. As just discussed, the results are identical. However, it turns out that taking expectations first reduces the computational burden quite substantially.)
Combining with Equation (G.5), we therefore have
 | (G.6) |
It is useful now to introduce the same transformed parameters $(\gamma^2, \eta)$ as in previous sections and write $\Omega_0 = \gamma^2 V$ with the parameters and $V$ defined as in Equations (9), (10), and (26), except that $\Delta$ is replaced by $\Delta_0$ in these
expressions. To compute
(0), we start with
 | (G.7) |
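with $\partial\gamma^2/\partial\beta_1$ and $\partial\eta/\partial\beta_1$ to be computed from Equations (11) and (12). If Id denotes the identity matrix and $J$ the matrix with 1 on the infra and supra-diagonal lines and 0 everywhere else, we have $V = (1 + \eta^2)\,\mathrm{Id} + \eta J$, so that $\partial V/\partial\eta = 2\eta\,\mathrm{Id} + J$. Therefore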
 | (G.8) |
Therefore, the first term in Equation (G.7) is O(1) while the second term is O(N) and hence
This holds also for the partial derivative of Equation (G.7) with respect to $\beta_2$. Indeed, given the form of Equation (G.8), we have that
since the remainder term in Equation (G.8) is of the form $p(N)\,\eta^{q(N)}$, where $p$ and $q$ are polynomials in $N$ of order greater than or equal to 0 and 1, respectively, whose differentiation with respect to $\eta$ will produce terms that are of order $o(N)$. Thus it follows that
 | (G.9) |
Writing the result in matrix form, where the (1,1) element corresponds to $(\beta_1, \beta_2) = (\sigma^2, \sigma^2)$, the (1,2) and (2,1) elements to $(\beta_1, \beta_2) = (\sigma^2, a^2)$ and the (2,2) element to $(\beta_1, \beta_2) = (a^2, a^2)$, and computing the partial derivatives in Equation (G.9), we have
 | (G.10) |
As for the coefficient of order
2, that is
(2) in Equation (G.6), define
 | (G.11) |
so that
We have
Next,
we compute separately
and
Therefore
which can be differentiated
with respect to ß
2 to produce


/

ß
2. As above,
differentiation of the remainder term
o(
N) still produces a
o(
N) term because of the structure of the terms there (they
are again of the form
p(
N)
q(N)).
Note that an alternative expression for
can be obtained as follows. Going back to the definition (G.11),
 | (G.12) |
the first trace becomes
so that we have
where
the calculation of Tr[
V1diag[
V1]] is as before,
and where the
o(
N) term is a sum of terms of the form
p(
N)
q(N) as discussed above. From this one can interchange differentiation
and the
o(
N) term, yielding the final equality above.
Therefore
 | (G.13) |
Writing
the result in matrix form and calculating the partial derivatives,
we obtain
 | (G.14) |
Putting it all together, we have obtained
 | (G.15) |
where
 | (G.16) |
 | (G.17) |
The asymptotic variance of the maximum-likelihood estimators is, therefore, given by
where the final results for $A^{(0)} = \Delta_0 [F^{(0)}]^{-1}$ and $A^{(2)} = \Delta_0 [F^{(0)}]^{-1} F^{(2)} [F^{(0)}]^{-1}$, obtained by replacing $F^{(0)}$ and $F^{(2)}$ by their expressions in Equation (G.15), are given in the statement of the theorem.
 |
Appendix H. Proof of Theorem 4
|
|---|
It follows as in
Equations (D.3)
(D.5) that
 | (H.1) |
since $\mathrm{cum}_{\mathrm{true}}(Y_i, Y_j, Y_k, Y_l \,|\, \Delta) = 2$, $\pm 1$, or $0$, $\times\ \mathrm{cum}_{\mathrm{true}}(U)$, as in Equation (15), and with $\psi$ defined in Equation (D.8). Taking now unconditional expectations, we have
we have
 | (H.2) |
with
the first and third equalities following from the fact that

.
Since
and consequently
have
been found in the previous subsection (see
Equation (G.15)),
what we need to do to obtain

is to calculate
With
1 given by Equation (G.2), we have for i = 1,2
and, therefore, by bilinearity of

we have
 | (H.3) |
where the "[2]" refers to the sum over the two terms where $\beta_1$ and $\beta_2$ are permuted.
The first (and leading) term in Equation (H.3),
corresponds to the equally spaced, misspecified noise distribution situation studied in Section 6.
The second term, linear in $\varepsilon$, is zero since
with the first equality following from the bilinearity of $\psi$, the second from the fact that the unconditional expectation over the $\Delta_i$s does not depend on the $\beta$ parameters, so expectation and differentiation with respect to $\beta_2$ can be interchanged, and the third equality from the fact that $E[\Xi] = 0$.
To calculate the third term in Equation (H.3), the first of two that are quadratic in
, note that
 | (H.4) |
with the second equality obtained by replacing

with its value given in
Equation (G.4),
and the third by recalling that
0 =
2 V. The elements (
i,
j)
of the two arguments of

in
Equation (H.4) are
and
from which

in
Equation (H.4) can be
evaluated through the sum given in
Equation (D.8).
Summing these terms, we obtain
where
The fourth and last term in Equation (H.3), also quadratic in
,
is obtained by first expressing
in its sum form and then taking expectations term
by term. Letting now
we recall our definition
of

(

,

) given in
Equation (D.8) whose unconditional expected
value (over the
is, i.e., over

) we now need to evaluate in
order to obtain
2.
We are thus led to consider four-index tensors
ijkl and to define
 | (H.5) |
where
ijkl is symmetric in the
first two and the last two indices, respectively, that is,
ijkl =
jikl and
ijkl =
ijlk. In terms of our definition of

in
Equation (D.8),
it should be noted that

when one takes
ijkl =
i,j
k,l. The expression we seek is, therefore,
 | (H.6) |
where
ijkl is taken to be the following expected
value
with the third equality
following from the interchangeability of unconditional expectations
and differentiation with respect to ß, and the fourth
from the fact that
E[
rs
tu]

0 only when
r =
s =
t =
u, and
Thus we have
 | (H.7) |
and
Summing these terms, we obtain
where
Putting it all together, we have
Finally, the asymptotic variance of the estimator
is given by
 | (H.8) |
where
is given by the expression in the correctly
specified case (G.15), with
F(0) and
F(2) given in
Equations (G.16) and
(G.17), respectively. Also, in light of
Equation (H.1),
we have
where
Since, from
Equation (H.3), we have
it follows that
(0) is the matrix with entries

, that is,
and
with
It follows from
Equation (H.8) that
where
is the
result given in Theorem 3, namely
Equation (53).
The correction term due to the misspecification of the error distribution is determined by cum4[U] times
The asymptotic variance is then given by
with the terms $A^{(0)}$, $A^{(2)}$, $B^{(0)}$ and $B^{(2)}$ given in the statement of the theorem.
 |
Appendix I. Proof of Theorem 5
|
|---|
From
it follows that the estimator (5) has the following expected value
 | (I.1) |
The estimator's variance is
Since the $Y_i$s are normal with mean zero,
and for $i > j$
since
Now we have
so that
and consequently
 | (I.2) |
with $N = T/\Delta$. The RMSE expression follows from Equations (I.1) and (I.2). As in Theorem 1, these are exact small sample expressions, valid for all $(T, \Delta)$.
 |
Footnotes
|
|---|
We are grateful for comments and suggestions from the editor,
Maureen O'Hara, and two anonymous referees, as well as seminar
participants at Berkeley, Harvard, NYU, MIT, Stanford, the Econometric
Society and the Joint Statistical Meetings. Financial support
from the NSF under grants SBR-0111140 (Aït-Sahalia), DMS-0204639
(Mykland and Zhang), and the NIH under grant RO1 AG023141-01
(Zhang) is also gratefully acknowledged.
 |
References
|
|---|
Abramowitz, M., and I. A. Stegun, 1972, Handbook of Mathematical Functions, Dover, New York.
Aït-Sahalia, Y., 1996, "Nonparametric Pricing of Interest Rate Derivative Securities," Econometrica, 64, 527–560.
Aït-Sahalia, Y., 2002, "Maximum-Likelihood Estimation of Discretely-Sampled Diffusions: A Closed-Form Approximation Approach," Econometrica, 70, 223–262.
Aït-Sahalia, Y., and P. A. Mykland, 2003, "The Effects of Random and Discrete Sampling When Estimating Continuous-Time Diffusions," Econometrica, 71, 483–549.
Andersen, T. G., T. Bollerslev, F. X. Diebold, and P. Labys, 2001, "The Distribution of Exchange Rate Realized Volatility," Journal of the American Statistical Association, 96, 42–55.
Andersen, T. G., T. Bollerslev, F. X. Diebold, and P. Labys, 2003, "Modeling and Forecasting Realized Volatility," Econometrica, 71, 579–625.
Bandi, F. M., and P. C. B. Phillips, 2003, "Fully Nonparametric Estimation of Scalar Diffusion Models," Econometrica, 71, 241–283.
Barndorff-Nielsen, O. E., and N. Shephard, 2002, "Econometric Analysis of Realized Volatility and Its Use in Estimating Stochastic Volatility Models," Journal of the Royal Statistical Society B, 64, 253–280.
Bessembinder, H., 1994, "Bid-Ask Spreads in the Interbank Foreign Exchange Markets," Journal of Financial Economics, 35, 317–348.
Black, F., 1986, "Noise," Journal of Finance, 41, 529–543.
Choi, J. Y., D. Salandro, and K. Shastri, 1988, "On the Estimation of Bid-Ask Spreads: Theory and Evidence," The Journal of Financial and Quantitative Analysis, 23, 219–230.
Conrad, J., G. Kaul, and M. Nimalendran, 1991, "Components of Short-Horizon Individual Security Returns," Journal of Financial Economics, 29, 365–384.
Dahlquist, G., and A. Björck, 1974, Numerical Methods, Prentice-Hall Series in Automatic Computation, New York.
Delattre, S., and J. Jacod, 1997, "A Central Limit Theorem for Normalized Functions of the Increments of a Diffusion Process, in the Presence of Round-Off Errors," Bernoulli, 3, 1–28.
Durbin, J., 1959, "Efficient Estimation of Parameters in Moving-Average Models," Biometrika, 46, 306–316.
Easley, D., and M. O'Hara, 1992, "Time and the Process of Security Price Adjustment," Journal of Finance, 47, 577–605.
Engle, R. F., and J. R. Russell, 1998, "Autoregressive Conditional Duration: A New Model for Irregularly Spaced Transaction Data," Econometrica, 66, 1127–1162.
French, K., and R. Roll, 1986, "Stock Return Variances: The Arrival of Information and the Reaction of Traders," Journal of Financial Economics, 17, 5–26.
Gençay, R., G. Ballocchi, M. Dacorogna, R. Olsen, and O. Pictet, 2002, "Real-Time Trading Models and the Statistical Properties of Foreign Exchange Rates," International Economic Review, 43, 463–491.
Glosten, L. R., 1987, "Components of the Bid-Ask Spread and the Statistical Properties of Transaction Prices," Journal of Finance, 42, 1293–1307.
Glosten, L. R., and L. E. Harris, 1988, "Estimating the Components of the Bid/Ask Spread," Journal of Financial Economics, 21, 123–142.
Gloter, A., and J. Jacod, 2000, "Diffusions with Measurement Errors: I Local Asymptotic Normality and II Optimal Estimators," working paper, Université de Paris VI.
Gottlieb, G., and A. Kalay, 1985, "Implications of the Discreteness of Observed Stock Prices," Journal of Finance, 40, 135–153.
Haddad, J., 1995, "On the Closed Form of the Likelihood Function of the First Order Moving Average Model," Biometrika, 82, 232–234.
Hamilton, J. D., 1995, Time Series Analysis, Princeton University Press, Princeton, NJ.
Hansen, L. P., and J. A. Scheinkman, 1995, "Back to the Future: Generating Moment Implications for Continuous-Time Markov Processes," Econometrica, 63, 767–804.
Harris, L., 1990a, "Estimation of Stock Price Variances and Serial Covariances from Discrete Observations," Journal of Financial and Quantitative Analysis, 25, 291–306.
Harris, L., 1990b, "Statistical Properties of the Roll Serial Covariance Bid/Ask Spread Estimator," Journal of Finance, 45, 579–590.
Hasbrouck, J., 1993, "Assessing the Quality of a Security Market: A New Approach to Transaction-Cost Measurement," Review of Financial Studies, 6, 191–212.
Heyde, C. C., 1997, Quasi-Likelihood and its Application, Springer-Verlag, New York.
Jacod, J., 1996, "La Variation Quadratique du Brownien en Présence d'Erreurs d'Arrondi," Astérisque, 236, 155–162.
Kaul, G., and M. Nimalendran, 1990, "Price Reversals: Bid-Ask Errors or Market Overreaction," Journal of Financial Economics, 28, 67–93.
Lo, A. W., and A. C. MacKinlay, 1990, "An Econometric Analysis of Nonsynchronous Trading," Journal of Econometrics, 45, 181–211.
Macurdy, T. E., 1982, "The Use of Time Series Processes to Model the Error Structure of Earnings in a Longitudinal Data Analysis," Journal of Econometrics, 18, 83–114.
Madhavan, A., M. Richardson, and M. Roomans, 1997, "Why Do Security Prices Change?," Review of Financial Studies, 10, 1035–1064.
McCullagh, P., 1987, Tensor Methods in Statistics, Chapman and Hall, London, UK.
McCullagh, P., and J. Nelder, 1989, Generalized Linear Models (2d ed.), Chapman and Hall, London, UK.
Merton, R. C., 1980, "On Estimating the Expected Return on the Market: An Exploratory Investigation," Journal of Financial Economics, 8, 323–361.
Roll, R., 1984, "A Simple Model of the Implicit Bid-Ask Spread in an Efficient Market," Journal of Finance, 39, 1127–1139.
Shaman, P., 1969, "On the Inverse of the Covariance Matrix of a First Order Moving Average," Biometrika, 56, 595–600.
Sias, R. W., and L. T. Starks, 1997, "Return Autocorrelation and Institutional Investors," Journal of Financial Economics, 46, 103–131.
White, H., 1982, "Maximum Likelihood Estimation of Misspecified Models," Econometrica, 50, 1–25.
Zhang, L., P. A. Mykland, and Y. Aït-Sahalia, 2003, "A Tale of Two Time Scales: Determining Integrated Volatility with Noisy High-Frequency Data," forthcoming in the Journal of the American Statistical Association.
