Time Series Analysis
Andrea Beccarini, Center for Quantitative Economics
Winter 2013/2014
Introduction Objectives
Time series are ubiquitous in economics, and very important in macroeconomics and financial economics: GDP, inflation rates, unemployment, interest rates, stock prices.
You will learn . . .
- the formal mathematical treatment of time series and stochastic processes
- what the most important standard models in economics are
- how to fit models to real-world time series
Introduction Prerequisites
- Descriptive Statistics
- Probability Theory
- Statistical Inference
Introduction Class and material
Class
- Class teacher: Sarah Meyer
- Time: Tu., 12:00-14:00
- Location: CAWM 3
- Start: 22 October 2013
Material
- Course page on Blackboard
- Slides and class material are (or will be) downloadable
Introduction Literature
- Neusser, Klaus (2011), Zeitreihenanalyse in den Wirtschaftswissenschaften, 3rd ed., Teubner, Wiesbaden (available online in the RUB network).
- Hamilton, James D. (1994), Time Series Analysis, Princeton University Press, Princeton.
- Pfaff, Bernhard (2006), Analysis of Integrated and Cointegrated Time Series with R, Springer, New York.
- Schlittgen, Rainer and Streitberg, Bernd (1997), Zeitreihenanalyse, 7th ed., Oldenbourg, München.
Basics Definition
Definition: Time series
A sequence of observations ordered by time is called a time series.
- Time series can be univariate or multivariate
- Time can be discrete or continuous
- The states can be discrete or continuous
Basics Definition
Typical notations:

    x_1, x_2, …, x_T   or   x(1), x(2), …, x(T)   or   x_t, t = 1, …, T   or   (x_t)_{t≥0}

This course is about univariate time series in discrete time with continuous states.
Basics Examples
Quarterly GDP Germany, 1991 I to 2012 II
[Figure: time series plot of GDP (in current billion Euro), roughly 350 to 650, against time from the early 1990s to 2010]
Basics Examples
DAX index and log(DAX), 31.12.1964 to 6.4.2009
[Figure: two panels against time (1970 to 2010): the DAX index and the logarithm of the DAX]
Basics Definition
Definition: Stochastic process
A sequence (X_t)_{t∈T} of random variables, all defined on the same probability space (Ω, A, P), is called a stochastic process with discrete time parameter (usually T = ℕ or T = ℤ).
Short version: a stochastic process is a sequence of random variables.
A stochastic process depends on both chance and time.
Basics Definition
Distinguish four cases: both time and chance can be fixed or variable.

                 ω fixed                     ω variable
    t fixed      X_t(ω) is a real number     X_t(ω) is a random variable
    t variable   X_t(ω) is a sequence of     X_t(ω) is a stochastic process
                 real numbers (path,
                 realization, trajectory)

(process.R)
Basics Examples
Example 1: White noise

    ε_t ∼ NID(0, σ²)

Example 2: Random walk

    X_t = X_{t−1} + ε_t,   ε_t ∼ NID(0, σ²),   X_0 = 0

Example 3: A random constant

    X_t = Z,   Z ∼ N(0, σ²)
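The three example processes are easy to simulate. The course's own scripts are in R (e.g. process.R); the following is a minimal Python sketch, with illustrative names not taken from the course material:

```python
import random

def simulate_examples(T=200, sigma=1.0, seed=42):
    """One path of each example process: white noise, random walk, random constant."""
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, sigma) for _ in range(T)]   # Example 1: white noise
    walk, x = [], 0.0
    for e in eps:                                     # Example 2: X_t = X_{t-1} + e_t, X_0 = 0
        x += e
        walk.append(x)
    z = rng.gauss(0.0, sigma)
    const = [z] * T                                   # Example 3: X_t = Z for every t
    return eps, walk, const

eps, walk, const = simulate_examples()
```

Plotting several seeds side by side shows the qualitative difference: white noise fluctuates around zero, the random walk wanders, and the random constant is a horizontal line whose level differs across realizations.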
Basics Moment functions
Definition: Moment functions
The following functions of time are called moment functions:
- expectation function: µ(t) = E(X_t)
- variance function: σ²(t) = Var(X_t)
- covariance function: γ(s, t) = Cov(X_s, X_t)
- correlation function (autocorrelation function):

      ρ(s, t) = γ(s, t) / ( √σ²(s) · √σ²(t) )

(moments.R)
Basics Estimation of moment functions
Usually, the moment functions are unknown and have to be estimated.
Problem: only a single path (realization) can be observed.

    X_1^(1)   X_1^(2)   …   X_1^(n)
    X_2^(1)   X_2^(2)   …   X_2^(n)
      ⋮         ⋮              ⋮
    X_T^(1)   X_T^(2)   …   X_T^(n)

Can we still estimate the expectation function µ(t) and the autocovariance function γ(s, t)? Under which conditions?
Basics Estimation of moment functions
Given the matrix of realizations above, the expectation function µ(t) should usually be estimated by averaging over realizations,

    µ̂(t) = (1/n) Σ_{i=1}^{n} X_t^(i)
Basics Estimation of moment functions
Under certain conditions, µ(t) can be estimated by averaging over time along the single observed path,

    µ̂ = (1/T) Σ_{t=1}^{T} X_t^(1)
Basics Estimation of moment functions
Usually, the autocovariance γ(t, t+h) should be estimated by averaging over realizations,

    γ̂(t, t+h) = (1/n) Σ_{i=1}^{n} (X_t^(i) − µ̂(t))(X_{t+h}^(i) − µ̂(t+h))
Basics Estimation of moment functions
Under certain conditions, γ(t, t+h) can be estimated by averaging over time,

    γ̂(t, t+h) = (1/T) Σ_{t=1}^{T−h} (X_t^(1) − µ̂)(X_{t+h}^(1) − µ̂)
Basics Definition
Moment functions cannot be estimated without additional assumptions, since only one path is observed. There are restrictions which make estimation of the moment functions possible:
- Restriction of time heterogeneity: the distribution of (X_t(ω))_{t∈T} must not be completely different for each t ∈ T
- Restriction of memory: if the values of the process are coupled too closely over time, the individual observations do not supply any (or only insufficient) information about the distribution
Basics Restriction of time heterogeneity: Stationarity
Definition: Strong stationarity
Let (X_t)_{t∈T} be a stochastic process, and let t_1, …, t_n ∈ T be n ∈ ℕ arbitrary time points. (X_t)_{t∈T} is called strongly stationary if for arbitrary h ∈ ℤ

    P(X_{t_1} ≤ x_1, …, X_{t_n} ≤ x_n) = P(X_{t_1+h} ≤ x_1, …, X_{t_n+h} ≤ x_n)

Implication: all univariate marginal distributions are identical.
Basics Restriction of time heterogeneity: Stationarity
Definition: Weak stationarity
(X_t)_{t∈T} is called weakly stationary if
1. the expectation exists and is constant: E(X_t) = µ < ∞ for all t ∈ T
2. the variance exists and is constant: Var(X_t) = σ² < ∞ for all t ∈ T
3. for all t, s, r ∈ ℤ (in the admissible range): γ(t, s) = γ(t + r, s + r)

Simplified notation for covariance and correlation functions:

    γ(h) = γ(t, t + h),   ρ(h) = ρ(t, t + h)
Basics Restriction of time heterogeneity: Stationarity
Strong stationarity implies weak stationarity (but only if the first two moments exist).
A stochastic process is called Gaussian if the joint distribution of X_{t_1}, …, X_{t_n} is multivariate normal. For Gaussian processes, weak and strong stationarity coincide.
Intuition: an observed time series can be regarded as a realization of a stationary process if a gliding window of "appropriate width" always displays "qualitatively the same" picture.
Examples (stationary.R)
Basics Restriction of memory: Ergodicity
Definition: Ergodicity (I)
Let (X_t)_{t∈T} be a weakly stationary stochastic process with expectation µ and autocovariance γ(h); define

    µ̂_T = (1/T) Σ_{t=1}^{T} X_t

(X_t)_{t∈T} is called (expectation) ergodic if

    lim_{T→∞} E[(µ̂_T − µ)²] = 0
Basics Restriction of memory: Ergodicity
Definition: Ergodicity (II)
Let (X_t)_{t∈T} be a weakly stationary stochastic process with expectation µ and autocovariance γ(h); define

    γ̂(h) = (1/T) Σ_{t=1}^{T−h} (X_t − µ)(X_{t+h} − µ)

(X_t)_{t∈T} is called (covariance) ergodic if for all h ∈ ℤ

    lim_{T→∞} E[(γ̂(h) − γ(h))²] = 0
Basics Restriction of memory: Ergodicity
Ergodicity is consistency (in quadratic mean) of the estimators µ̂ of µ and γ̂(h) of γ(h) for dependent observations.
The process (X_t)_{t∈T} is expectation ergodic if (γ(h))_{h∈ℤ} is absolutely summable, i.e.

    Σ_{h=−∞}^{∞} |γ(h)| < ∞

The dependence between far-away observations must be sufficiently small.
Basics Restriction of memory: Ergodicity
Ergodicity condition (for the autocovariance): a stationary Gaussian process (X_t)_{t∈T} with absolutely summable autocovariance function γ(h) is (autocovariance) ergodic.
Under ergodicity, the law of large numbers holds even if the observations are dependent. If the dependence γ(h) does not diminish fast enough, the estimators are no longer consistent.
Basics Estimation of moment functions
Summary of estimators (electricity.R):

    µ̂ = X̄_T = (1/T) Σ_{t=1}^{T} X_t

    γ̂(h) = (1/T) Σ_{t=1}^{T−h} (X_t − µ̂)(X_{t+h} − µ̂)

    ρ̂(h) = γ̂(h) / γ̂(0)

Sometimes, γ̂(h) is defined with the factor 1/(T − h) instead of 1/T.
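These estimators are a few lines in any language (the course illustrates them in electricity.R); here is a plain-Python sketch with illustrative names:

```python
def mu_hat(x):
    """Sample mean over time."""
    return sum(x) / len(x)

def gamma_hat(x, h):
    """Sample autocovariance with the 1/T factor used on the slide."""
    T, m = len(x), mu_hat(x)
    return sum((x[t] - m) * (x[t + h] - m) for t in range(T - h)) / T

def rho_hat(x, h):
    """Sample autocorrelation."""
    return gamma_hat(x, h) / gamma_hat(x, 0)

x = [2.0, 4.0, 6.0, 4.0, 2.0]   # a tiny made-up series
```

Note that rho_hat(x, 0) is 1 by construction, since it divides γ̂(0) by itself.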
Basics Estimation of moment functions
A closer look at the expectation estimator µ̂:
The estimator µ̂ is unbiased, i.e. E(µ̂) = µ.
The variance of µ̂ is

    Var(µ̂) = γ(0)/T + (2/T) Σ_{h=1}^{T−1} (1 − h/T) γ(h)

Under ergodicity, for T → ∞,

    T · Var(µ̂) → γ(0) + 2 Σ_{h=1}^{∞} γ(h) = Σ_{h=−∞}^{∞} γ(h)
Basics Estimation of moment functions
For Gaussian processes, µ̂ is normally distributed,

    µ̂ ∼ N(µ, Var(µ̂)),

and asymptotically

    √T (µ̂ − µ) → Z ∼ N(0, γ(0) + 2 Σ_{h=1}^{∞} γ(h))

For non-Gaussian processes, µ̂ is (often) asymptotically normal with the same limiting distribution.
Basics Estimation of moment functions
A closer look at the autocovariance estimators γ̂(h): for Gaussian processes with absolutely summable covariance function, the vector

    ( √T (γ̂(0) − γ(0)), …, √T (γ̂(K) − γ(K)) )′

is asymptotically multivariate normal with expectation vector (0, …, 0)′ and

    T · Cov(γ̂(h_1), γ̂(h_2)) = Σ_{r=−∞}^{∞} ( γ(r) γ(r + h_1 + h_2) + γ(r − h_2) γ(r + h_1) )
Basics Estimation of moment functions
A closer look at the autocorrelation estimators ρ̂(h): for Gaussian processes with absolutely summable covariance function, the random vector

    ( √T (ρ̂(0) − ρ(0)), …, √T (ρ̂(K) − ρ(K)) )′

is asymptotically multivariate normal with expectation vector (0, …, 0)′ and a complicated covariance matrix.
Be careful: for small to medium sample sizes the autocovariance and autocorrelation estimators are biased! (autocorr.R)
Basics Estimation of moment functions
An important special case for autocorrelation estimators: let (ε_t) be a white-noise process with Var(ε_t) = σ² < ∞; then

    E(ρ̂(h)) = −T⁻¹ + O(T⁻²)

    Cov(ρ̂(h_1), ρ̂(h_2)) = T⁻¹ + O(T⁻²)   for h_1 = h_2
                          = O(T⁻²)          otherwise

For white-noise processes and long time series, the empirical autocorrelations are approximately independent normal random variables with expectation −T⁻¹ and variance T⁻¹.
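This approximation is easy to check by simulation; a Python sketch (the seed and sample size are arbitrary choices for illustration):

```python
import random

rng = random.Random(0)
T = 10_000
eps = [rng.gauss(0.0, 1.0) for _ in range(T)]

def rho_hat(x, h):
    """Sample autocorrelation at lag h."""
    n, m = len(x), sum(x) / len(x)
    gamma = lambda k: sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / n
    return gamma(h) / gamma(0)

# Each rho_hat(h) should lie within a few multiples of 1/sqrt(T) = 0.01 of zero
rhos = [rho_hat(eps, h) for h in range(1, 6)]
```

This is the basis of the usual ±2/√T confidence bands drawn on empirical correlograms.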
Mathematical digression (I) Complex numbers
Some quadratic equations do not have real solutions, e.g. x² + 1 = 0. Still it is possible (and sensible) to define solutions to such equations. The definition in common notation is

    i = √(−1),

where i is the number which, when squared, equals −1. The number i is called imaginary (i.e. not real).
Mathematical digression (I) Complex numbers
Other imaginary numbers follow from this definition, e.g.

    √(−16) = √16 · √(−1) = 4i
    √(−5) = √5 · √(−1) = √5 · i

Further, it is possible to define numbers that contain both a real part and an imaginary part, e.g. 5 − 8i or a + bi. Such numbers are called complex, and the set of complex numbers is denoted by ℂ. The pair a + bi and a − bi is called a complex conjugate pair.
Mathematical digression (I) Complex numbers
Geometric interpretation:
[Figure: the complex plane, with real axis and imaginary axis; the point a + bi has real part a, imaginary part b, absolute value r, and angle θ]
Mathematical digression (I) Complex numbers
Polar coordinates and Cartesian coordinates:

    z = a + bi = r·(cos θ + i sin θ) = r·e^{iθ}

    a = r cos θ,   b = r sin θ
    r = √(a² + b²),   θ = arctan(b/a)
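Python's standard cmath module implements exactly these conversions, which makes the polar multiplication rule below easy to verify; a quick sketch:

```python
import cmath

z = 3 + 4j
r, theta = cmath.polar(z)      # modulus and angle: z = r * e^(i*theta)
back = cmath.rect(r, theta)    # back to Cartesian coordinates

# Multiplication in polar form: multiply moduli, add angles
z1, z2 = 1 + 1j, 2j
r1, t1 = cmath.polar(z1)
r2, t2 = cmath.polar(z2)
prod = cmath.rect(r1 * r2, t1 + t2)   # should equal z1 * z2
```
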
Mathematical digression (I) Complex numbers
Rules of calculus:
- Addition: (a + bi) + (c + di) = (a + c) + (b + d)i
- Multiplication (Cartesian coordinates): (a + bi)·(c + di) = (ac − bd) + (ad + bc)i
- Multiplication (polar coordinates): r_1 e^{iθ_1} · r_2 e^{iθ_2} = r_1 r_2 e^{i(θ_1 + θ_2)}
Mathematical digression (I) Complex numbers
Addition:
[Figure: in the complex plane, a + bi and c + di add like vectors, giving (a + c) + (b + d)i]
Mathematical digression (I) Complex numbers
Multiplication:
[Figure: in the complex plane, multiplying two complex numbers with moduli r_1, r_2 and angles θ_1, θ_2 gives a product with modulus r = r_1·r_2 and angle θ = θ_1 + θ_2]
Mathematical digression (I) Complex numbers
The quadratic equation x² + px + q = 0 has the solutions

    x = −p/2 ± √(p²/4 − q)

If p²/4 − q < 0, the solutions are complex (and conjugate).
Mathematical digression (I) Complex numbers
Example: the solutions of x² − 2x + 5 = 0 are

    x = −(−2)/2 + √((−2)²/4 − 5) = 1 + 2i

and

    x = −(−2)/2 − √((−2)²/4 − 5) = 1 − 2i
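A sketch of the quadratic formula in Python; cmath.sqrt returns the complex root when p²/4 − q is negative, so both the real and the complex case are covered by the same code:

```python
import cmath

def quadratic_roots(p, q):
    """Both solutions of x^2 + p*x + q = 0."""
    d = cmath.sqrt(p * p / 4 - q)
    return -p / 2 + d, -p / 2 - d

r1, r2 = quadratic_roots(-2, 5)   # the example from the slide: 1 + 2i and 1 - 2i
```
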
Mathematical digression (II) Linear difference equations
First-order difference equation with initial value x_0:

    x_t = c + φ_1 x_{t−1}

p-th order difference equation with initial values x_0, x_{−1}, …, x_{−(p−1)}:

    x_t = c + φ_1 x_{t−1} + … + φ_p x_{t−p}

A sequence (x_t)_{t=0,1,…} that satisfies the difference equation is called a solution of the difference equation.
Examples (diffequation.R)
Mathematical digression (II) Linear difference equations
We only consider the homogeneous case, i.e. c = 0. The general solution of the first-order difference equation x_t = φ_1 x_{t−1} is x_t = A·φ_1^t with arbitrary constant A, since

    x_t = A φ_1^t = φ_1 · A φ_1^{t−1} = φ_1 x_{t−1}

The constant is pinned down by the initial condition, A = x_0. The sequence x_t = A φ_1^t is convergent if and only if |φ_1| < 1.
Mathematical digression (II) Linear difference equations
Solution of the p-th order difference equation x_t = φ_1 x_{t−1} + … + φ_p x_{t−p}: try x_t = A z^{−t}; then

    A z^{−t} = φ_1 A z^{−(t−1)} + … + φ_p A z^{−(t−p)}
    z^{−t} = φ_1 z^{−(t−1)} + … + φ_p z^{−(t−p)}

and thus

    1 − φ_1 z − … − φ_p z^p = 0

(the characteristic polynomial / characteristic equation).
Mathematical digression (II) Linear difference equations
There are p (possibly complex, possibly nondistinct) solutions of the characteristic equation; denote these solutions (called roots) by z_1, …, z_p. If all roots are real and distinct, then

    x_t = A_1 z_1^{−t} + … + A_p z_p^{−t}

is a solution of the homogeneous difference equation. If there are complex roots, the solution is oscillating. The constants A_1, …, A_p can be pinned down with p initial conditions (x_0, x_{−1}, …, x_{−(p−1)}).
Mathematical digression (II) Linear difference equations
Stability condition: the linear difference equation x_t = φ_1 x_{t−1} + … + φ_p x_{t−p} is stable (i.e. convergent) if and only if all roots of the characteristic polynomial

    1 − φ_1 z − … − φ_p z^p = 0

are outside the unit circle, i.e. |z_i| > 1 for all i = 1, …, p.
In R, the stability condition can be checked easily using the commands polyroot (base package) or ArmaRoots (fArma package).
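For p ≤ 2 the characteristic roots have a closed form, so the stability check fits in a few lines of Python (a general-p version would use a polynomial root finder, like R's polyroot); names here are illustrative:

```python
import cmath

def char_roots(phi):
    """Roots of 1 - phi1*z - ... - phip*z^p for p <= 2 (closed form)."""
    if len(phi) == 1:
        return [1 / phi[0]]
    phi1, phi2 = phi
    # 1 - phi1*z - phi2*z^2 = 0  is equivalent to  z^2 + (phi1/phi2)*z - 1/phi2 = 0
    p, q = phi1 / phi2, -1 / phi2
    d = cmath.sqrt(p * p / 4 - q)
    return [-p / 2 + d, -p / 2 - d]

def is_stable(phi):
    """Stable iff every characteristic root lies outside the unit circle."""
    return all(abs(z) > 1 for z in char_roots(phi))
```

For example, φ_1 = 1 (the random walk) puts the root exactly on the unit circle, so the equation is not stable.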
ARMA models Definition
Definition: ARMA process
Let (ε_t)_{t∈T} be a white noise process; the stochastic process

    X_t = φ_1 X_{t−1} + … + φ_p X_{t−p} + ε_t + θ_1 ε_{t−1} + … + θ_q ε_{t−q}

with φ_p, θ_q ≠ 0 is called an ARMA(p, q) process (AutoRegressive Moving Average process).
ARMA processes are important since every stationary process can be approximated by an ARMA process.
ARMA models Lag operator and lag polynomial
The lag operator is a convenient notational tool. The lag operator L shifts the time index of a stochastic process:

    L(X_t)_{t∈T} = (X_{t−1})_{t∈T}
    L X_t = X_{t−1}

Rules:

    L² X_t = L(L X_t) = X_{t−2}
    Lⁿ X_t = X_{t−n}
    L⁻¹ X_t = X_{t+1}
    L⁰ X_t = X_t
ARMA models Lag operator and lag polynomial
Lag polynomial:

    A(L) = a_0 + a_1 L + a_2 L² + … + a_p L^p

Example: let A(L) = 1 − 0.5L and B(L) = 1 + 4L²; then

    C(L) = A(L)B(L) = (1 − 0.5L)(1 + 4L²) = 1 − 0.5L + 4L² − 2L³

Lag polynomials can be treated in the same way as ordinary polynomials.
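Multiplying lag polynomials is just convolving their coefficient lists; a small Python sketch reproducing the example above:

```python
def polymul(a, b):
    """Coefficients of A(L)*B(L); inputs and output as [a0, a1, a2, ...]."""
    c = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj   # L^i * L^j contributes to the L^(i+j) coefficient
    return c

# (1 - 0.5L)(1 + 4L^2) = 1 - 0.5L + 4L^2 - 2L^3
c = polymul([1.0, -0.5], [1.0, 0.0, 4.0])
```
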
ARMA models Lag operator and lag polynomial
Define the lag polynomials

    Φ(L) = 1 − φ_1 L − … − φ_p L^p
    Θ(L) = 1 + θ_1 L + … + θ_q L^q

The ARMA(p, q) process can be written compactly as

    Φ(L) X_t = Θ(L) ε_t

Important special cases:

    MA(q) process:  X_t = ε_t + θ_1 ε_{t−1} + … + θ_q ε_{t−q}
    AR(1) process:  X_t = φ_1 X_{t−1} + ε_t
    AR(p) process:  X_t = φ_1 X_{t−1} + … + φ_p X_{t−p} + ε_t
ARMA models MA(q) process
The MA(q) process is

    X_t = Θ(L) ε_t
    X_t = ε_t + θ_1 ε_{t−1} + … + θ_q ε_{t−q}

with ε_t ∼ NID(0, σ_ε²).
Expectation function:

    E(X_t) = E(ε_t + θ_1 ε_{t−1} + … + θ_q ε_{t−q}) = E(ε_t) + θ_1 E(ε_{t−1}) + … + θ_q E(ε_{t−q}) = 0
ARMA models MA(q) process
Autocovariance function:

    γ(s, t) = E[ (ε_s + θ_1 ε_{s−1} + … + θ_q ε_{s−q}) (ε_t + θ_1 ε_{t−1} + … + θ_q ε_{t−q}) ]

Expanding the product yields cross terms of the form θ_i θ_j ε_{s−i} ε_{t−j}, and the expectations of the cross products are

    E(ε_s ε_t) = 0 for s ≠ t,   E(ε_s ε_t) = σ_ε² for s = t
ARMA models MA(q) process
Define θ_0 = 1; then

    γ(t, t) = σ_ε² Σ_{i=0}^{q} θ_i²
    γ(t−1, t) = σ_ε² Σ_{i=0}^{q−1} θ_i θ_{i+1}
    γ(t−2, t) = σ_ε² Σ_{i=0}^{q−2} θ_i θ_{i+2}
    …
    γ(t−q, t) = σ_ε² θ_0 θ_q = σ_ε² θ_q
    γ(s, t) = 0 for s < t − q

Hence, MA(q) processes are always stationary.
Simulation of MA(q) processes (maqsim.R)
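The whole family of formulas above collapses to one sum; a Python sketch (the function name is illustrative):

```python
def ma_autocov(theta, sigma2, h):
    """gamma(h) of an MA(q) process; theta = [theta1, ..., thetaq], theta0 = 1."""
    th = [1.0] + list(theta)
    h = abs(h)
    if h >= len(th):
        return 0.0                                   # gamma(h) = 0 beyond lag q
    return sigma2 * sum(th[i] * th[i + h] for i in range(len(th) - h))

g = [ma_autocov([0.5], 1.0, h) for h in range(3)]    # MA(1) with theta1 = 0.5
```

For MA(1) this gives γ(0) = σ²(1 + θ₁²), γ(1) = σ²θ₁, and zero at all higher lags.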
ARMA models AR(1) process
The AR(1) process is

    Φ(L) X_t = ε_t
    (1 − φ_1 L) X_t = ε_t
    X_t = φ_1 X_{t−1} + ε_t

with ε_t ∼ NID(0, σ_ε²).
Expectation and variance function
Stability condition: AR(1) processes are stable if |φ_1| < 1.
ARMA models AR(1) process
Stationarity: stable AR(1) processes are weakly stationary if

    E(X_0) = 0   and   Var(X_0) = σ_ε² / (1 − φ_1²)

Nonstationary stable processes converge towards stationarity. It is common parlance to call stable processes stationary.
Covariance function of the stationary AR(1) process: γ(h) = φ_1^|h| · σ_ε² / (1 − φ_1²).
ARMA models AR(p) process
The AR(p) process is

    Φ(L) X_t = ε_t
    X_t = φ_1 X_{t−1} + … + φ_p X_{t−p} + ε_t

with ε_t ∼ NID(0, σ_ε²).
Assumption: ε_t is independent of X_{t−1}, X_{t−2}, … (innovations).
Expectation function: E(X_t) = 0 in this zero-mean specification.
The covariance function is complicated (ar2autocov.R).
ARMA models AR(p) process
AR(p) processes are stable if all roots of the characteristic equation Φ(z) = 0 are larger than 1 in absolute value, |z_i| > 1 for i = 1, …, p.
An AR(p) process is weakly stationary if the joint distribution of the p initial values (X_0, X_{−1}, …, X_{−(p−1)}) is "appropriate".
Stable AR(p) processes converge towards stationarity; they are often called stationary.
Simulation of AR(p) processes (arpsim.R)
ARMA models Invertibility

AR and MA processes can be inverted (into each other).
Example: consider the stable AR(1) process with |φ_1| < 1:

    X_t = φ_1 X_{t−1} + ε_t
        = φ_1 (φ_1 X_{t−2} + ε_{t−1}) + ε_t
        = φ_1² X_{t−2} + φ_1 ε_{t−1} + ε_t
        ⋮
        = φ_1ⁿ X_{t−n} + φ_1^{n−1} ε_{t−(n−1)} + … + φ_1² ε_{t−2} + φ_1 ε_{t−1} + ε_t
ARMA models Invertibility

Since |φ_1| < 1,

    X_t = Σ_{i=0}^{∞} φ_1^i ε_{t−i} = ε_t + θ_1 ε_{t−1} + θ_2 ε_{t−2} + …

with θ_i = φ_1^i.
A stable AR(1) process can be written as an MA(∞) process (the same is true for stable AR(p) processes).
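The MA(∞) representation can be verified numerically: with ψ_i = φ^i, the truncated sum σ² Σ φ^{2i} reproduces the stationary AR(1) variance σ²/(1 − φ²). A Python sketch:

```python
def ar1_var_from_ma(phi, sigma2, n_terms=200):
    """Variance of a stable AR(1) via its truncated MA(infinity) representation.
    Var(X_t) = sigma2 * sum_i psi_i^2 with psi_i = phi^i."""
    return sigma2 * sum(phi ** (2 * i) for i in range(n_terms))

phi, sigma2 = 0.8, 1.0
approx = ar1_var_from_ma(phi, sigma2)
exact = sigma2 / (1 - phi ** 2)   # closed-form stationary variance
```

The truncation error is of order φ^(2·n_terms), which is negligible here.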
ARMA models Invertibility

Using lag polynomials this can be written as

    (1 − φ_1 L) X_t = ε_t
    X_t = (1 − φ_1 L)⁻¹ ε_t = Σ_{i=0}^{∞} (φ_1 L)^i ε_t

General compact and elegant notation:

    Φ(L) X_t = ε_t
    X_t = (Φ(L))⁻¹ ε_t = Θ(L) ε_t
ARMA models Invertibility

An MA(q) process can be written as AR(∞) if all roots of Θ(z) = 0 are larger than 1 in absolute value (invertibility condition).
Example: MA(1) with |θ_1| < 1. From

    X_t = ε_t + θ_1 ε_{t−1}
    θ_1 X_{t−1} = θ_1 ε_{t−1} + θ_1² ε_{t−2}

we find X_t = θ_1 X_{t−1} + ε_t − θ_1² ε_{t−2}. Repeated substitution of the ε_{t−i} terms yields

    X_t = Σ_{i=1}^{∞} φ_i X_{t−i} + ε_t   with φ_i = (−1)^{i+1} θ_1^i
ARMA models Invertibility

Summary:
- ARMA(p, q) processes are stable if all roots of Φ(z) = 0 are larger than 1 in absolute value
- ARMA(p, q) processes are invertible if all roots of Θ(z) = 0 are larger than 1 in absolute value
ARMA models Invertibility

Sometimes (e.g. for proofs) it is useful to write an ARMA(p, q) process either as AR(∞) or as MA(∞):

    Φ(L) X_t = Θ(L) ε_t
    X_t = (Φ(L))⁻¹ Θ(L) ε_t        (MA(∞) representation)
    (Θ(L))⁻¹ Φ(L) X_t = ε_t        (AR(∞) representation)
ARMA models Deterministic components
Until now we only considered processes with zero expectation. Many processes have both a zero-expectation stochastic component (Y_t) and a non-zero deterministic component (D_t). Examples:
- linear trend: D_t = a + bt
- exponential trend: D_t = a·b^t
- seasonal patterns

Let (X_t)_{t∈ℤ} be a stochastic process with deterministic component D_t, and define Y_t = X_t − D_t.
ARMA models Deterministic components
Then E(Y_t) = 0 and

    Cov(Y_t, Y_s) = E[(Y_t − E(Y_t))(Y_s − E(Y_s))]
                  = E[(X_t − D_t − E(X_t − D_t))(X_s − D_s − E(X_s − D_s))]
                  = E[(X_t − E(X_t))(X_s − E(X_s))]
                  = Cov(X_t, X_s)

The covariance function does not depend on the deterministic component. To derive the covariance function of a stochastic process, simply drop the deterministic component.
ARMA models Deterministic components
Special case D_t = µ_t = µ: an ARMA(p, q) process with constant (non-zero) expectation,

    X_t − µ = φ_1(X_{t−1} − µ) + … + φ_p(X_{t−p} − µ) + ε_t + θ_1 ε_{t−1} + … + θ_q ε_{t−q}

The process can also be written as

    X_t = c + φ_1 X_{t−1} + … + φ_p X_{t−p} + ε_t + θ_1 ε_{t−1} + … + θ_q ε_{t−q}

where c = µ(1 − φ_1 − … − φ_p).
ARMA models Deterministic components
Wold's representation theorem: every stationary stochastic process (X_t)_{t∈T} can be represented as

    X_t = Σ_{h=0}^{∞} ψ_h ε_{t−h} + D_t

with ψ_0 = 1, Σ_{h=0}^{∞} ψ_h² < ∞, and ε_t white noise with variance σ² > 0.
Stationary stochastic processes can thus be written as the sum of a deterministic process and an MA(∞) process. Often, low-order ARMA(p, q) processes can approximate MA(∞) processes well.
ARMA models Linear processes and filter
Definition: Linear process
Let (ε_t)_{t∈ℤ} be a white noise process; a stochastic process (X_t)_{t∈ℤ} is called linear if it can be written as

    X_t = Σ_{h=−∞}^{∞} ψ_h ε_{t−h} = Ψ(L) ε_t

where the coefficients are absolutely summable, i.e. Σ_{h=−∞}^{∞} |ψ_h| < ∞.
The lag polynomial Ψ(L) is called a (linear) filter.
ARMA models Linear processes and filter
Some special filters:
- Change from previous period (difference filter): Ψ(L) = 1 − L
- Change from last year (for quarterly or monthly data): Ψ(L) = 1 − L⁴ or Ψ(L) = 1 − L¹²
- Elimination of seasonal influences (quarterly data): Ψ(L) = (1 + L + L² + L³)/4 or Ψ(L) = 0.125L² + 0.25L + 0.25 + 0.25L⁻¹ + 0.125L⁻²
ARMA models Linear processes and filter
Hodrick-Prescott filter (an important tool in empirical macroeconomics): decompose a time series (X_t) into a long-term growth component (G_t) and a short-term cyclical component (C_t),

    X_t = G_t + C_t

There is a trade-off between goodness-of-fit and smoothness of G_t: minimize the criterion function

    Σ_{t=1}^{T} (X_t − G_t)² + λ Σ_{t=2}^{T−1} [(G_{t+1} − G_t) − (G_t − G_{t−1})]²

with respect to G_t for a given smoothness parameter λ.
ARMA models Linear processes and filter
The first-order conditions of the minimization problem are

    (G_1, …, G_T)′ = A (X_1, …, X_T)′

where A = (I + λK′K)⁻¹ and K is the (T−2)×T second-difference matrix

    K = ⎡ 1 −2  1  0  0  …  0  0  0 ⎤
        ⎢ 0  1 −2  1  0  …  0  0  0 ⎥
        ⎢ ⋮             ⋱           ⎥
        ⎣ 0  0  0  0  0  …  1 −2  1 ⎦
ARMA models Linear processes and filter
The HP filter is a linear filter. Typical values for the smoothing parameter λ:
- λ = 10 for annual data
- λ = 1600 for quarterly data
- λ = 14400 for monthly data
Implementation in R (code by Olaf Posch); empirical examples (hpfilter.R)
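The first-order conditions translate directly into code: build K, form I + λK′K, and solve the linear system. A dependency-free Python sketch (a real application would use a sparse solver; hpfilter.R is the course's R version):

```python
def hp_trend(x, lam):
    """Hodrick-Prescott growth component: solve (I + lam*K'K) g = x."""
    T = len(x)
    K = [[0.0] * T for _ in range(T - 2)]        # second-difference matrix
    for r in range(T - 2):
        K[r][r], K[r][r + 1], K[r][r + 2] = 1.0, -2.0, 1.0
    M = [[(1.0 if i == j else 0.0)
          + lam * sum(K[r][i] * K[r][j] for r in range(T - 2))
          for j in range(T)] for i in range(T)]
    # Gaussian elimination with partial pivoting on the augmented system [M | x]
    A = [row[:] + [xi] for row, xi in zip(M, x)]
    for col in range(T):
        piv = max(range(col, T), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, T):
            f = A[r][col] / A[col][col]
            for c in range(col, T + 1):
                A[r][c] -= f * A[col][c]
    g = [0.0] * T                                # back substitution
    for i in reversed(range(T)):
        g[i] = (A[i][T] - sum(A[i][j] * g[j] for j in range(i + 1, T))) / A[i][i]
    return g
```

A quick sanity check: a linear series has zero second differences, so it is its own trend for any λ.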
Estimation of ARMA models The estimation problem
Problem: the parameters φ_1, …, φ_p, θ_1, …, θ_q, σ_ε² of an ARMA(p, q) process are usually unknown. They have to be estimated from an observed time series X_1, …, X_T.
Standard estimation methods: least squares (OLS) and maximum likelihood (ML).
Assumption: the lag orders p and q are known.
Estimation of ARMA models Least squares estimation of AR(p) models
The AR(p) model with non-zero constant expectation

    X_t = c + φ_1 X_{t−1} + … + φ_p X_{t−p} + ε_t

can be written in matrix notation: stacking the observations t = p+1, …, T gives

    y = Xβ + u

where y = (X_{p+1}, …, X_T)′, row t of the regressor matrix X is (1, X_{t−1}, X_{t−2}, …, X_{t−p}), β = (c, φ_1, …, φ_p)′, and u = (ε_{p+1}, …, ε_T)′.
Estimation of ARMA models Least squares estimation of AR(p) models
The standard least squares estimator is

    β̂ = (X′X)⁻¹ X′y

The matrix of explanatory variables X is stochastic, so the usual finite-sample results for OLS regression do not hold. But there is no contemporaneous correlation between the error term and the explanatory variables. Hence, the OLS estimators are consistent and asymptotically efficient.
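For AR(1) the normal equations reduce to simple sums over the data; a Python sketch (illustrative, not the course code):

```python
def ols_ar1(x):
    """OLS estimates (c, phi1) for x_t = c + phi1*x_{t-1} + eps_t."""
    y, z = x[1:], x[:-1]                     # regressand and lagged regressor
    n = len(y)
    zbar, ybar = sum(z) / n, sum(y) / n
    szz = sum((zi - zbar) ** 2 for zi in z)
    szy = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    phi = szy / szz
    return ybar - phi * zbar, phi

# On noiseless data generated by x_t = 2 + 0.5*x_{t-1}, OLS recovers the parameters
x = [1.0]
for _ in range(50):
    x.append(2.0 + 0.5 * x[-1])
c, phi = ols_ar1(x)
```
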
Estimation of ARMA models Least squares estimation of ARMA models
Solve the ARMA equation

    X_t = c + φ_1 X_{t−1} + … + φ_p X_{t−p} + ε_t + θ_1 ε_{t−1} + … + θ_q ε_{t−q}

for ε_t:

    ε_t = X_t − c − φ_1 X_{t−1} − … − φ_p X_{t−p} − θ_1 ε_{t−1} − … − θ_q ε_{t−q}

Define the residuals as functions of the unknown parameters:

    ε̂_t(d, f_1, …, f_p, g_1, …, g_q) = X_t − d − f_1 X_{t−1} − … − f_p X_{t−p} − g_1 ε̂_{t−1} − … − g_q ε̂_{t−q}
Estimation of ARMA models Least squares estimation of ARMA models
Define the sum of squared residuals

    S(d, f_1, …, f_p, g_1, …, g_q) = Σ_{t=1}^{T} ( ε̂_t(d, f_1, …, f_p, g_1, …, g_q) )²

The least squares estimators are

    (ĉ, φ̂_1, …, φ̂_p, θ̂_1, …, θ̂_q) = arg min S(d, f_1, …, f_p, g_1, …, g_q)

Since the residuals are defined recursively, one needs starting values ε̂_0, …, ε̂_{−q+1} and X_0, …, X_{−p+1} to calculate ε̂_1. Easiest way: set all starting values to zero ("conditional estimation").
Estimation of ARMA models Least squares estimation of ARMA models
The first-order conditions form a nonlinear equation system which cannot be solved easily, so minimization is done by standard numerical methods (implemented in all usual statistical packages): either solve the nonlinear first-order conditions or minimize S directly.
Simple special case: ARMA(1, 1) (arma11.R)
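The recursive residuals of conditional estimation are easy to write down for ARMA(1,1); a Python sketch of the objective S, which a numerical optimizer would then minimize over (d, f1, g1):

```python
def arma11_ssr(x, d, f1, g1):
    """Conditional sum of squared residuals for an ARMA(1,1) model.
    Starting values X_0 = 0 and eps_hat_0 = 0 ("conditional estimation")."""
    x_prev, e_prev, ssr = 0.0, 0.0, 0.0
    for xt in x:
        e = xt - d - f1 * x_prev - g1 * e_prev   # eps_hat_t, computed recursively
        ssr += e * e
        x_prev, e_prev = xt, e
    return ssr
```

With g1 = 0 the recursion collapses to plain AR(1) residuals, which gives a convenient correctness check.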
Estimation of ARMA models Maximum likelihood estimation
Additional assumption: the innovations ε_t are normally distributed. Implication: ARMA processes are Gaussian, and the joint distribution of X_1, …, X_T is multivariate normal,

    X = (X_1, …, X_T)′ ∼ N(µ, Σ)
Estimation of ARMA models Maximum likelihood estimation
Expectation vector:

    µ = E(X_1, …, X_T)′ = ( c/(1 − φ_1 − … − φ_p), …, c/(1 − φ_1 − … − φ_p) )′

Covariance matrix:

    Σ = Cov(X) = ⎡ γ(0)     γ(1)     …  γ(T−1) ⎤
                 ⎢ γ(1)     γ(0)     …  γ(T−2) ⎥
                 ⎢  ⋮         ⋮      ⋱    ⋮    ⎥
                 ⎣ γ(T−1)   γ(T−2)   …  γ(0)   ⎦
Estimation of ARMA models Maximum likelihood estimation
The expectation vector and the covariance matrix contain all unknown parameters ψ = (φ_1, …, φ_p, θ_1, …, θ_q, c, σ_ε²).
The likelihood function is

    L(ψ; X) = (2π)^{−T/2} (det Σ)^{−1/2} exp( −(1/2)(X − µ)′ Σ⁻¹ (X − µ) )

and the loglikelihood function is

    ln L(ψ; X) = −(T/2) ln(2π) − (1/2) ln(det Σ) − (1/2)(X − µ)′ Σ⁻¹ (X − µ)

The ML estimators are

    ψ̂ = arg max ln L(ψ; X)
Estimation of ARMA models Maximum likelihood estimation
The loglikelihood function has to be maximized by numerical methods.
Standard properties of ML estimators:
1. consistency
2. asymptotic efficiency
3. asymptotically jointly normally distributed
4. the covariance matrix of the estimators can be consistently estimated
Example: ML estimation of an ARMA(3, 3) model for the interest rate spread (arma33.R)
Estimation of ARMA models Hypothesis tests
Since the estimation method is maximum likelihood, the classical tests (Wald, LR, LM) are applicable
General null and alternative hypotheses
H0 : g(ψ) = 0
H1 : not H0
where g(ψ) is an m-valued function of the parameters
Example: if H0 : φ1 = 0 then m = 1 and g(ψ) = φ1
Estimation of ARMA models Hypothesis tests
Likelihood ratio test statistic
LR = 2(ln L(θ̂ML) − ln L(θ̂R))
where θ̂ML and θ̂R are the unrestricted and restricted estimators
Under the null hypothesis, LR converges in distribution to U ∼ χ²m, and H0 is rejected at significance level α if LR > χ²m;1−α
Disadvantage: two models must be estimated
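As a minimal sketch in Python (the course scripts are in R; the hard-coded 3.841 is the 95% quantile of χ²1, i.e. the single-restriction case m = 1):

```python
CHI2_1_95 = 3.841  # 95% quantile of the chi-squared distribution with 1 df

def lr_test(loglik_unrestricted, loglik_restricted, crit=CHI2_1_95):
    """Likelihood ratio test: reject H0 if LR = 2*(lnL_u - lnL_r) exceeds crit."""
    lr = 2.0 * (loglik_unrestricted - loglik_restricted)
    return lr, lr > crit

# e.g. imposing the restriction costs 3 log-likelihood points: LR = 6.0 -> reject at 5%
lr, reject = lr_test(-100.0, -103.0)
```

Both models must be fitted before the statistic can be formed, which is exactly the disadvantage mentioned above.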
Estimation of ARMA models Hypothesis tests
For the Wald test we only consider g(ψ) = ψ − ψ0, i.e.
H0 : ψ = ψ0
H1 : not H0
Test statistic
W = (ψ̂ − ψ0)′ [Ĉov(ψ̂)]⁻¹ (ψ̂ − ψ0)
If the null hypothesis is true, then W converges in distribution to U ∼ χ²m
The asymptotic covariance matrix can be estimated consistently as Ĉov(ψ̂) = H⁻¹, where H is the Hessian matrix returned by the maximization procedure
Estimation of ARMA models Hypothesis tests
Test example 1:
H0 : φ1 = 0
H1 : φ1 ≠ 0
Test example 2:
H0 : ψ = ψ0
H1 : not H0
Illustration (arma33.R)
Estimation of ARMA models Model selection
Usually, the lag orders p and q of an ARMA model are unknown
Trade-off: goodness-of-fit against parsimony
Akaike's information criterion for the model with non-zero expectation:
AIC = ln σ̂² + 2(p + q + 1)/T
where ln σ̂² measures goodness-of-fit and 2(p + q + 1)/T is the penalty for model size
Choose the model with the smallest AIC
Estimation of ARMA models Model selection
Bayesian information criterion BIC (Schwarz information criterion):
BIC = ln σ̂² + (p + q + 1) · ln T / T
Hannan-Quinn information criterion:
HQ = ln σ̂² + 2(p + q + 1) · ln(ln T) / T
Both BIC and HQ are consistent while the AIC tends to overfit
Illustration (arma33.R)
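The three criteria are easy to compute side by side; a Python sketch (the function name is mine):

```python
import math

def info_criteria(sigma2_hat, p, q, T):
    """AIC, BIC and HQ for an ARMA(p, q) model with non-zero expectation."""
    k = p + q + 1                      # ARMA parameters plus the constant
    aic = math.log(sigma2_hat) + 2 * k / T
    bic = math.log(sigma2_hat) + k * math.log(T) / T
    hq  = math.log(sigma2_hat) + 2 * k * math.log(math.log(T)) / T
    return aic, bic, hq

# for a fixed fit, a larger model is penalized more heavily by BIC than by AIC
aic_small, bic_small, _ = info_criteria(1.0, 1, 1, 500)
aic_big,   bic_big,   _ = info_criteria(1.0, 3, 3, 500)
```

With ln 500 ≈ 6.2 > 2, the per-parameter BIC penalty exceeds the AIC penalty, which is why BIC favours the more parsimonious orders in the simulation table on the next slide.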
Estimation of ARMA models Model selection
Another illustration: the true model is ARMA(2, 1) with Xt = 0.5Xt−1 + 0.3Xt−2 + εt + 0.7εt−1; 1000 samples of size n = 500 were generated; the tables show the model orders p and q as selected by AIC and BIC

Orders selected by AIC:
        q=0   q=1   q=2   q=3   q=4   q=5
p=0       0     0     0     0     0     0
p=1       0    18    64    23    14     6
p=2       0   171    21    16     5     7
p=3       0     7    35    58    80    45
p=4       9     2    12   139    37    44
p=5      11     6    12    56    46    56

Orders selected by BIC:
        q=0   q=1   q=2   q=3   q=4   q=5
p=0       0     0     0     0     0     0
p=1       0   310   167     4     0     0
p=2       0   503     3     1     0     0
p=3       1     0     2     1     0     0
p=4       6     1     0     0     0     0
p=5       1     0     0     0     0     0
Integrated processes Difference operator
Define the difference operator ∆ = 1 − L, then ∆Xt = Xt − Xt−1
Second order differences: ∆² = (1 − L)² = 1 − 2L + L²
Higher orders ∆ⁿ are defined in the same way; note that ∆ⁿ ≠ 1 − Lⁿ
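These operator identities are easy to verify numerically; a small Python sketch:

```python
def diff(x, n=1):
    """Apply the difference operator Delta = 1 - L to a series n times."""
    for _ in range(n):
        x = [x[t] - x[t - 1] for t in range(1, len(x))]
    return x

x = [t ** 2 for t in range(1, 7)]       # 1, 4, 9, 16, 25, 36: quadratic trend
d2 = diff(x, 2)                         # second differences of t^2 are constant (= 2)

# Delta^2 = 1 - 2L + L^2, which is not the same operator as 1 - L^2:
d2_direct = [x[t] - 2 * x[t - 1] + x[t - 2] for t in range(2, len(x))]
not_same  = [x[t] - x[t - 2] for t in range(2, len(x))]
```

Differencing twice removes a quadratic deterministic trend, while 1 − L² would not.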
Integrated processes Definition
Definition: Integrated process
A stochastic process is called integrated of order 1 if
∆Xt = µ + Ψ(L)εt
where εt is white noise, Ψ(1) ≠ 0, and Σ_{j=0}^∞ j|ψj| < ∞
Common notation: Xt ∼ I(1)
I(1) processes are also called difference stationary or unit root processes
Stochastic and deterministic trends
Trend stationary processes are not I(1) (since Ψ(1) = 0)
Integrated processes Definition
Stationary processes are sometimes called I(0)
Higher order integration is possible, e.g.
Xt ∼ I(2) means ∆²Xt ∼ I(0)
In general, Xt ∼ I(d) means that ∆^d Xt ∼ I(0)
Most economic time series are either I(0) or I(1)
Some economic time series may be I(2)
Integrated processes Definition
Example 1: The random walk with drift, Xt = b + Xt−1 + εt, is I(1) because
∆Xt = Xt − Xt−1 = b + εt = b + Ψ(L)εt
where ψ0 = 1 and ψj = 0 for j ≠ 0
Integrated processes Definition
Example 2: The trend stationary process, Xt = a + bt + εt, is not I(1) because
∆Xt = b + εt − εt−1 = b + Ψ(L)εt
with ψ0 = 1, ψ1 = −1 and ψj = 0 for all other j, so that Ψ(1) = 0
Integrated processes Definition
Example 3: The AR(2) process
Xt = b + (1 + φ)Xt−1 − φXt−2 + εt
or, in lag-polynomial form,
(1 − φL)(1 − L)Xt = b + εt
is I(1) if |φ| < 1 because ∆Xt = Ψ(L)(b + εt) with
Ψ(L) = (1 − φL)⁻¹ = 1 + φL + φ²L² + φ³L³ + φ⁴L⁴ + . . .
and thus Ψ(1) = Σ_{i=0}^∞ φⁱ = 1/(1 − φ) ≠ 0. The roots of the characteristic equation are z = 1 and z = 1/φ
Integrated processes Definition
Example 4: The process Xt = 0.5Xt−1 − 0.4Xt−2 + εt is a stationary (stable) zero expectation AR(2) process; the process Yt = a + bt + Xt is trend stationary and I(0) since ∆Yt = b + ∆Xt with
∆Xt = Ψ(L)εt = (1 − L)(1 − 0.5L + 0.4L²)⁻¹ εt
and therefore Ψ(1) = 0 (i0andi1.R)
Integrated processes Definition
Definition: ARIMA process
Let (εt)t∈T be a white noise process; the stochastic process (Xt)t∈Z is called an integrated autoregressive moving-average process of orders p, d and q, or ARIMA(p, d, q), if ∆^d Xt is an ARMA(p, q) process:
Φ(L)∆^d Xt = Θ(L)εt
For d > 0 the process is nonstationary (I(d)) even if all roots of Φ(z) = 0 are outside the unit circle
Simulation of an ARIMA(p, d, q) process (arimapdqsim.R)
Integrated processes Deterministic versus stochastic trends
Why is it important to distinguish deterministic and stochastic trends?
Reason 1: Long-term forecasts and forecasting errors
Deterministic trend: the forecasting error variance is bounded
Stochastic trend: the forecasting error variance is unbounded
Illustrations (i0andi1.R)
Integrated processes Deterministic versus stochastic trends
Why is it important to distinguish deterministic and stochastic trends?
Reason 2: Spurious regression
OLS regressions will show spurious relationships between time series with (deterministic or stochastic) trends
Detrending works if the series have deterministic trends, but it does not help if the series are integrated
Illustrations (spurious1.R)
Integrated processes Integrated processes and parameter estimation
OLS estimators (and ML estimators) are consistent and asymptotically normal for stationary processes
The asymptotic normality is lost if the processes are integrated
We only look at the very special case Xt = φ1 Xt−1 + εt with εt ∼ NID(0, 1) and X0 = 0
The AR(1) process is stationary if |φ1| < 1 and has a unit root if |φ1| = 1
Integrated processes Integrated processes and parameter estimation
The usual OLS estimator of φ1 is
φ̂1 = Σ_{t=1}^T Xt Xt−1 / Σ_{t=1}^T X²t−1
What does the distribution of φ̂1 look like? Influence of φ1 and T
Consistency? Asymptotic normality?
Illustration (phihat.R)
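A quick pure-Python Monte Carlo sketch (phihat.R is the course's R version; seed and sample size here are arbitrary choices):

```python
import random

def simulate_ar1(phi, T, seed):
    """Simulate X_t = phi * X_{t-1} + eps_t with eps_t ~ NID(0, 1) and X_0 = 0."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for _ in range(T):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x

def ols_phi(x):
    """phi_hat = sum(X_t * X_{t-1}) / sum(X_{t-1}^2)."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den

phi_hat = ols_phi(simulate_ar1(0.5, 5000, seed=42))   # close to 0.5 for large T
```

Repeating this over many seeds and comparing the histograms for |φ1| < 1 and φ1 = 1 shows the normal versus nonnormal limiting behaviour discussed on the next slide.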
Integrated processes Integrated processes and parameter estimation
Consistency and asymptotic normality for I(0) processes (|φ1| < 1):
plim φ̂1 = φ1
√T (φ̂1 − φ1) converges in distribution to Z ∼ N(0, 1 − φ1²)
Consistency and limiting distribution for I(1) processes (φ1 = 1):
plim φ̂1 = 1
T (φ̂1 − 1) converges in distribution to V
where V is a nondegenerate, nonnormal random variable
Root-T consistency versus superconsistency
Integrated processes Unit root tests
It is important to distinguish between trend stationarity and difference stationarity
Test the hypothesis that a process has a unit root (i.e. is I(1))
Classical approaches: (Augmented) Dickey-Fuller test, Phillips-Perron test
Basic tool: linear regression
Xt = deterministics + φXt−1 + εt
or, equivalently,
∆Xt = deterministics + (φ − 1)Xt−1 + εt = deterministics + βXt−1 + εt
with β := φ − 1
Integrated processes Unit root tests
Null and alternative hypotheses
H0 : φ = 1 (unit root)
H1 : |φ| < 1 (no unit root)
or, equivalently,
H0 : β = 0 (unit root)
H1 : β < 0 (no unit root)
Unit root tests are one-sided; explosive processes are ruled out
Rejecting the null hypothesis is evidence in favour of stationarity
If the null hypothesis is not rejected, there could be a unit root
Integrated processes DF test and ADF test
Dickey-Fuller (DF) and Augmented Dickey-Fuller (ADF) tests
Possible regressions:
Xt = φXt−1 + εt, or ∆Xt = βXt−1 + εt
Xt = a + φXt−1 + εt, or ∆Xt = a + βXt−1 + εt
Xt = a + bt + φXt−1 + εt, or ∆Xt = a + bt + βXt−1 + εt
Assumption for the Dickey-Fuller test: no autocorrelation in εt
If there is autocorrelation in εt, use the augmented DF test
Integrated processes DF test and ADF test
Dickey-Fuller regression, case 1: no constant, no trend
∆Xt = βXt−1 + εt
Null and alternative hypotheses
H0 : β = 0
H1 : β < 0
Null hypothesis: stochastic trend without drift
Alternative hypothesis: stationary process around zero
Integrated processes DF test and ADF test
Dickey-Fuller regression, case 2: constant, no trend
∆Xt = a + βXt−1 + εt
Null and alternative hypotheses
H0 : β = 0    or    H0 : β = 0, a = 0
H1 : β < 0    or    H1 : β < 0, a ≠ 0
Null hypothesis: stochastic trend without drift
Alternative hypothesis: stationary process around a constant
Integrated processes DF test and ADF test
Dickey-Fuller regression, case 3: constant and trend
∆Xt = a + bt + βXt−1 + εt
Null and alternative hypotheses
H0 : β = 0    or    H0 : β = 0, b = 0
H1 : β < 0    or    H1 : β < 0, b ≠ 0
Null hypothesis: stochastic trend with drift
Alternative hypothesis: trend stationary process
Integrated processes DF test and ADF test
Dickey-Fuller test statistics for single hypotheses
ρ-test: T · β̂
τ-test: β̂ / σ̂β̂
The τ-test statistic is computed in the same way as the usual t-test statistic
Reject the null hypothesis if the test statistics are too small
The critical values are not the quantiles of the t-distribution
There are tables with the correct critical values (e.g. Hamilton, Table B.6)
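A pure-Python sketch of the τ statistic for case 1 (the course does this in R via dftest.R; the rough 5% critical value of −1.95 for case 1 comes from the Dickey-Fuller tables and is approximate):

```python
import math, random

def df_tau(x):
    """tau statistic from the case-1 DF regression dX_t = beta * X_{t-1} + e_t."""
    dx = [x[t] - x[t - 1] for t in range(1, len(x))]
    lag = x[:-1]
    sxx = sum(v * v for v in lag)
    beta = sum(l * d for l, d in zip(lag, dx)) / sxx
    resid = [d - beta * l for l, d in zip(lag, dx)]
    s2 = sum(e * e for e in resid) / (len(dx) - 1)   # residual variance
    return beta / math.sqrt(s2 / sxx)                # tau = beta_hat / se(beta_hat)

# a stationary AR(1) with phi = 0.5: the unit root null should be firmly rejected
rng = random.Random(7)
x, prev = [], 0.0
for _ in range(500):
    prev = 0.5 * prev + rng.gauss(0.0, 1.0)
    x.append(prev)
tau = df_tau(x)   # far below the approximate 5% critical value of -1.95
```

The key point is that τ is compared to the Dickey-Fuller critical values, not to Student-t quantiles.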
Integrated processes DF test and ADF test
The Dickey-Fuller test statistics for the joint hypotheses are computed in the same way as the usual F-test statistics
Reject the null hypothesis if the test statistic is too large
The critical values are not the quantiles of the F-distribution
There are tables with the correct critical values (e.g. Hamilton, Table B.7)
Illustrations (dftest.R)
Integrated processes DF test and ADF test
If there is autocorrelation in εt the DF test does not work (dftest.R)
Augmented Dickey-Fuller test (ADF test) regressions:
∆Xt = γ1∆Xt−1 + . . . + γp∆Xt−p + βXt−1 + εt
∆Xt = a + γ1∆Xt−1 + . . . + γp∆Xt−p + βXt−1 + εt
∆Xt = a + bt + γ1∆Xt−1 + . . . + γp∆Xt−p + βXt−1 + εt
The added lagged differences capture the autocorrelation
The number of lags p must be large enough to make εt white noise
The critical values remain the same as in the no-correlation case
Integrated processes DF test and ADF test
Further interesting topics (but we skip these):
Phillips-Perron test
Structural breaks and unit roots
KPSS test of stationarity, with
H0 : Xt ∼ I(0)
H1 : Xt ∼ I(1)
Integrated processes Regression with integrated processes
Spurious regression: if Xt and Yt are independent but both I(1), then the regression Yt = α + βXt + ut will result in an estimated coefficient β̂ that is significantly different from 0 with probability 1 as T → ∞
BUT: the regression Yt = α + βXt + ut may be sensible even though Xt and Yt are I(1)
Cointegration
Integrated processes Regression with integrated processes
Definition: Cointegration
Two stochastic processes (Xt)t∈T and (Yt)t∈T are cointegrated if both processes are I(1) and there is a constant β such that the process (Yt − βXt) is I(0)
If β is known, cointegration can be tested using a standard unit root test on the process (Yt − βXt)
If β is unknown, it can be estimated from the linear regression Yt = α + βXt + ut, and cointegration is tested using a modified unit root test on the residual process (ût)t=1,...,T
GARCH models Conditional expectation
Let (X, Y) be a bivariate random variable with a joint density function; then
E(X|Y = y) = ∫_{−∞}^{∞} x f_{X|Y=y}(x) dx
is the conditional expectation of X given Y = y
E(X|Y) denotes a random variable with realization E(X|Y = y) if the random variable Y realizes as y
Both E(X|Y) and E(X|Y = y) are called conditional expectation
GARCH models Conditional variance
Let (X, Y) be a bivariate random variable with a joint density function; then
Var(X|Y = y) = ∫_{−∞}^{∞} (x − E(X|Y = y))² f_{X|Y=y}(x) dx
is the conditional variance of X given Y = y
Var(X|Y) denotes a random variable with realization Var(X|Y = y) if the random variable Y realizes as y
Both Var(X|Y = y) and Var(X|Y) are called conditional variance
GARCH models Rules for conditional expectations
1. Law of iterated expectations: E(E(X|Y)) = E(X)
2. If X and Y are independent, then E(X|Y) = E(X)
3. The condition can be treated like a constant: E(XY|Y) = Y · E(X|Y)
4. The conditional expectation is a linear operator: for a1, . . . , an ∈ R,
E(Σ_{i=1}^n ai Xi | Y) = Σ_{i=1}^n ai E(Xi | Y)
GARCH models Basics
Some economic time series show volatility clusters, e.g. stock returns, commodity price changes, inflation rates, . . .
Simple autoregressive models cannot capture volatility clusters since their conditional variance is constant
Example: stationary AR(1) process, Xt = αXt−1 + εt with |α| < 1; then
Var(Xt) = σX² = σε² / (1 − α²)
and the conditional variance is Var(Xt|Xt−1) = σε²
GARCH models Basics
In the following, we will focus on stock returns
Empirical fact: squared (or absolute) returns are positively autocorrelated
Implication: returns are not independent over time
The dependence is nonlinear
How can we model this kind of dependence?
GARCH models ARCH(1)-process
Definition: ARCH(1)-process
The stochastic process (Xt)t∈Z is called an ARCH(1)-process if
E(Xt|Xt−1) = 0
Var(Xt|Xt−1) = σt² = α0 + α1 X²t−1
for all t ∈ Z, with α0, α1 > 0
Often, an additional assumption is
Xt | (Xt−1 = xt−1) ∼ N(0, α0 + α1 x²t−1)
GARCH models ARCH(1)-process
The unconditional distribution of Xt is non-normal
Leptokurtosis: the tails are heavier than the tails of the normal distribution
Example of an ARCH(1)-process: Xt = εt σt, where (εt)t∈Z is white noise with σε² = 1 and
σt = √(α0 + α1 X²t−1)
GARCH models ARCH(1)-process
One can show that
E(Xt|Xt−1) = 0
E(Xt) = 0
Var(Xt|Xt−1) = α0 + α1 X²t−1
Var(Xt) = α0 / (1 − α1)
Cov(Xt, Xt−i) = 0 for i > 0
Stationarity condition: 0 < α1 < 1
The unconditional kurtosis is 3(1 − α1²)/(1 − 3α1²) if εt ∼ N(0, 1); if α1 > √(1/3) ≈ 0.57735, the kurtosis does not exist
GARCH models ARCH(1)-process
Squared returns follow
Xt² = α0 + α1 X²t−1 + vt
with vt = σt²(εt² − 1)
Thus, squared returns of an ARCH(1) process follow an AR(1)
The process (vt)t∈Z is white noise:
E(vt) = 0
Var(vt) = E(vt²) = const.
Cov(vt, vt−i) = 0 for i = 1, 2, . . .
GARCH models ARCH(1)-process
Simulation of an ARCH(1)-process for t = 1, . . . , 2500
Parameters: α0 = 0.05, α1 = 0.95, start value X0 = 0
Conditional distribution: εt ∼ N(0, 1)
archsim.R
Check whether the simulated time series shows the typical stylized facts of return distributions
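archsim.R does this in R; an equivalent pure-Python sketch with the parameters from the slide (the seed is an arbitrary choice of mine):

```python
import math, random

def simulate_arch1(alpha0, alpha1, T, seed):
    """Simulate X_t = sigma_t * eps_t with sigma_t^2 = alpha0 + alpha1 * X_{t-1}^2."""
    rng = random.Random(seed)
    x, prev = [], 0.0                      # start value X_0 = 0
    for _ in range(T):
        sigma = math.sqrt(alpha0 + alpha1 * prev ** 2)
        prev = sigma * rng.gauss(0.0, 1.0)
        x.append(prev)
    return x

x = simulate_arch1(0.05, 0.95, 2500, seed=1)

# stylized fact to check: squared returns are positively autocorrelated
m = sum(v * v for v in x) / len(x)
acf1_sq = (sum((x[t] ** 2 - m) * (x[t - 1] ** 2 - m) for t in range(1, len(x)))
           / sum((v * v - m) ** 2 for v in x))
```

Plotting x and its squared first-order autocorrelation makes the volatility clusters visible even though the returns themselves are serially uncorrelated.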
GARCH models Estimation of an ARCH(1)-process
Of course, we do not know the true values of the model parameters α0 and α1
How can we estimate the unknown parameters α0 and α1 from observations X1, . . . , XT?
Because of
Xt² = α0 + α1 X²t−1 + vt
a possible estimation method is OLS
GARCH models Estimation of an ARCH(1)-process
OLS estimator of α1:
α̂1 = [Σ_{t=2}^T (Xt² − X̄²)(X²t−1 − X̄²)] / [Σ_{t=2}^T (X²t−1 − X̄²)²] ≈ ρ̂(Xt², X²t−1)
where X̄² denotes the sample mean of the squared observations
Careful: these estimators are only consistent if the kurtosis exists (i.e. if α1 < √(1/3))
Test of ARCH effects:
H0 : α1 = 0
H1 : α1 > 0
GARCH models Estimation of an ARCH(1)-process
For T large, under H0,
√T α̂1 ∼ N(0, 1) (approximately)
Reject H0 if √T α̂1 > Φ⁻¹(1 − α)
Second version of this test: consider the R² of the regression
Xt² = α0 + α1 X²t−1 + vt
Then, under H0,
T α̂1² ≈ T R² ∼ χ²1 (approximately)
Reject H0 if T R² > χ²1;1−α
GARCH models ARCH(p)-process
Definition: ARCH(p)-process
The stochastic process (Xt)t∈Z is called an ARCH(p)-process if
E(Xt|Xt−1, . . . , Xt−p) = 0
Var(Xt|Xt−1, . . . , Xt−p) = σt² = α0 + α1 X²t−1 + . . . + αp X²t−p
for t ∈ Z, where αi ≥ 0 for i = 0, 1, . . . , p − 1 and αp > 0
Often, an additional assumption is that
Xt | (Xt−1 = xt−1, . . . , Xt−p = xt−p) ∼ N(0, σt²)
GARCH models ARCH(p)-process
Example of an ARCH(p)-process: Xt = εt σt, where (εt)t∈Z is white noise with σε² = 1 and
σt = √(α0 + α1 X²t−1 + . . . + αp X²t−p)
An ARCH(p) process is weakly stationary if all roots of
1 − α1 z − α2 z² − . . . − αp z^p = 0
are outside the unit circle
Then, for all t ∈ Z, E(Xt) = 0 and
Var(Xt) = α0 / (1 − Σ_{i=1}^p αi)
GARCH models ARCH(p)-process
If (Xt)t∈Z is a stationary ARCH(p) process, then (Xt²)t∈Z is a stationary AR(p) process:
Xt² = α0 + α1 X²t−1 + . . . + αp X²t−p + vt
As to the error term,
E(vt) = 0
Var(vt) = const.
Cov(vt, vt−i) = 0 for i = 1, 2, . . .
Simulating an ARCH(p) process is easy
GARCH models Estimation of ARCH(p) models
OLS estimation of
Xt² = α0 + α1 X²t−1 + . . . + αp X²t−p + vt
Test of ARCH effects:
H0 : α1 = α2 = . . . = αp = 0  vs  H1 : not H0
Let R² denote the coefficient of determination of the regression
Under H0, the test statistic T R² ∼ χ²p (approximately); thus reject H0 if T R² > χ²p;1−α
GARCH models Maximum likelihood estimation
Basic idea of the maximum likelihood estimation method: choose the parameters such that the joint density of the observations f_{X1,...,XT}(x1, . . . , xT) is maximized
Let X1, . . . , XT denote a random sample from X
The density fX(x; θ) depends on R unknown parameters θ = (θ1, . . . , θR)
GARCH models Maximum likelihood estimation
ML estimation of θ: maximize the (log)likelihood function
L(θ) = f_{X1,...,XT}(x1, . . . , xT; θ) = Π_{t=1}^T fX(xt; θ)
ln L(θ) = Σ_{t=1}^T ln fX(xt; θ)
ML estimate
θ̂ = argmax [ln L(θ)]
GARCH models Maximum likelihood estimation
Since observations are independent in random samples,
f_{X1,...,XT}(x1, . . . , xT) = Π_{t=1}^T f_{Xt}(xt)
or
ln f_{X1,...,XT}(x1, . . . , xT) = Σ_{t=1}^T ln f_{Xt}(xt) = Σ_{t=1}^T ln fX(xt)
But: ARCH returns are not independent!
GARCH models Maximum likelihood estimation
Factorization with dependent observations:
f_{X1,...,XT}(x1, . . . , xT) = Π_{t=1}^T f_{Xt|Xt−1,...,X1}(xt | xt−1, . . . , x1)
or
ln f_{X1,...,XT}(x1, . . . , xT) = Σ_{t=1}^T ln f_{Xt|Xt−1,...,X1}(xt | xt−1, . . . , x1)
Hence, for an ARCH(1)-process,
f_{X1,...,XT}(x1, . . . , xT) = fX1(x1) · Π_{t=2}^T (2πσt²)^(−1/2) exp(−(1/2)(xt/σt)²)
GARCH models Maximum likelihood estimation
The marginal density of X1 is complicated but becomes negligible for large T and, therefore, will be dropped from now on
Log-likelihood function (without the initial marginal density):
ln L(α0, α1 | x1, . . . , xT) = −(T/2) ln 2π − (1/2) Σ_{t=2}^T ln σt² − (1/2) Σ_{t=2}^T (xt/σt)²
where σt² = α0 + α1 x²t−1
ML estimation of α0 and α1 by numerical maximization of ln L(α0, α1) with respect to α0 and α1
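Coding this conditional log-likelihood is straightforward; a Python sketch (the maximization itself would be handed to a numerical optimizer, which is not shown here):

```python
import math

def arch1_loglik(alpha0, alpha1, x):
    """Conditional Gaussian log-likelihood of an ARCH(1) model, X_1's marginal dropped."""
    T = len(x)
    ll = -0.5 * T * math.log(2.0 * math.pi)      # constant term, as on the slide
    for t in range(1, T):
        s2 = alpha0 + alpha1 * x[t - 1] ** 2     # sigma_t^2 = alpha0 + alpha1 * x_{t-1}^2
        ll += -0.5 * math.log(s2) - 0.5 * x[t] ** 2 / s2
    return ll
```

Any hill-climbing routine over (α0, α1) with α0, α1 > 0 can maximize this function; the constant term does not affect the maximizer.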
GARCH models GARCH(p,q)-process
Definition: GARCH(p,q)-process
The stochastic process (Xt)t∈Z is called a GARCH(p, q)-process if
E(Xt|Xt−1, Xt−2, . . .) = 0
Var(Xt|Xt−1, Xt−2, . . .) = σt² = α0 + α1 X²t−1 + . . . + αp X²t−p + β1 σ²t−1 + . . . + βq σ²t−q
for t ∈ Z with αi, βi ≥ 0
Often, an additional assumption is that
(Xt | Xt−1 = xt−1, Xt−2 = xt−2, . . .) ∼ N(0, σt²)
GARCH models GARCH(p,q)-process
Conditional variance of a GARCH(1, 1) process:
Var(Xt|Xt−1, Xt−2, . . .) = σt² = α0 + α1 X²t−1 + β1 σ²t−1 = α0/(1 − β1) + α1 Σ_{i=1}^∞ β1^(i−1) X²t−i
Unconditional variance:
Var(Xt) = α0 / (1 − Σ_{i=1}^p αi − Σ_{j=1}^q βj)
GARCH models GARCH(p,q)-process
Necessary condition for weak stationarity:
Σ_{i=1}^p αi + Σ_{j=1}^q βj < 1
(Xt)t∈Z has no autocorrelation
GARCH processes can be written as ARMA(max(p, q), q)-processes in the squared returns
Example: GARCH(1, 1)-process with Xt = εt σt and σt² = α0 + α1 X²t−1 + β1 σ²t−1
GARCH models Estimation of GARCH(p,q)-processes
Estimation of the ARMA(max(p, q), q)-process in the squared returns
Alternative (and better) method: maximum likelihood
For a GARCH(1, 1)-process,
f_{X1,...,XT}(x1, . . . , xT) = fX1(x1) · Π_{t=2}^T (2πσt²)^(−1/2) exp(−(1/2)(xt/σt)²)
GARCH models Estimation of GARCH(p,q)-processes
Again, the density of X1 can be neglected
Log-likelihood function:
ln L(α0, α1, β1 | x1, . . . , xT) = −(T/2) ln 2π − (1/2) Σ_{t=2}^T ln σt² − (1/2) Σ_{t=2}^T (xt/σt)²
with σt² = α0 + α1 x²t−1 + β1 σ²t−1 and σ1² = 0
ML estimation of α0, α1 and β1 by numerical maximization
GARCH models Estimation of GARCH(p,q)-processes
Conditional h-step forecast of the volatility σ²t+h in a GARCH(1, 1) model:
E(σ²t+h | Xt, Xt−1, . . .) = (α1 + β1)^h (σt² − α0/(1 − α1 − β1)) + α0/(1 − α1 − β1)
If the process is stationary,
lim_{h→∞} E(σ²t+h | Xt, Xt−1, . . .) = α0/(1 − α1 − β1)
Simulation of GARCH processes is easy; the estimation can be computer intensive
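A direct Python sketch of the forecast formula, with illustrative parameter values of my choosing:

```python
def garch11_vol_forecast(h, sigma2_t, alpha0, alpha1, beta1):
    """h-step conditional forecast of sigma^2_{t+h} in a GARCH(1, 1) model."""
    lr = alpha0 / (1.0 - alpha1 - beta1)          # long-run (unconditional) variance
    return (alpha1 + beta1) ** h * (sigma2_t - lr) + lr

# the forecast decays geometrically from today's variance toward the long-run variance
f1   = garch11_vol_forecast(1,   2.0, 0.1, 0.1, 0.8)    # long-run variance here is 1.0
f100 = garch11_vol_forecast(100, 2.0, 0.1, 0.1, 0.8)
```

With α1 + β1 = 0.9 the gap to the long-run variance shrinks by 10% per step, illustrating the stationary limit on this slide.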
GARCH models Residuals of an estimated GARCH(1,1) model
Careful: residuals are slightly different from what you know from OLS regressions
Estimates: α̂0, α̂1, β̂1, µ̂
From σt² = α0 + α1 X²t−1 + β1 σ²t−1 and Xt = µ + σt εt we calculate the standardized residuals
ε̂t = (Xt − µ̂)/σ̂t = (Xt − µ̂)/√(α̂0 + α̂1 X²t−1 + β̂1 σ̂²t−1)
Histogram of the standardized residuals
GARCH models AR(p)-ARCH(q)-models
Definition: (Xt)t∈Z is called an AR(p)-ARCH(q)-process if
Xt = µ + φ1 Xt−1 + εt (mean equation)
σt² = α0 + α1 ε²t−1 (variance equation)
(displayed here for p = q = 1), where εt ∼ N(0, σt²)
Maximum likelihood estimation
GARCH models Extensions of the GARCH model
There are a number of possible extensions to the GARCH model:
Empirical fact: negative shocks have a larger impact on volatility than positive shocks (leverage effect)
News impact curve
Nonnormal innovations, e.g. εt ∼ tν