Time Series Analysis
Andrea Beccarini, Center for Quantitative Economics
Winter 2013/2014
Introduction Objectives
Time series are ubiquitous in economics, and very important in macroeconomics and financial economics: GDP, inflation rates, unemployment, interest rates, stock prices.
You will learn . . .
- the formal mathematical treatment of time series and stochastic processes
- what the most important standard models in economics are
- how to fit models to real-world time series
Introduction Prerequisites
- Descriptive Statistics
- Probability Theory
- Statistical Inference
Introduction Class and material
Class
- Class teacher: Sarah Meyer
- Time: Tu., 12:00-14:00
- Location: CAWM 3
- Start: 22 October 2013
Material
- Course page on Blackboard
- Slides and class material are (or will be) downloadable
Introduction Literature
- Neusser, Klaus (2011), Zeitreihenanalyse in den Wirtschaftswissenschaften, 3rd ed., Teubner, Wiesbaden (available online in the RUB network).
- Hamilton, James D. (1994), Time Series Analysis, Princeton University Press, Princeton.
- Pfaff, Bernhard (2006), Analysis of Integrated and Cointegrated Time Series with R, Springer, New York.
- Schlittgen, Rainer and Streitberg, Bernd (1997), Zeitreihenanalyse, 7th ed., Oldenbourg, München.
Basics Definition
Definition: Time series
A sequence of observations ordered by time is called a time series.
- Time series can be univariate or multivariate
- Time can be discrete or continuous
- The states can be discrete or continuous
Basics Definition
Typical notations:

    x_1, x_2, …, x_T   or   x(1), x(2), …, x(T)   or   x_t, t = 1, …, T   or   (x_t)_{t≥0}

This course is about univariate time series in discrete time with continuous states.
Basics Examples
Quarterly GDP Germany, 1991 I to 2012 II
[Figure: time series plot of GDP (in current billion Euro), roughly 350 to 650, against time from the early 1990s to 2010]
Basics Examples
DAX index and log(DAX), 31.12.1964 to 6.4.2009
[Figure: two panels against time (1970 to 2010): the DAX index and the logarithm of the DAX]
Basics Definition
Definition: Stochastic process
A sequence (X_t)_{t∈T} of random variables, all defined on the same probability space (Ω, A, P), is called a stochastic process with discrete time parameter (usually T = ℕ or T = ℤ).
Short version: a stochastic process is a sequence of random variables.
A stochastic process depends on both chance and time.
Basics Definition
Distinguish four cases: both time and chance can be fixed or variable.

                 ω fixed                     ω variable
    t fixed      X_t(ω) is a real number     X_t(ω) is a random variable
    t variable   X_t(ω) is a sequence of     X_t(ω) is a stochastic process
                 real numbers (path,
                 realization, trajectory)

(process.R)
Basics Examples
Example 1: White noise

    ε_t ∼ NID(0, σ²)

Example 2: Random walk

    X_t = X_{t−1} + ε_t,   ε_t ∼ NID(0, σ²),   X_0 = 0

Example 3: A random constant

    X_t = Z,   Z ∼ N(0, σ²)
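The three example processes are easy to simulate. The course's own scripts are in R (e.g. process.R); the following is a minimal Python sketch, with illustrative names not taken from the course material:

```python
import random

def simulate_examples(T=200, sigma=1.0, seed=42):
    """One path of each example process: white noise, random walk, random constant."""
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, sigma) for _ in range(T)]   # Example 1: white noise
    walk, x = [], 0.0
    for e in eps:                                     # Example 2: X_t = X_{t-1} + e_t, X_0 = 0
        x += e
        walk.append(x)
    z = rng.gauss(0.0, sigma)
    const = [z] * T                                   # Example 3: X_t = Z for every t
    return eps, walk, const

eps, walk, const = simulate_examples()
```

Plotting several seeds side by side shows the qualitative difference: white noise fluctuates around zero, the random walk wanders, and the random constant is a horizontal line whose level differs across realizations.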
Basics Moment functions
Definition: Moment functions
The following functions of time are called moment functions:
- expectation function: µ(t) = E(X_t)
- variance function: σ²(t) = Var(X_t)
- covariance function: γ(s, t) = Cov(X_s, X_t)
- correlation function (autocorrelation function):

      ρ(s, t) = γ(s, t) / ( √σ²(s) · √σ²(t) )

(moments.R)
Basics Estimation of moment functions
Usually, the moment functions are unknown and have to be estimated.
Problem: only a single path (realization) can be observed.

    X_1^(1)   X_1^(2)   …   X_1^(n)
    X_2^(1)   X_2^(2)   …   X_2^(n)
      ⋮         ⋮              ⋮
    X_T^(1)   X_T^(2)   …   X_T^(n)

Can we still estimate the expectation function µ(t) and the autocovariance function γ(s, t)? Under which conditions?
Basics Estimation of moment functions
Given the matrix of realizations above, the expectation function µ(t) should usually be estimated by averaging over realizations,

    µ̂(t) = (1/n) Σ_{i=1}^{n} X_t^(i)
Basics Estimation of moment functions
Under certain conditions, µ(t) can be estimated by averaging over time along the single observed path,

    µ̂ = (1/T) Σ_{t=1}^{T} X_t^(1)
Basics Estimation of moment functions
Usually, the autocovariance γ(t, t+h) should be estimated by averaging over realizations,

    γ̂(t, t+h) = (1/n) Σ_{i=1}^{n} (X_t^(i) − µ̂(t))(X_{t+h}^(i) − µ̂(t+h))
Basics Estimation of moment functions
Under certain conditions, γ(t, t+h) can be estimated by averaging over time,

    γ̂(t, t+h) = (1/T) Σ_{t=1}^{T−h} (X_t^(1) − µ̂)(X_{t+h}^(1) − µ̂)
Basics Definition
Moment functions cannot be estimated without additional assumptions, since only one path is observed. There are restrictions which make estimation of the moment functions possible:
- Restriction of time heterogeneity: the distribution of (X_t(ω))_{t∈T} must not be completely different for each t ∈ T
- Restriction of memory: if the values of the process are coupled too closely over time, the individual observations do not supply any (or only insufficient) information about the distribution
Basics Restriction of time heterogeneity: Stationarity
Definition: Strong stationarity
Let (X_t)_{t∈T} be a stochastic process, and let t_1, …, t_n ∈ T be n ∈ ℕ arbitrary time points. (X_t)_{t∈T} is called strongly stationary if for arbitrary h ∈ ℤ

    P(X_{t_1} ≤ x_1, …, X_{t_n} ≤ x_n) = P(X_{t_1+h} ≤ x_1, …, X_{t_n+h} ≤ x_n)

Implication: all univariate marginal distributions are identical.
Basics Restriction of time heterogeneity: Stationarity
Definition: Weak stationarity
(X_t)_{t∈T} is called weakly stationary if
1. the expectation exists and is constant: E(X_t) = µ < ∞ for all t ∈ T
2. the variance exists and is constant: Var(X_t) = σ² < ∞ for all t ∈ T
3. for all t, s, r ∈ ℤ (in the admissible range): γ(t, s) = γ(t + r, s + r)

Simplified notation for covariance and correlation functions:

    γ(h) = γ(t, t + h),   ρ(h) = ρ(t, t + h)
Basics Restriction of time heterogeneity: Stationarity
Strong stationarity implies weak stationarity (but only if the first two moments exist).
A stochastic process is called Gaussian if the joint distribution of X_{t_1}, …, X_{t_n} is multivariate normal. For Gaussian processes, weak and strong stationarity coincide.
Intuition: an observed time series can be regarded as a realization of a stationary process if a gliding window of "appropriate width" always displays "qualitatively the same" picture.
Examples (stationary.R)
Basics Restriction of memory: Ergodicity
Definition: Ergodicity (I)
Let (X_t)_{t∈T} be a weakly stationary stochastic process with expectation µ and autocovariance γ(h); define

    µ̂_T = (1/T) Σ_{t=1}^{T} X_t

(X_t)_{t∈T} is called (expectation) ergodic if

    lim_{T→∞} E[(µ̂_T − µ)²] = 0
Basics Restriction of memory: Ergodicity
Definition: Ergodicity (II)
Let (X_t)_{t∈T} be a weakly stationary stochastic process with expectation µ and autocovariance γ(h); define

    γ̂(h) = (1/T) Σ_{t=1}^{T−h} (X_t − µ)(X_{t+h} − µ)

(X_t)_{t∈T} is called (covariance) ergodic if for all h ∈ ℤ

    lim_{T→∞} E[(γ̂(h) − γ(h))²] = 0
Basics Restriction of memory: Ergodicity
Ergodicity is consistency (in quadratic mean) of the estimators µ̂ of µ and γ̂(h) of γ(h) for dependent observations.
The process (X_t)_{t∈T} is expectation ergodic if (γ(h))_{h∈ℤ} is absolutely summable, i.e.

    Σ_{h=−∞}^{∞} |γ(h)| < ∞

The dependence between far-away observations must be sufficiently small.
Basics Restriction of memory: Ergodicity
Ergodicity condition (for the autocovariance): a stationary Gaussian process (X_t)_{t∈T} with absolutely summable autocovariance function γ(h) is (autocovariance) ergodic.
Under ergodicity, the law of large numbers holds even if the observations are dependent. If the dependence γ(h) does not diminish fast enough, the estimators are no longer consistent.
Basics Estimation of moment functions
Summary of estimators (electricity.R):

    µ̂ = X̄_T = (1/T) Σ_{t=1}^{T} X_t

    γ̂(h) = (1/T) Σ_{t=1}^{T−h} (X_t − µ̂)(X_{t+h} − µ̂)

    ρ̂(h) = γ̂(h) / γ̂(0)

Sometimes, γ̂(h) is defined with the factor 1/(T − h) instead of 1/T.
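These estimators are a few lines in any language (the course illustrates them in electricity.R); here is a plain-Python sketch with illustrative names:

```python
def mu_hat(x):
    """Sample mean over time."""
    return sum(x) / len(x)

def gamma_hat(x, h):
    """Sample autocovariance with the 1/T factor used on the slide."""
    T, m = len(x), mu_hat(x)
    return sum((x[t] - m) * (x[t + h] - m) for t in range(T - h)) / T

def rho_hat(x, h):
    """Sample autocorrelation."""
    return gamma_hat(x, h) / gamma_hat(x, 0)

x = [2.0, 4.0, 6.0, 4.0, 2.0]   # a tiny made-up series
```

Note that rho_hat(x, 0) is 1 by construction, since it divides γ̂(0) by itself.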
Basics Estimation of moment functions
A closer look at the expectation estimator µ̂:
The estimator µ̂ is unbiased, i.e. E(µ̂) = µ.
The variance of µ̂ is

    Var(µ̂) = γ(0)/T + (2/T) Σ_{h=1}^{T−1} (1 − h/T) γ(h)

Under ergodicity, for T → ∞,

    T · Var(µ̂) → γ(0) + 2 Σ_{h=1}^{∞} γ(h) = Σ_{h=−∞}^{∞} γ(h)
Basics Estimation of moment functions
For Gaussian processes, µ̂ is normally distributed,

    µ̂ ∼ N(µ, Var(µ̂)),

and asymptotically

    √T (µ̂ − µ) → Z ∼ N(0, γ(0) + 2 Σ_{h=1}^{∞} γ(h))

For non-Gaussian processes, µ̂ is (often) asymptotically normal with the same limiting distribution.
Basics Estimation of moment functions
A closer look at the autocovariance estimators γ̂(h): for Gaussian processes with absolutely summable covariance function, the vector

    ( √T (γ̂(0) − γ(0)), …, √T (γ̂(K) − γ(K)) )′

is asymptotically multivariate normal with expectation vector (0, …, 0)′ and

    T · Cov(γ̂(h_1), γ̂(h_2)) = Σ_{r=−∞}^{∞} ( γ(r) γ(r + h_1 + h_2) + γ(r − h_2) γ(r + h_1) )
Basics Estimation of moment functions
A closer look at the autocorrelation estimators ρ̂(h): for Gaussian processes with absolutely summable covariance function, the random vector

    ( √T (ρ̂(0) − ρ(0)), …, √T (ρ̂(K) − ρ(K)) )′

is asymptotically multivariate normal with expectation vector (0, …, 0)′ and a complicated covariance matrix.
Be careful: for small to medium sample sizes the autocovariance and autocorrelation estimators are biased! (autocorr.R)
Basics Estimation of moment functions
An important special case for autocorrelation estimators: let (ε_t) be a white-noise process with Var(ε_t) = σ² < ∞; then

    E(ρ̂(h)) = −T⁻¹ + O(T⁻²)

    Cov(ρ̂(h_1), ρ̂(h_2)) = T⁻¹ + O(T⁻²)   for h_1 = h_2
                          = O(T⁻²)          otherwise

For white-noise processes and long time series, the empirical autocorrelations are approximately independent normal random variables with expectation −T⁻¹ and variance T⁻¹.
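This approximation is easy to check by simulation; a Python sketch (the seed and sample size are arbitrary choices for illustration):

```python
import random

rng = random.Random(0)
T = 10_000
eps = [rng.gauss(0.0, 1.0) for _ in range(T)]

def rho_hat(x, h):
    """Sample autocorrelation at lag h."""
    n, m = len(x), sum(x) / len(x)
    gamma = lambda k: sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / n
    return gamma(h) / gamma(0)

# Each rho_hat(h) should lie within a few multiples of 1/sqrt(T) = 0.01 of zero
rhos = [rho_hat(eps, h) for h in range(1, 6)]
```

This is the basis of the usual ±2/√T confidence bands drawn on empirical correlograms.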
Mathematical digression (I) Complex numbers
Some quadratic equations do not have real solutions, e.g. x² + 1 = 0. Still it is possible (and sensible) to define solutions to such equations. The definition in common notation is

    i = √(−1),

where i is the number which, when squared, equals −1. The number i is called imaginary (i.e. not real).
Mathematical digression (I) Complex numbers
Other imaginary numbers follow from this definition, e.g.

    √(−16) = √16 · √(−1) = 4i
    √(−5) = √5 · √(−1) = √5 · i

Further, it is possible to define numbers that contain both a real part and an imaginary part, e.g. 5 − 8i or a + bi. Such numbers are called complex, and the set of complex numbers is denoted by ℂ. The pair a + bi and a − bi is called a complex conjugate pair.
Mathematical digression (I) Complex numbers
Geometric interpretation:
[Figure: the complex plane, with real axis and imaginary axis; the point a + bi has real part a, imaginary part b, absolute value r, and angle θ]
Mathematical digression (I) Complex numbers
Polar coordinates and Cartesian coordinates:

    z = a + bi = r·(cos θ + i sin θ) = r·e^{iθ}

    a = r cos θ,   b = r sin θ
    r = √(a² + b²),   θ = arctan(b/a)
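Python's standard cmath module implements exactly these conversions, which makes the polar multiplication rule below easy to verify; a quick sketch:

```python
import cmath

z = 3 + 4j
r, theta = cmath.polar(z)      # modulus and angle: z = r * e^(i*theta)
back = cmath.rect(r, theta)    # back to Cartesian coordinates

# Multiplication in polar form: multiply moduli, add angles
z1, z2 = 1 + 1j, 2j
r1, t1 = cmath.polar(z1)
r2, t2 = cmath.polar(z2)
prod = cmath.rect(r1 * r2, t1 + t2)   # should equal z1 * z2
```
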
Mathematical digression (I) Complex numbers
Rules of calculus:
- Addition: (a + bi) + (c + di) = (a + c) + (b + d)i
- Multiplication (Cartesian coordinates): (a + bi)·(c + di) = (ac − bd) + (ad + bc)i
- Multiplication (polar coordinates): r_1 e^{iθ_1} · r_2 e^{iθ_2} = r_1 r_2 e^{i(θ_1 + θ_2)}
Mathematical digression (I) Complex numbers
Addition:
[Figure: in the complex plane, a + bi and c + di add like vectors, giving (a + c) + (b + d)i]
Mathematical digression (I) Complex numbers
Multiplication:
[Figure: in the complex plane, multiplying two complex numbers with moduli r_1, r_2 and angles θ_1, θ_2 gives a product with modulus r = r_1·r_2 and angle θ = θ_1 + θ_2]
Mathematical digression (I) Complex numbers
The quadratic equation x² + px + q = 0 has the solutions

    x = −p/2 ± √(p²/4 − q)

If p²/4 − q < 0, the solutions are complex (and conjugate).
Mathematical digression (I) Complex numbers
Example: the solutions of x² − 2x + 5 = 0 are

    x = −(−2)/2 + √((−2)²/4 − 5) = 1 + 2i

and

    x = −(−2)/2 − √((−2)²/4 − 5) = 1 − 2i
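A sketch of the quadratic formula in Python; cmath.sqrt returns the complex root when p²/4 − q is negative, so both the real and the complex case are covered by the same code:

```python
import cmath

def quadratic_roots(p, q):
    """Both solutions of x^2 + p*x + q = 0."""
    d = cmath.sqrt(p * p / 4 - q)
    return -p / 2 + d, -p / 2 - d

r1, r2 = quadratic_roots(-2, 5)   # the example from the slide: 1 + 2i and 1 - 2i
```
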
Mathematical digression (II) Linear difference equations
First-order difference equation with initial value x_0:

    x_t = c + φ_1 x_{t−1}

p-th order difference equation with initial values x_0, x_{−1}, …, x_{−(p−1)}:

    x_t = c + φ_1 x_{t−1} + … + φ_p x_{t−p}

A sequence (x_t)_{t=0,1,…} that satisfies the difference equation is called a solution of the difference equation.
Examples (diffequation.R)
Mathematical digression (II) Linear difference equations
We only consider the homogeneous case, i.e. c = 0. The general solution of the first-order difference equation x_t = φ_1 x_{t−1} is x_t = A·φ_1^t with arbitrary constant A, since

    x_t = A φ_1^t = φ_1 · A φ_1^{t−1} = φ_1 x_{t−1}

The constant is pinned down by the initial condition, A = x_0. The sequence x_t = A φ_1^t is convergent if and only if |φ_1| < 1.
Mathematical digression (II) Linear difference equations
Solution of the p-th order difference equation x_t = φ_1 x_{t−1} + … + φ_p x_{t−p}: try x_t = A z^{−t}; then

    A z^{−t} = φ_1 A z^{−(t−1)} + … + φ_p A z^{−(t−p)}
    z^{−t} = φ_1 z^{−(t−1)} + … + φ_p z^{−(t−p)}

and thus

    1 − φ_1 z − … − φ_p z^p = 0

(the characteristic polynomial / characteristic equation).
Mathematical digression (II) Linear difference equations
There are p (possibly complex, possibly nondistinct) solutions of the characteristic equation; denote these solutions (called roots) by z_1, …, z_p. If all roots are real and distinct, then

    x_t = A_1 z_1^{−t} + … + A_p z_p^{−t}

is a solution of the homogeneous difference equation. If there are complex roots, the solution is oscillating. The constants A_1, …, A_p can be pinned down with p initial conditions (x_0, x_{−1}, …, x_{−(p−1)}).
Mathematical digression (II) Linear difference equations
Stability condition: the linear difference equation x_t = φ_1 x_{t−1} + … + φ_p x_{t−p} is stable (i.e. convergent) if and only if all roots of the characteristic polynomial

    1 − φ_1 z − … − φ_p z^p = 0

are outside the unit circle, i.e. |z_i| > 1 for all i = 1, …, p.
In R, the stability condition can be checked easily using the commands polyroot (base package) or ArmaRoots (fArma package).
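For p ≤ 2 the characteristic roots have a closed form, so the stability check fits in a few lines of Python (a general-p version would use a polynomial root finder, like R's polyroot); names here are illustrative:

```python
import cmath

def char_roots(phi):
    """Roots of 1 - phi1*z - ... - phip*z^p for p <= 2 (closed form)."""
    if len(phi) == 1:
        return [1 / phi[0]]
    phi1, phi2 = phi
    # 1 - phi1*z - phi2*z^2 = 0  is equivalent to  z^2 + (phi1/phi2)*z - 1/phi2 = 0
    p, q = phi1 / phi2, -1 / phi2
    d = cmath.sqrt(p * p / 4 - q)
    return [-p / 2 + d, -p / 2 - d]

def is_stable(phi):
    """Stable iff every characteristic root lies outside the unit circle."""
    return all(abs(z) > 1 for z in char_roots(phi))
```

For example, φ_1 = 1 (the random walk) puts the root exactly on the unit circle, so the equation is not stable.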
ARMA models Definition
Definition: ARMA process
Let (ε_t)_{t∈T} be a white noise process; the stochastic process

    X_t = φ_1 X_{t−1} + … + φ_p X_{t−p} + ε_t + θ_1 ε_{t−1} + … + θ_q ε_{t−q}

with φ_p, θ_q ≠ 0 is called an ARMA(p, q) process (AutoRegressive Moving Average process).
ARMA processes are important since every stationary process can be approximated by an ARMA process.
ARMA models Lag operator and lag polynomial
The lag operator is a convenient notational tool. The lag operator L shifts the time index of a stochastic process:

    L(X_t)_{t∈T} = (X_{t−1})_{t∈T}
    L X_t = X_{t−1}

Rules:

    L² X_t = L(L X_t) = X_{t−2}
    Lⁿ X_t = X_{t−n}
    L⁻¹ X_t = X_{t+1}
    L⁰ X_t = X_t
ARMA models Lag operator and lag polynomial
Lag polynomial:

    A(L) = a_0 + a_1 L + a_2 L² + … + a_p L^p

Example: let A(L) = 1 − 0.5L and B(L) = 1 + 4L²; then

    C(L) = A(L)B(L) = (1 − 0.5L)(1 + 4L²) = 1 − 0.5L + 4L² − 2L³

Lag polynomials can be treated in the same way as ordinary polynomials.
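Multiplying lag polynomials is just convolving their coefficient lists; a small Python sketch reproducing the example above:

```python
def polymul(a, b):
    """Coefficients of A(L)*B(L); inputs and output as [a0, a1, a2, ...]."""
    c = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj   # L^i * L^j contributes to the L^(i+j) coefficient
    return c

# (1 - 0.5L)(1 + 4L^2) = 1 - 0.5L + 4L^2 - 2L^3
c = polymul([1.0, -0.5], [1.0, 0.0, 4.0])
```
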
ARMA models Lag operator and lag polynomial
Define the lag polynomials

    Φ(L) = 1 − φ_1 L − … − φ_p L^p
    Θ(L) = 1 + θ_1 L + … + θ_q L^q

The ARMA(p, q) process can be written compactly as

    Φ(L) X_t = Θ(L) ε_t

Important special cases:

    MA(q) process:  X_t = ε_t + θ_1 ε_{t−1} + … + θ_q ε_{t−q}
    AR(1) process:  X_t = φ_1 X_{t−1} + ε_t
    AR(p) process:  X_t = φ_1 X_{t−1} + … + φ_p X_{t−p} + ε_t
ARMA models MA(q) process
The MA(q) process is

    X_t = Θ(L) ε_t
    X_t = ε_t + θ_1 ε_{t−1} + … + θ_q ε_{t−q}

with ε_t ∼ NID(0, σ_ε²).
Expectation function:

    E(X_t) = E(ε_t + θ_1 ε_{t−1} + … + θ_q ε_{t−q}) = E(ε_t) + θ_1 E(ε_{t−1}) + … + θ_q E(ε_{t−q}) = 0
ARMA models MA(q) process
Autocovariance function:

    γ(s, t) = E[ (ε_s + θ_1 ε_{s−1} + … + θ_q ε_{s−q}) (ε_t + θ_1 ε_{t−1} + … + θ_q ε_{t−q}) ]

Expanding the product yields cross terms of the form θ_i θ_j ε_{s−i} ε_{t−j}, and the expectations of the cross products are

    E(ε_s ε_t) = 0 for s ≠ t,   E(ε_s ε_t) = σ_ε² for s = t
ARMA models MA(q) process
Define θ_0 = 1; then

    γ(t, t) = σ_ε² Σ_{i=0}^{q} θ_i²
    γ(t−1, t) = σ_ε² Σ_{i=0}^{q−1} θ_i θ_{i+1}
    γ(t−2, t) = σ_ε² Σ_{i=0}^{q−2} θ_i θ_{i+2}
    …
    γ(t−q, t) = σ_ε² θ_0 θ_q = σ_ε² θ_q
    γ(s, t) = 0 for s < t − q

Hence, MA(q) processes are always stationary.
Simulation of MA(q) processes (maqsim.R)
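The whole family of formulas above collapses to one sum; a Python sketch (the function name is illustrative):

```python
def ma_autocov(theta, sigma2, h):
    """gamma(h) of an MA(q) process; theta = [theta1, ..., thetaq], theta0 = 1."""
    th = [1.0] + list(theta)
    h = abs(h)
    if h >= len(th):
        return 0.0                                   # gamma(h) = 0 beyond lag q
    return sigma2 * sum(th[i] * th[i + h] for i in range(len(th) - h))

g = [ma_autocov([0.5], 1.0, h) for h in range(3)]    # MA(1) with theta1 = 0.5
```

For MA(1) this gives γ(0) = σ²(1 + θ₁²), γ(1) = σ²θ₁, and zero at all higher lags.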
ARMA models AR(1) process
The AR(1) process is

    Φ(L) X_t = ε_t
    (1 − φ_1 L) X_t = ε_t
    X_t = φ_1 X_{t−1} + ε_t

with ε_t ∼ NID(0, σ_ε²).
Expectation and variance function
Stability condition: AR(1) processes are stable if |φ_1| < 1.
ARMA models AR(1) process
Stationarity: stable AR(1) processes are weakly stationary if

    E(X_0) = 0   and   Var(X_0) = σ_ε² / (1 − φ_1²)

Nonstationary stable processes converge towards stationarity. It is common parlance to call stable processes stationary.
Covariance function of the stationary AR(1) process: γ(h) = φ_1^|h| · σ_ε² / (1 − φ_1²).
ARMA models AR(p) process
The AR(p) process is

    Φ(L) X_t = ε_t
    X_t = φ_1 X_{t−1} + … + φ_p X_{t−p} + ε_t

with ε_t ∼ NID(0, σ_ε²).
Assumption: ε_t is independent of X_{t−1}, X_{t−2}, … (innovations).
Expectation function: E(X_t) = 0 in this zero-mean specification.
The covariance function is complicated (ar2autocov.R).
ARMA models AR(p) process
AR(p) processes are stable if all roots of the characteristic equation Φ(z) = 0 are larger than 1 in absolute value, |z_i| > 1 for i = 1, …, p.
An AR(p) process is weakly stationary if the joint distribution of the p initial values (X_0, X_{−1}, …, X_{−(p−1)}) is "appropriate".
Stable AR(p) processes converge towards stationarity; they are often called stationary.
Simulation of AR(p) processes (arpsim.R)
ARMA models Invertibility

AR and MA processes can be inverted (into each other).
Example: consider the stable AR(1) process with |φ_1| < 1:

    X_t = φ_1 X_{t−1} + ε_t
        = φ_1 (φ_1 X_{t−2} + ε_{t−1}) + ε_t
        = φ_1² X_{t−2} + φ_1 ε_{t−1} + ε_t
        ⋮
        = φ_1ⁿ X_{t−n} + φ_1^{n−1} ε_{t−(n−1)} + … + φ_1² ε_{t−2} + φ_1 ε_{t−1} + ε_t
ARMA models Invertibility

Since |φ_1| < 1,

    X_t = Σ_{i=0}^{∞} φ_1^i ε_{t−i} = ε_t + θ_1 ε_{t−1} + θ_2 ε_{t−2} + …

with θ_i = φ_1^i.
A stable AR(1) process can be written as an MA(∞) process (the same is true for stable AR(p) processes).
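The MA(∞) representation can be verified numerically: with ψ_i = φ^i, the truncated sum σ² Σ φ^{2i} reproduces the stationary AR(1) variance σ²/(1 − φ²). A Python sketch:

```python
def ar1_var_from_ma(phi, sigma2, n_terms=200):
    """Variance of a stable AR(1) via its truncated MA(infinity) representation.
    Var(X_t) = sigma2 * sum_i psi_i^2 with psi_i = phi^i."""
    return sigma2 * sum(phi ** (2 * i) for i in range(n_terms))

phi, sigma2 = 0.8, 1.0
approx = ar1_var_from_ma(phi, sigma2)
exact = sigma2 / (1 - phi ** 2)   # closed-form stationary variance
```

The truncation error is of order φ^(2·n_terms), which is negligible here.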
ARMA models Invertibility

Using lag polynomials this can be written as

    (1 − φ_1 L) X_t = ε_t
    X_t = (1 − φ_1 L)⁻¹ ε_t = Σ_{i=0}^{∞} (φ_1 L)^i ε_t

General compact and elegant notation:

    Φ(L) X_t = ε_t
    X_t = (Φ(L))⁻¹ ε_t = Θ(L) ε_t
ARMA models Invertibility

An MA(q) process can be written as AR(∞) if all roots of Θ(z) = 0 are larger than 1 in absolute value (invertibility condition).
Example: MA(1) with |θ_1| < 1. From

    X_t = ε_t + θ_1 ε_{t−1}
    θ_1 X_{t−1} = θ_1 ε_{t−1} + θ_1² ε_{t−2}

we find X_t = θ_1 X_{t−1} + ε_t − θ_1² ε_{t−2}. Repeated substitution of the ε_{t−i} terms yields

    X_t = Σ_{i=1}^{∞} φ_i X_{t−i} + ε_t   with φ_i = (−1)^{i+1} θ_1^i
ARMA models Invertibility

Summary:
- ARMA(p, q) processes are stable if all roots of Φ(z) = 0 are larger than 1 in absolute value
- ARMA(p, q) processes are invertible if all roots of Θ(z) = 0 are larger than 1 in absolute value
ARMA models Invertibility

Sometimes (e.g. for proofs) it is useful to write an ARMA(p, q) process either as AR(∞) or as MA(∞):

    Φ(L) X_t = Θ(L) ε_t
    X_t = (Φ(L))⁻¹ Θ(L) ε_t        (MA(∞) representation)
    (Θ(L))⁻¹ Φ(L) X_t = ε_t        (AR(∞) representation)
ARMA models Deterministic components
Until now we only considered processes with zero expectation. Many processes have both a zero-expectation stochastic component (Y_t) and a non-zero deterministic component (D_t). Examples:
- linear trend: D_t = a + bt
- exponential trend: D_t = a·b^t
- seasonal patterns

Let (X_t)_{t∈ℤ} be a stochastic process with deterministic component D_t, and define Y_t = X_t − D_t.
ARMA models Deterministic components
Then E(Y_t) = 0 and

    Cov(Y_t, Y_s) = E[(Y_t − E(Y_t))(Y_s − E(Y_s))]
                  = E[(X_t − D_t − E(X_t − D_t))(X_s − D_s − E(X_s − D_s))]
                  = E[(X_t − E(X_t))(X_s − E(X_s))]
                  = Cov(X_t, X_s)

The covariance function does not depend on the deterministic component. To derive the covariance function of a stochastic process, simply drop the deterministic component.
ARMA models Deterministic components
Special case D_t = µ_t = µ: an ARMA(p, q) process with constant (non-zero) expectation,

    X_t − µ = φ_1(X_{t−1} − µ) + … + φ_p(X_{t−p} − µ) + ε_t + θ_1 ε_{t−1} + … + θ_q ε_{t−q}

The process can also be written as

    X_t = c + φ_1 X_{t−1} + … + φ_p X_{t−p} + ε_t + θ_1 ε_{t−1} + … + θ_q ε_{t−q}

where c = µ(1 − φ_1 − … − φ_p).
ARMA models Deterministic components
Wold's representation theorem: every stationary stochastic process (X_t)_{t∈T} can be represented as

    X_t = Σ_{h=0}^{∞} ψ_h ε_{t−h} + D_t

with ψ_0 = 1, Σ_{h=0}^{∞} ψ_h² < ∞, and ε_t white noise with variance σ² > 0.
Stationary stochastic processes can thus be written as the sum of a deterministic process and an MA(∞) process. Often, low-order ARMA(p, q) processes can approximate MA(∞) processes well.
ARMA models Linear processes and filter
Definition: Linear process
Let (ε_t)_{t∈ℤ} be a white noise process; a stochastic process (X_t)_{t∈ℤ} is called linear if it can be written as

    X_t = Σ_{h=−∞}^{∞} ψ_h ε_{t−h} = Ψ(L) ε_t

where the coefficients are absolutely summable, i.e. Σ_{h=−∞}^{∞} |ψ_h| < ∞.
The lag polynomial Ψ(L) is called a (linear) filter.
ARMA models Linear processes and filter
Some special filters:
- Change from previous period (difference filter): Ψ(L) = 1 − L
- Change from last year (for quarterly or monthly data): Ψ(L) = 1 − L⁴ or Ψ(L) = 1 − L¹²
- Elimination of seasonal influences (quarterly data): Ψ(L) = (1 + L + L² + L³)/4 or Ψ(L) = 0.125L² + 0.25L + 0.25 + 0.25L⁻¹ + 0.125L⁻²
ARMA models Linear processes and filter
Hodrick-Prescott filter (an important tool in empirical macroeconomics): decompose a time series (X_t) into a long-term growth component (G_t) and a short-term cyclical component (C_t),

    X_t = G_t + C_t

There is a trade-off between goodness-of-fit and smoothness of G_t: minimize the criterion function

    Σ_{t=1}^{T} (X_t − G_t)² + λ Σ_{t=2}^{T−1} [(G_{t+1} − G_t) − (G_t − G_{t−1})]²

with respect to G_t for a given smoothness parameter λ.
ARMA models Linear processes and filter
The first-order conditions of the minimization problem are

    (G_1, …, G_T)′ = A (X_1, …, X_T)′

where A = (I + λK′K)⁻¹ and K is the (T−2)×T second-difference matrix

    K = ⎡ 1 −2  1  0  0  …  0  0  0 ⎤
        ⎢ 0  1 −2  1  0  …  0  0  0 ⎥
        ⎢ ⋮             ⋱           ⎥
        ⎣ 0  0  0  0  0  …  1 −2  1 ⎦
ARMA models Linear processes and filter
The HP filter is a linear filter. Typical values for the smoothing parameter λ:
- λ = 10 for annual data
- λ = 1600 for quarterly data
- λ = 14400 for monthly data
Implementation in R (code by Olaf Posch); empirical examples (hpfilter.R)
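The first-order conditions translate directly into code: build K, form I + λK′K, and solve the linear system. A dependency-free Python sketch (a real application would use a sparse solver; hpfilter.R is the course's R version):

```python
def hp_trend(x, lam):
    """Hodrick-Prescott growth component: solve (I + lam*K'K) g = x."""
    T = len(x)
    K = [[0.0] * T for _ in range(T - 2)]        # second-difference matrix
    for r in range(T - 2):
        K[r][r], K[r][r + 1], K[r][r + 2] = 1.0, -2.0, 1.0
    M = [[(1.0 if i == j else 0.0)
          + lam * sum(K[r][i] * K[r][j] for r in range(T - 2))
          for j in range(T)] for i in range(T)]
    # Gaussian elimination with partial pivoting on the augmented system [M | x]
    A = [row[:] + [xi] for row, xi in zip(M, x)]
    for col in range(T):
        piv = max(range(col, T), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, T):
            f = A[r][col] / A[col][col]
            for c in range(col, T + 1):
                A[r][c] -= f * A[col][c]
    g = [0.0] * T                                # back substitution
    for i in reversed(range(T)):
        g[i] = (A[i][T] - sum(A[i][j] * g[j] for j in range(i + 1, T))) / A[i][i]
    return g
```

A quick sanity check: a linear series has zero second differences, so it is its own trend for any λ.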
Estimation of ARMA models The estimation problem
Problem: the parameters φ_1, …, φ_p, θ_1, …, θ_q, σ_ε² of an ARMA(p, q) process are usually unknown. They have to be estimated from an observed time series X_1, …, X_T.
Standard estimation methods: least squares (OLS) and maximum likelihood (ML).
Assumption: the lag orders p and q are known.
Estimation of ARMA models Least squares estimation of AR(p) models
The AR(p) model with non-zero constant expectation

    X_t = c + φ_1 X_{t−1} + … + φ_p X_{t−p} + ε_t

can be written in matrix notation: stacking the observations t = p+1, …, T gives

    y = Xβ + u

where y = (X_{p+1}, …, X_T)′, row t of the regressor matrix X is (1, X_{t−1}, X_{t−2}, …, X_{t−p}), β = (c, φ_1, …, φ_p)′, and u = (ε_{p+1}, …, ε_T)′.
Estimation of ARMA models Least squares estimation of AR(p) models
The standard least squares estimator is

    β̂ = (X′X)⁻¹ X′y

The matrix of explanatory variables X is stochastic, so the usual finite-sample results for OLS regression do not hold. But there is no contemporaneous correlation between the error term and the explanatory variables. Hence, the OLS estimators are consistent and asymptotically efficient.
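For AR(1) the normal equations reduce to simple sums over the data; a Python sketch (illustrative, not the course code):

```python
def ols_ar1(x):
    """OLS estimates (c, phi1) for x_t = c + phi1*x_{t-1} + eps_t."""
    y, z = x[1:], x[:-1]                     # regressand and lagged regressor
    n = len(y)
    zbar, ybar = sum(z) / n, sum(y) / n
    szz = sum((zi - zbar) ** 2 for zi in z)
    szy = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    phi = szy / szz
    return ybar - phi * zbar, phi

# On noiseless data generated by x_t = 2 + 0.5*x_{t-1}, OLS recovers the parameters
x = [1.0]
for _ in range(50):
    x.append(2.0 + 0.5 * x[-1])
c, phi = ols_ar1(x)
```
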
Estimation of ARMA models Least squares estimation of ARMA models
Solve the ARMA equation

    X_t = c + φ_1 X_{t−1} + … + φ_p X_{t−p} + ε_t + θ_1 ε_{t−1} + … + θ_q ε_{t−q}

for ε_t:

    ε_t = X_t − c − φ_1 X_{t−1} − … − φ_p X_{t−p} − θ_1 ε_{t−1} − … − θ_q ε_{t−q}

Define the residuals as functions of the unknown parameters:

    ε̂_t(d, f_1, …, f_p, g_1, …, g_q) = X_t − d − f_1 X_{t−1} − … − f_p X_{t−p} − g_1 ε̂_{t−1} − … − g_q ε̂_{t−q}
Estimation of ARMA models Least squares estimation of ARMA models
Define the sum of squared residuals

    S(d, f_1, …, f_p, g_1, …, g_q) = Σ_{t=1}^{T} ( ε̂_t(d, f_1, …, f_p, g_1, …, g_q) )²

The least squares estimators are

    (ĉ, φ̂_1, …, φ̂_p, θ̂_1, …, θ̂_q) = arg min S(d, f_1, …, f_p, g_1, …, g_q)

Since the residuals are defined recursively, one needs starting values ε̂_0, …, ε̂_{−q+1} and X_0, …, X_{−p+1} to calculate ε̂_1. Easiest way: set all starting values to zero ("conditional estimation").
Estimation of ARMA models Least squares estimation of ARMA models
The first-order conditions form a nonlinear equation system which cannot be solved easily, so minimization is done by standard numerical methods (implemented in all usual statistical packages): either solve the nonlinear first-order conditions or minimize S directly.
Simple special case: ARMA(1, 1) (arma11.R)
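The recursive residuals of conditional estimation are easy to write down for ARMA(1,1); a Python sketch of the objective S, which a numerical optimizer would then minimize over (d, f1, g1):

```python
def arma11_ssr(x, d, f1, g1):
    """Conditional sum of squared residuals for an ARMA(1,1) model.
    Starting values X_0 = 0 and eps_hat_0 = 0 ("conditional estimation")."""
    x_prev, e_prev, ssr = 0.0, 0.0, 0.0
    for xt in x:
        e = xt - d - f1 * x_prev - g1 * e_prev   # eps_hat_t, computed recursively
        ssr += e * e
        x_prev, e_prev = xt, e
    return ssr
```

With g1 = 0 the recursion collapses to plain AR(1) residuals, which gives a convenient correctness check.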
Estimation of ARMA models Maximum likelihood estimation
Additional assumption: the innovations ε_t are normally distributed. Implication: ARMA processes are Gaussian, and the joint distribution of X_1, …, X_T is multivariate normal,

    X = (X_1, …, X_T)′ ∼ N(µ, Σ)
Estimation of ARMA models Maximum likelihood estimation
Expectation vector:

    µ = E(X_1, …, X_T)′ = ( c/(1 − φ_1 − … − φ_p), …, c/(1 − φ_1 − … − φ_p) )′

Covariance matrix:

    Σ = Cov(X) = ⎡ γ(0)     γ(1)     …  γ(T−1) ⎤
                 ⎢ γ(1)     γ(0)     …  γ(T−2) ⎥
                 ⎢  ⋮         ⋮      ⋱    ⋮    ⎥
                 ⎣ γ(T−1)   γ(T−2)   …  γ(0)   ⎦
Estimation of ARMA models Maximum likelihood estimation
The expectation vector and the covariance matrix contain all unknown parameters ψ = (φ_1, …, φ_p, θ_1, …, θ_q, c, σ_ε²).
The likelihood function is

    L(ψ; X) = (2π)^{−T/2} (det Σ)^{−1/2} exp( −(1/2)(X − µ)′ Σ⁻¹ (X − µ) )

and the loglikelihood function is

    ln L(ψ; X) = −(T/2) ln(2π) − (1/2) ln(det Σ) − (1/2)(X − µ)′ Σ⁻¹ (X − µ)

The ML estimators are

    ψ̂ = arg max ln L(ψ; X)
Estimation of ARMA models Maximum likelihood estimation
The loglikelihood function has to be maximized by numerical methods.
Standard properties of ML estimators:
1. consistency
2. asymptotic efficiency
3. asymptotically jointly normally distributed
4. the covariance matrix of the estimators can be consistently estimated
Example: ML estimation of an ARMA(3, 3) model for the interest rate spread (arma33.R)
Estimation of ARMA models Hypothesis tests
Since the estimation method is maximum likelihood, the classical tests (Wald, LR, LM) are applicable
General null and alternative hypotheses
H0 : g(ψ) = 0
H1 : not H0
where g(ψ) is an m-valued function of the parameters
Example: if H0 : φ1 = 0 then m = 1 and g(ψ) = φ1
Estimation of ARMA models Hypothesis tests
Likelihood ratio test statistic
LR = 2(ln L(θ̂ML) − ln L(θ̂R))
where θ̂ML and θ̂R are the unrestricted and restricted estimators
Under the null hypothesis, LR converges in distribution to U ∼ χ²m, and H0 is rejected at significance level α if LR > χ²m;1−α
Disadvantage: two models must be estimated
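As a minimal sketch in Python (the course scripts are in R; the hard-coded 3.841 is the 95% quantile of χ²1, i.e. the single-restriction case m = 1):

```python
CHI2_1_95 = 3.841  # 95% quantile of the chi-squared distribution with 1 df

def lr_test(loglik_unrestricted, loglik_restricted, crit=CHI2_1_95):
    """Likelihood ratio test: reject H0 if LR = 2*(lnL_u - lnL_r) exceeds crit."""
    lr = 2.0 * (loglik_unrestricted - loglik_restricted)
    return lr, lr > crit

# e.g. imposing the restriction costs 3 log-likelihood points: LR = 6.0 -> reject at 5%
lr, reject = lr_test(-100.0, -103.0)
```

Both models must be fitted before the statistic can be formed, which is exactly the disadvantage mentioned above.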
Estimation of ARMA models Hypothesis tests
For the Wald test we only consider g(ψ) = ψ − ψ0, i.e.
H0 : ψ = ψ0
H1 : not H0
Test statistic
W = (ψ̂ − ψ0)′ [Ĉov(ψ̂)]⁻¹ (ψ̂ − ψ0)
If the null hypothesis is true, then W converges in distribution to U ∼ χ²m
The asymptotic covariance matrix can be estimated consistently as Ĉov(ψ̂) = H⁻¹, where H is the Hessian matrix returned by the maximization procedure
Estimation of ARMA models Hypothesis tests
Test example 1:
H0 : φ1 = 0
H1 : φ1 ≠ 0
Test example 2:
H0 : ψ = ψ0
H1 : not H0
Illustration (arma33.R)
Estimation of ARMA models Model selection
Usually, the lag orders p and q of an ARMA model are unknown
Trade-off: goodness-of-fit against parsimony
Akaike's information criterion for the model with non-zero expectation:
AIC = ln σ̂² + 2(p + q + 1)/T
where ln σ̂² measures goodness-of-fit and 2(p + q + 1)/T is the penalty for model size
Choose the model with the smallest AIC
Estimation of ARMA models Model selection
Bayesian information criterion BIC (Schwarz information criterion):
BIC = ln σ̂² + (p + q + 1) · ln T / T
Hannan-Quinn information criterion:
HQ = ln σ̂² + 2(p + q + 1) · ln(ln T) / T
Both BIC and HQ are consistent while the AIC tends to overfit
Illustration (arma33.R)
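The three criteria are easy to compute side by side; a Python sketch (the function name is mine):

```python
import math

def info_criteria(sigma2_hat, p, q, T):
    """AIC, BIC and HQ for an ARMA(p, q) model with non-zero expectation."""
    k = p + q + 1                      # ARMA parameters plus the constant
    aic = math.log(sigma2_hat) + 2 * k / T
    bic = math.log(sigma2_hat) + k * math.log(T) / T
    hq  = math.log(sigma2_hat) + 2 * k * math.log(math.log(T)) / T
    return aic, bic, hq

# for a fixed fit, a larger model is penalized more heavily by BIC than by AIC
aic_small, bic_small, _ = info_criteria(1.0, 1, 1, 500)
aic_big,   bic_big,   _ = info_criteria(1.0, 3, 3, 500)
```

With ln 500 ≈ 6.2 > 2, the per-parameter BIC penalty exceeds the AIC penalty, which is why BIC favours the more parsimonious orders in the simulation table on the next slide.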
Estimation of ARMA models Model selection
Another illustration: the true model is ARMA(2, 1) with Xt = 0.5Xt−1 + 0.3Xt−2 + εt + 0.7εt−1; 1000 samples of size n = 500 were generated; the tables show the model orders p and q as selected by AIC and BIC

Orders selected by AIC:
        q=0   q=1   q=2   q=3   q=4   q=5
p=0       0     0     0     0     0     0
p=1       0    18    64    23    14     6
p=2       0   171    21    16     5     7
p=3       0     7    35    58    80    45
p=4       9     2    12   139    37    44
p=5      11     6    12    56    46    56

Orders selected by BIC:
        q=0   q=1   q=2   q=3   q=4   q=5
p=0       0     0     0     0     0     0
p=1       0   310   167     4     0     0
p=2       0   503     3     1     0     0
p=3       1     0     2     1     0     0
p=4       6     1     0     0     0     0
p=5       1     0     0     0     0     0
Integrated processes Difference operator
Define the difference operator ∆ = 1 − L, then ∆Xt = Xt − Xt−1
Second order differences: ∆² = (1 − L)² = 1 − 2L + L²
Higher orders ∆ⁿ are defined in the same way; note that ∆ⁿ ≠ 1 − Lⁿ
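These operator identities are easy to verify numerically; a small Python sketch:

```python
def diff(x, n=1):
    """Apply the difference operator Delta = 1 - L to a series n times."""
    for _ in range(n):
        x = [x[t] - x[t - 1] for t in range(1, len(x))]
    return x

x = [t ** 2 for t in range(1, 7)]       # 1, 4, 9, 16, 25, 36: quadratic trend
d2 = diff(x, 2)                         # second differences of t^2 are constant (= 2)

# Delta^2 = 1 - 2L + L^2, which is not the same operator as 1 - L^2:
d2_direct = [x[t] - 2 * x[t - 1] + x[t - 2] for t in range(2, len(x))]
not_same  = [x[t] - x[t - 2] for t in range(2, len(x))]
```

Differencing twice removes a quadratic deterministic trend, while 1 − L² would not.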
Integrated processes Definition
Definition: Integrated process
A stochastic process is called integrated of order 1 if
∆Xt = µ + Ψ(L)εt
where εt is white noise, Ψ(1) ≠ 0, and Σ_{j=0}^∞ j|ψj| < ∞
Common notation: Xt ∼ I(1)
I(1) processes are also called difference stationary or unit root processes
Stochastic and deterministic trends
Trend stationary processes are not I(1) (since Ψ(1) = 0)
Integrated processes Definition
Stationary processes are sometimes called I(0)
Higher order integration is possible, e.g.
Xt ∼ I(2) means ∆²Xt ∼ I(0)
In general, Xt ∼ I(d) means that ∆^d Xt ∼ I(0)
Most economic time series are either I(0) or I(1)
Some economic time series may be I(2)
Integrated processes Definition
Example 1: The random walk with drift, Xt = b + Xt−1 + εt, is I(1) because
∆Xt = Xt − Xt−1 = b + εt = b + Ψ(L)εt
where ψ0 = 1 and ψj = 0 for j ≠ 0
Integrated processes Definition
Example 2: The trend stationary process, Xt = a + bt + εt, is not I(1) because
∆Xt = b + εt − εt−1 = b + Ψ(L)εt
with ψ0 = 1, ψ1 = −1 and ψj = 0 for all other j, so that Ψ(1) = 0
Integrated processes Definition
Example 3: The AR(2) process
Xt = b + (1 + φ)Xt−1 − φXt−2 + εt
or, in lag-polynomial form,
(1 − φL)(1 − L)Xt = b + εt
is I(1) if |φ| < 1 because ∆Xt = Ψ(L)(b + εt) with
Ψ(L) = (1 − φL)⁻¹ = 1 + φL + φ²L² + φ³L³ + φ⁴L⁴ + . . .
and thus Ψ(1) = Σ_{i=0}^∞ φⁱ = 1/(1 − φ) ≠ 0. The roots of the characteristic equation are z = 1 and z = 1/φ
Integrated processes Definition
Example 4: The process Xt = 0.5Xt−1 − 0.4Xt−2 + εt is a stationary (stable) zero expectation AR(2) process; the process Yt = a + bt + Xt is trend stationary and I(0) since ∆Yt = b + ∆Xt with
∆Xt = Ψ(L)εt = (1 − L)(1 − 0.5L + 0.4L²)⁻¹ εt
and therefore Ψ(1) = 0 (i0andi1.R)
Integrated processes Definition
Definition: ARIMA process
Let (εt)t∈T be a white noise process; the stochastic process (Xt)t∈Z is called an integrated autoregressive moving-average process of orders p, d and q, or ARIMA(p, d, q), if ∆^d Xt is an ARMA(p, q) process:
Φ(L)∆^d Xt = Θ(L)εt
For d > 0 the process is nonstationary (I(d)) even if all roots of Φ(z) = 0 are outside the unit circle
Simulation of an ARIMA(p, d, q) process (arimapdqsim.R)
Integrated processes Deterministic versus stochastic trends
Why is it important to distinguish deterministic and stochastic trends?
Reason 1: Long-term forecasts and forecasting errors
Deterministic trend: the forecasting error variance is bounded
Stochastic trend: the forecasting error variance is unbounded
Illustrations (i0andi1.R)
Integrated processes Deterministic versus stochastic trends
Why is it important to distinguish deterministic and stochastic trends?
Reason 2: Spurious regression
OLS regressions will show spurious relationships between time series with (deterministic or stochastic) trends
Detrending works if the series have deterministic trends, but it does not help if the series are integrated
Illustrations (spurious1.R)
Integrated processes Integrated processes and parameter estimation
OLS estimators (and ML estimators) are consistent and asymptotically normal for stationary processes
The asymptotic normality is lost if the processes are integrated
We only look at the very special case Xt = φ1 Xt−1 + εt with εt ∼ NID(0, 1) and X0 = 0
The AR(1) process is stationary if |φ1| < 1 and has a unit root if |φ1| = 1
Integrated processes Integrated processes and parameter estimation
The usual OLS estimator of φ1 is
φ̂1 = Σ_{t=1}^T Xt Xt−1 / Σ_{t=1}^T X²t−1
What does the distribution of φ̂1 look like? Influence of φ1 and T
Consistency? Asymptotic normality?
Illustration (phihat.R)
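A quick pure-Python Monte Carlo sketch (phihat.R is the course's R version; seed and sample size here are arbitrary choices):

```python
import random

def simulate_ar1(phi, T, seed):
    """Simulate X_t = phi * X_{t-1} + eps_t with eps_t ~ NID(0, 1) and X_0 = 0."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for _ in range(T):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x

def ols_phi(x):
    """phi_hat = sum(X_t * X_{t-1}) / sum(X_{t-1}^2)."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den

phi_hat = ols_phi(simulate_ar1(0.5, 5000, seed=42))   # close to 0.5 for large T
```

Repeating this over many seeds and comparing the histograms for |φ1| < 1 and φ1 = 1 shows the normal versus nonnormal limiting behaviour discussed on the next slide.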
Integrated processes Integrated processes and parameter estimation
Consistency and asymptotic normality for I(0) processes (|φ1| < 1):
plim φ̂1 = φ1
√T (φ̂1 − φ1) converges in distribution to Z ∼ N(0, 1 − φ1²)
Consistency and limiting distribution for I(1) processes (φ1 = 1):
plim φ̂1 = 1
T (φ̂1 − 1) converges in distribution to V
where V is a nondegenerate, nonnormal random variable
Root-T consistency versus superconsistency
Integrated processes Unit root tests
It is important to distinguish between trend stationarity and difference stationarity
Test the hypothesis that a process has a unit root (i.e. is I(1))
Classical approaches: (Augmented) Dickey-Fuller test, Phillips-Perron test
Basic tool: linear regression
Xt = deterministics + φXt−1 + εt
or, equivalently,
∆Xt = deterministics + (φ − 1)Xt−1 + εt = deterministics + βXt−1 + εt
with β := φ − 1
Integrated processes Unit root tests
Null and alternative hypotheses
H0 : φ = 1 (unit root)
H1 : |φ| < 1 (no unit root)
or, equivalently,
H0 : β = 0 (unit root)
H1 : β < 0 (no unit root)
Unit root tests are one-sided; explosive processes are ruled out
Rejecting the null hypothesis is evidence in favour of stationarity
If the null hypothesis is not rejected, there could be a unit root
Integrated processes DF test and ADF test
Dickey-Fuller (DF) and Augmented Dickey-Fuller (ADF) tests
Possible regressions:
Xt = φXt−1 + εt, or ∆Xt = βXt−1 + εt
Xt = a + φXt−1 + εt, or ∆Xt = a + βXt−1 + εt
Xt = a + bt + φXt−1 + εt, or ∆Xt = a + bt + βXt−1 + εt
Assumption for the Dickey-Fuller test: no autocorrelation in εt
If there is autocorrelation in εt, use the augmented DF test
Integrated processes DF test and ADF test
Dickey-Fuller regression, case 1: no constant, no trend
∆Xt = βXt−1 + εt
Null and alternative hypotheses
H0 : β = 0
H1 : β < 0
Null hypothesis: stochastic trend without drift
Alternative hypothesis: stationary process around zero
Integrated processes DF test and ADF test
Dickey-Fuller regression, case 2: constant, no trend
∆Xt = a + βXt−1 + εt
Null and alternative hypotheses
H0 : β = 0    or    H0 : β = 0, a = 0
H1 : β < 0    or    H1 : β < 0, a ≠ 0
Null hypothesis: stochastic trend without drift
Alternative hypothesis: stationary process around a constant
Integrated processes DF test and ADF test
Dickey-Fuller regression, case 3: constant and trend
∆Xt = a + bt + βXt−1 + εt
Null and alternative hypotheses
H0 : β = 0    or    H0 : β = 0, b = 0
H1 : β < 0    or    H1 : β < 0, b ≠ 0
Null hypothesis: stochastic trend with drift
Alternative hypothesis: trend stationary process
Integrated processes DF test and ADF test
Dickey-Fuller test statistics for single hypotheses
ρ-test: T · β̂
τ-test: β̂ / σ̂β̂
The τ-test statistic is computed in the same way as the usual t-test statistic
Reject the null hypothesis if the test statistics are too small
The critical values are not the quantiles of the t-distribution
There are tables with the correct critical values (e.g. Hamilton, Table B.6)
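A pure-Python sketch of the τ statistic for case 1 (the course does this in R via dftest.R; the rough 5% critical value of −1.95 for case 1 comes from the Dickey-Fuller tables and is approximate):

```python
import math, random

def df_tau(x):
    """tau statistic from the case-1 DF regression dX_t = beta * X_{t-1} + e_t."""
    dx = [x[t] - x[t - 1] for t in range(1, len(x))]
    lag = x[:-1]
    sxx = sum(v * v for v in lag)
    beta = sum(l * d for l, d in zip(lag, dx)) / sxx
    resid = [d - beta * l for l, d in zip(lag, dx)]
    s2 = sum(e * e for e in resid) / (len(dx) - 1)   # residual variance
    return beta / math.sqrt(s2 / sxx)                # tau = beta_hat / se(beta_hat)

# a stationary AR(1) with phi = 0.5: the unit root null should be firmly rejected
rng = random.Random(7)
x, prev = [], 0.0
for _ in range(500):
    prev = 0.5 * prev + rng.gauss(0.0, 1.0)
    x.append(prev)
tau = df_tau(x)   # far below the approximate 5% critical value of -1.95
```

The key point is that τ is compared to the Dickey-Fuller critical values, not to Student-t quantiles.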
Integrated processes DF test and ADF test
The Dickey-Fuller test statistics for the joint hypotheses are computed in the same way as the usual F-test statistics
Reject the null hypothesis if the test statistic is too large
The critical values are not the quantiles of the F-distribution
There are tables with the correct critical values (e.g. Hamilton, Table B.7)
Illustrations (dftest.R)
Integrated processes DF test and ADF test
If there is autocorrelation in εt the DF test does not work (dftest.R)
Augmented Dickey-Fuller test (ADF test) regressions:
∆Xt = γ1∆Xt−1 + . . . + γp∆Xt−p + βXt−1 + εt
∆Xt = a + γ1∆Xt−1 + . . . + γp∆Xt−p + βXt−1 + εt
∆Xt = a + bt + γ1∆Xt−1 + . . . + γp∆Xt−p + βXt−1 + εt
The added lagged differences capture the autocorrelation
The number of lags p must be large enough to make εt white noise
The critical values remain the same as in the no-correlation case
Integrated processes DF test and ADF test
Further interesting topics (but we skip these):
Phillips-Perron test
Structural breaks and unit roots
KPSS test of stationarity, with
H0 : Xt ∼ I(0)
H1 : Xt ∼ I(1)
Integrated processes Regression with integrated processes
Spurious regression: if Xt and Yt are independent but both I(1), then the regression Yt = α + βXt + ut will result in an estimated coefficient β̂ that is significantly different from 0 with probability 1 as T → ∞
BUT: the regression Yt = α + βXt + ut may be sensible even though Xt and Yt are I(1)
Cointegration
Integrated processes Regression with integrated processes
Definition: Cointegration
Two stochastic processes (Xt)t∈T and (Yt)t∈T are cointegrated if both processes are I(1) and there is a constant β such that the process (Yt − βXt) is I(0)
If β is known, cointegration can be tested using a standard unit root test on the process (Yt − βXt)
If β is unknown, it can be estimated from the linear regression Yt = α + βXt + ut, and cointegration is tested using a modified unit root test on the residual process (ût)t=1,...,T
GARCH models Conditional expectation
Let (X, Y) be a bivariate random variable with a joint density function; then
E(X|Y = y) = ∫_{−∞}^{∞} x f_{X|Y=y}(x) dx
is the conditional expectation of X given Y = y
E(X|Y) denotes a random variable with realization E(X|Y = y) if the random variable Y realizes as y
Both E(X|Y) and E(X|Y = y) are called conditional expectation
GARCH models Conditional variance
Let (X, Y) be a bivariate random variable with a joint density function; then
Var(X|Y = y) = ∫_{−∞}^{∞} (x − E(X|Y = y))² f_{X|Y=y}(x) dx
is the conditional variance of X given Y = y
Var(X|Y) denotes a random variable with realization Var(X|Y = y) if the random variable Y realizes as y
Both Var(X|Y = y) and Var(X|Y) are called conditional variance
GARCH models Rules for conditional expectations
1. Law of iterated expectations: E(E(X|Y)) = E(X)
2. If X and Y are independent, then E(X|Y) = E(X)
3. The condition can be treated like a constant: E(XY|Y) = Y · E(X|Y)
4. The conditional expectation is a linear operator: for a1, . . . , an ∈ R,
E(Σ_{i=1}^n ai Xi | Y) = Σ_{i=1}^n ai E(Xi | Y)
GARCH models Basics
Some economic time series show volatility clusters, e.g. stock returns, commodity price changes, inflation rates, . . .
Simple autoregressive models cannot capture volatility clusters since their conditional variance is constant
Example: stationary AR(1) process, Xt = αXt−1 + εt with |α| < 1; then
Var(Xt) = σX² = σε² / (1 − α²)
and the conditional variance is Var(Xt|Xt−1) = σε²
GARCH models Basics
In the following, we will focus on stock returns
Empirical fact: squared (or absolute) returns are positively autocorrelated
Implication: returns are not independent over time
The dependence is nonlinear
How can we model this kind of dependence?
GARCH models ARCH(1)-process
Definition: ARCH(1)-process
The stochastic process (Xt)t∈Z is called an ARCH(1)-process if
E(Xt|Xt−1) = 0
Var(Xt|Xt−1) = σt² = α0 + α1 X²t−1
for all t ∈ Z, with α0, α1 > 0
Often, an additional assumption is
Xt | (Xt−1 = xt−1) ∼ N(0, α0 + α1 x²t−1)
GARCH models ARCH(1)-process
The unconditional distribution of Xt is non-normal
Leptokurtosis: the tails are heavier than the tails of the normal distribution
Example of an ARCH(1)-process: Xt = εt σt, where (εt)t∈Z is white noise with σε² = 1 and
σt = √(α0 + α1 X²t−1)
GARCH models ARCH(1)-process
One can show that
E(Xt|Xt−1) = 0
E(Xt) = 0
Var(Xt|Xt−1) = α0 + α1 X²t−1
Var(Xt) = α0 / (1 − α1)
Cov(Xt, Xt−i) = 0 for i > 0
Stationarity condition: 0 < α1 < 1
The unconditional kurtosis is 3(1 − α1²)/(1 − 3α1²) if εt ∼ N(0, 1); if α1 > √(1/3) ≈ 0.57735, the kurtosis does not exist
GARCH models ARCH(1)-process
Squared returns follow
Xt² = α0 + α1 X²t−1 + vt
with vt = σt²(εt² − 1)
Thus, squared returns of an ARCH(1) process follow an AR(1)
The process (vt)t∈Z is white noise:
E(vt) = 0
Var(vt) = E(vt²) = const.
Cov(vt, vt−i) = 0 for i = 1, 2, . . .
GARCH models ARCH(1)-process
Simulation of an ARCH(1)-process for t = 1, . . . , 2500
Parameters: α0 = 0.05, α1 = 0.95, start value X0 = 0
Conditional distribution: εt ∼ N(0, 1)
archsim.R
Check whether the simulated time series shows the typical stylized facts of return distributions
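archsim.R does this in R; an equivalent pure-Python sketch with the parameters from the slide (the seed is an arbitrary choice of mine):

```python
import math, random

def simulate_arch1(alpha0, alpha1, T, seed):
    """Simulate X_t = sigma_t * eps_t with sigma_t^2 = alpha0 + alpha1 * X_{t-1}^2."""
    rng = random.Random(seed)
    x, prev = [], 0.0                      # start value X_0 = 0
    for _ in range(T):
        sigma = math.sqrt(alpha0 + alpha1 * prev ** 2)
        prev = sigma * rng.gauss(0.0, 1.0)
        x.append(prev)
    return x

x = simulate_arch1(0.05, 0.95, 2500, seed=1)

# stylized fact to check: squared returns are positively autocorrelated
m = sum(v * v for v in x) / len(x)
acf1_sq = (sum((x[t] ** 2 - m) * (x[t - 1] ** 2 - m) for t in range(1, len(x)))
           / sum((v * v - m) ** 2 for v in x))
```

Plotting x and its squared first-order autocorrelation makes the volatility clusters visible even though the returns themselves are serially uncorrelated.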
GARCH models Estimation of an ARCH(1)-process
Of course, we do not know the true values of the model parameters α0 and α1
How can we estimate the unknown parameters α0 and α1 from observations X1, . . . , XT?
Because of
Xt² = α0 + α1 X²t−1 + vt
a possible estimation method is OLS
GARCH models Estimation of an ARCH(1)-process
OLS estimator of α1:
α̂1 = [Σ_{t=2}^T (Xt² − X̄²)(X²t−1 − X̄²)] / [Σ_{t=2}^T (X²t−1 − X̄²)²] ≈ ρ̂(Xt², X²t−1)
where X̄² denotes the sample mean of the squared observations
Careful: these estimators are only consistent if the kurtosis exists (i.e. if α1 < √(1/3))
Test of ARCH effects:
H0 : α1 = 0
H1 : α1 > 0
GARCH models Estimation of an ARCH(1)-process
For T large, under H0,
√T α̂1 ∼ N(0, 1) (approximately)
Reject H0 if √T α̂1 > Φ⁻¹(1 − α)
Second version of this test: consider the R² of the regression
Xt² = α0 + α1 X²t−1 + vt
Then, under H0,
T α̂1² ≈ T R² ∼ χ²1 (approximately)
Reject H0 if T R² > χ²1;1−α
GARCH models ARCH(p)-process
Definition: ARCH(p)-process
The stochastic process (Xt)t∈Z is called an ARCH(p)-process if
E(Xt|Xt−1, . . . , Xt−p) = 0
Var(Xt|Xt−1, . . . , Xt−p) = σt² = α0 + α1 X²t−1 + . . . + αp X²t−p
for t ∈ Z, where αi ≥ 0 for i = 0, 1, . . . , p − 1 and αp > 0
Often, an additional assumption is that
Xt | (Xt−1 = xt−1, . . . , Xt−p = xt−p) ∼ N(0, σt²)
GARCH models ARCH(p)-process
Example of an ARCH(p)-process: Xt = εt σt, where (εt)t∈Z is white noise with σε² = 1 and
σt = √(α0 + α1 X²t−1 + . . . + αp X²t−p)
An ARCH(p) process is weakly stationary if all roots of
1 − α1 z − α2 z² − . . . − αp z^p = 0
are outside the unit circle
Then, for all t ∈ Z, E(Xt) = 0 and
Var(Xt) = α0 / (1 − Σ_{i=1}^p αi)
GARCH models ARCH(p)-process
If (Xt)t∈Z is a stationary ARCH(p) process, then (Xt²)t∈Z is a stationary AR(p) process:
Xt² = α0 + α1 X²t−1 + . . . + αp X²t−p + vt
As to the error term,
E(vt) = 0
Var(vt) = const.
Cov(vt, vt−i) = 0 for i = 1, 2, . . .
Simulating an ARCH(p) process is easy
GARCH models Estimation of ARCH(p) models
OLS estimation of
Xt² = α0 + α1 X²t−1 + . . . + αp X²t−p + vt
Test of ARCH effects:
H0 : α1 = α2 = . . . = αp = 0  vs  H1 : not H0
Let R² denote the coefficient of determination of the regression
Under H0, the test statistic T R² ∼ χ²p (approximately); thus reject H0 if T R² > χ²p;1−α
GARCH models Maximum likelihood estimation
Basic idea of the maximum likelihood estimation method: choose the parameters such that the joint density of the observations f_{X1,...,XT}(x1, . . . , xT) is maximized
Let X1, . . . , XT denote a random sample from X
The density fX(x; θ) depends on R unknown parameters θ = (θ1, . . . , θR)
GARCH models Maximum likelihood estimation
ML estimation of θ: maximize the (log)likelihood function
L(θ) = f_{X1,...,XT}(x1, . . . , xT; θ) = Π_{t=1}^T fX(xt; θ)
ln L(θ) = Σ_{t=1}^T ln fX(xt; θ)
ML estimate
θ̂ = argmax [ln L(θ)]
GARCH models Maximum likelihood estimation
Since observations are independent in random samples,
f_{X1,...,XT}(x1, . . . , xT) = Π_{t=1}^T f_{Xt}(xt)
or
ln f_{X1,...,XT}(x1, . . . , xT) = Σ_{t=1}^T ln f_{Xt}(xt) = Σ_{t=1}^T ln fX(xt)
But: ARCH returns are not independent!
GARCH models Maximum likelihood estimation
Factorization with dependent observations:
f_{X1,...,XT}(x1, . . . , xT) = Π_{t=1}^T f_{Xt|Xt−1,...,X1}(xt | xt−1, . . . , x1)
or
ln f_{X1,...,XT}(x1, . . . , xT) = Σ_{t=1}^T ln f_{Xt|Xt−1,...,X1}(xt | xt−1, . . . , x1)
Hence, for an ARCH(1)-process,
f_{X1,...,XT}(x1, . . . , xT) = fX1(x1) · Π_{t=2}^T (2πσt²)^(−1/2) exp(−(1/2)(xt/σt)²)
GARCH models Maximum likelihood estimation
The marginal density of X1 is complicated but becomes negligible for large T and, therefore, will be dropped from now on
Log-likelihood function (without the initial marginal density):
ln L(α0, α1 | x1, . . . , xT) = −(T/2) ln 2π − (1/2) Σ_{t=2}^T ln σt² − (1/2) Σ_{t=2}^T (xt/σt)²
where σt² = α0 + α1 x²t−1
ML estimation of α0 and α1 by numerical maximization of ln L(α0, α1) with respect to α0 and α1
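Coding this conditional log-likelihood is straightforward; a Python sketch (the maximization itself would be handed to a numerical optimizer, which is not shown here):

```python
import math

def arch1_loglik(alpha0, alpha1, x):
    """Conditional Gaussian log-likelihood of an ARCH(1) model, X_1's marginal dropped."""
    T = len(x)
    ll = -0.5 * T * math.log(2.0 * math.pi)      # constant term, as on the slide
    for t in range(1, T):
        s2 = alpha0 + alpha1 * x[t - 1] ** 2     # sigma_t^2 = alpha0 + alpha1 * x_{t-1}^2
        ll += -0.5 * math.log(s2) - 0.5 * x[t] ** 2 / s2
    return ll
```

Any hill-climbing routine over (α0, α1) with α0, α1 > 0 can maximize this function; the constant term does not affect the maximizer.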
GARCH models GARCH(p,q)-process
Definition: GARCH(p,q)-process
The stochastic process (Xt)t∈Z is called a GARCH(p, q)-process if
E(Xt|Xt−1, Xt−2, . . .) = 0
Var(Xt|Xt−1, Xt−2, . . .) = σt² = α0 + α1 X²t−1 + . . . + αp X²t−p + β1 σ²t−1 + . . . + βq σ²t−q
for t ∈ Z with αi, βi ≥ 0
Often, an additional assumption is that
(Xt | Xt−1 = xt−1, Xt−2 = xt−2, . . .) ∼ N(0, σt²)
GARCH models GARCH(p,q)-process
Conditional variance of a GARCH(1, 1) process:
Var(Xt|Xt−1, Xt−2, . . .) = σt² = α0 + α1 X²t−1 + β1 σ²t−1 = α0/(1 − β1) + α1 Σ_{i=1}^∞ β1^(i−1) X²t−i
Unconditional variance:
Var(Xt) = α0 / (1 − Σ_{i=1}^p αi − Σ_{j=1}^q βj)
GARCH models GARCH(p,q)-process
Necessary condition for weak stationarity:
Σ_{i=1}^p αi + Σ_{j=1}^q βj < 1
(Xt)t∈Z has no autocorrelation
GARCH processes can be written as ARMA(max(p, q), q)-processes in the squared returns
Example: GARCH(1, 1)-process with Xt = εt σt and σt² = α0 + α1 X²t−1 + β1 σ²t−1
GARCH models Estimation of GARCH(p,q)-processes
Estimation of the ARMA(max(p, q), q)-process in the squared returns
Alternative (and better) method: maximum likelihood
For a GARCH(1, 1)-process,
f_{X1,...,XT}(x1, . . . , xT) = fX1(x1) · Π_{t=2}^T (2πσt²)^(−1/2) exp(−(1/2)(xt/σt)²)
GARCH models Estimation of GARCH(p,q)-processes
Again, the density of X1 can be neglected
Log-likelihood function:
ln L(α0, α1, β1 | x1, . . . , xT) = −(T/2) ln 2π − (1/2) Σ_{t=2}^T ln σt² − (1/2) Σ_{t=2}^T (xt/σt)²
with σt² = α0 + α1 x²t−1 + β1 σ²t−1 and σ1² = 0
ML estimation of α0, α1 and β1 by numerical maximization
GARCH models Estimation of GARCH(p,q)-processes
Conditional h-step forecast of the volatility σ²t+h in a GARCH(1, 1) model:
E(σ²t+h | Xt, Xt−1, . . .) = (α1 + β1)^h (σt² − α0/(1 − α1 − β1)) + α0/(1 − α1 − β1)
If the process is stationary,
lim_{h→∞} E(σ²t+h | Xt, Xt−1, . . .) = α0/(1 − α1 − β1)
Simulation of GARCH processes is easy; the estimation can be computer intensive
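A direct Python sketch of the forecast formula, with illustrative parameter values of my choosing:

```python
def garch11_vol_forecast(h, sigma2_t, alpha0, alpha1, beta1):
    """h-step conditional forecast of sigma^2_{t+h} in a GARCH(1, 1) model."""
    lr = alpha0 / (1.0 - alpha1 - beta1)          # long-run (unconditional) variance
    return (alpha1 + beta1) ** h * (sigma2_t - lr) + lr

# the forecast decays geometrically from today's variance toward the long-run variance
f1   = garch11_vol_forecast(1,   2.0, 0.1, 0.1, 0.8)    # long-run variance here is 1.0
f100 = garch11_vol_forecast(100, 2.0, 0.1, 0.1, 0.8)
```

With α1 + β1 = 0.9 the gap to the long-run variance shrinks by 10% per step, illustrating the stationary limit on this slide.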
GARCH models Residuals of an estimated GARCH(1,1) model
Careful: residuals are slightly different from what you know from OLS regressions
Estimates: α̂0, α̂1, β̂1, µ̂
From σt² = α0 + α1 X²t−1 + β1 σ²t−1 and Xt = µ + σt εt we calculate the standardized residuals
ε̂t = (Xt − µ̂)/σ̂t = (Xt − µ̂)/√(α̂0 + α̂1 X²t−1 + β̂1 σ̂²t−1)
Histogram of the standardized residuals
GARCH models AR(p)-ARCH(q)-models
Definition: (Xt)t∈Z is called an AR(p)-ARCH(q)-process if
Xt = µ + φ1 Xt−1 + εt (mean equation)
σt² = α0 + α1 ε²t−1 (variance equation)
(displayed here for p = q = 1), where εt ∼ N(0, σt²)
Maximum likelihood estimation
GARCH models Extensions of the GARCH model
There are a number of possible extensions to the GARCH model:
Empirical fact: negative shocks have a larger impact on volatility than positive shocks (leverage effect)
News impact curve
Nonnormal innovations, e.g. εt ∼ tν