Time Series Analysis - uni-muenster.de

Time Series Analysis Andrea Beccarini Center for Quantitative Economics

Winter 2013/2014

Andrea Beccarini (CQE)

Time Series Analysis

Winter 2013/2014

1 / 143

Introduction Objectives

Time series are ubiquitous in economics, and very important in macroeconomics and financial economics
  GDP, inflation rates, unemployment, interest rates, stock prices
You will learn ...
  the formal mathematical treatment of time series and stochastic processes
  what the most important standard models in economics are
  how to fit models to real-world time series

Introduction Prerequisites

Descriptive Statistics
Probability Theory
Statistical Inference

Introduction Class and material

Class
  Class teacher: Sarah Meyer
  Time: Tu., 12:00-14:00
  Location: CAWM 3
  Start: 22 October 2013
Material
  Course page on Blackboard
  Slides and class material are (or will be) downloadable

Introduction Literature

Neusser, Klaus (2011), Zeitreihenanalyse in den Wirtschaftswissenschaften, 3rd ed., Teubner, Wiesbaden. → available online in the RUB-Netz
Hamilton, James D. (1994), Time Series Analysis, Princeton University Press, Princeton.
Pfaff, Bernhard (2006), Analysis of Integrated and Cointegrated Time Series with R, Springer, New York.
Schlittgen, Rainer and Streitberg, Bernd (1997), Zeitreihenanalyse, 7th ed., Oldenbourg, München.

Basics Definition

Definition: Time series
A sequence of observations ordered by time is called a time series
Time series can be univariate or multivariate
Time can be discrete or continuous
The states can be discrete or continuous

Basics Definition

Typical notations
  $x_1, x_2, \ldots, x_T$ or $x(1), x(2), \ldots, x(T)$ or $x_t,\ t = 1, \ldots, T$ or $(x_t)_{t \ge 0}$
This course is about ...
  univariate time series
  in discrete time
  with continuous states

Basics Examples

[Figure: Quarterly GDP Germany, 1991 I to 2012 II. Vertical axis: GDP (in current billion Euro); horizontal axis: Time]

Basics Examples

[Figure: DAX index and log(DAX), 31.12.1964 to 6.4.2009. Two panels, DAX and logarithm of DAX, each plotted against Time]

Basics Definition

Definition: Stochastic process
A sequence $(X_t)_{t \in T}$ of random variables, all defined on the same probability space $(\Omega, \mathcal{A}, P)$, is called a stochastic process with discrete time parameter (usually $T = \mathbb{N}$ or $T = \mathbb{Z}$)
Short version: A stochastic process is a sequence of random variables
A stochastic process depends on both chance and time

Basics Definition

Distinguish four cases: both time and chance can be fixed or variable

              ω fixed                                    ω variable
t fixed       X_t(ω) is a real number                    X_t(ω) is a random variable
t variable    X_t(ω) is a sequence of real numbers       X_t(ω) is a stochastic process
              (path, realization, trajectory)

process.R

Basics Examples

Example 1: White noise
$$\varepsilon_t \sim NID(0, \sigma^2)$$
Example 2: Random walk
$$X_t = X_{t-1} + \varepsilon_t \quad \text{and} \quad X_0 = 0, \qquad \varepsilon_t \sim NID(0, \sigma^2)$$
Example 3: A random constant
$$X_t = Z, \qquad Z \sim N(0, \sigma^2)$$

Basics Moment functions

Definition: Moment functions
The following functions of time are called moment functions:
  $\mu(t) = E(X_t)$ (expectation function)
  $\sigma^2(t) = Var(X_t)$ (variance function)
  $\gamma(s, t) = Cov(X_s, X_t)$ (covariance function)
Correlation function (autocorrelation function)
$$\rho(s, t) = \frac{\gamma(s, t)}{\sqrt{\sigma^2(s)}\,\sqrt{\sigma^2(t)}}$$
moments.R [1]
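As a worked illustration of the moment functions, consider the random walk from Example 2 with $X_0 = 0$. The derivation below is a sketch that uses only the independence of the innovations; it is not taken from the slides.

```latex
X_t = \sum_{i=1}^{t} \varepsilon_i
\quad\Rightarrow\quad
\mu(t) = \sum_{i=1}^{t} E(\varepsilon_i) = 0, \qquad
\sigma^2(t) = \sum_{i=1}^{t} Var(\varepsilon_i) = t\,\sigma^2

\gamma(s,t) = Cov\Big(\sum_{i=1}^{s}\varepsilon_i,\ \sum_{j=1}^{t}\varepsilon_j\Big)
            = \min(s,t)\,\sigma^2, \qquad
\rho(s,t) = \frac{\min(s,t)}{\sqrt{s\,t}}
```

The variance function grows with $t$, so the moment functions of the random walk depend on time.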

Basics Estimation of moment functions

Usually, the moment functions are unknown and have to be estimated
Problem: Only a single path (realization) can be observed
$$\begin{array}{cccc}
X_1^{(1)} & X_1^{(2)} & \ldots & X_1^{(n)} \\
X_2^{(1)} & X_2^{(2)} & \ldots & X_2^{(n)} \\
\vdots & \vdots & & \vdots \\
X_T^{(1)} & X_T^{(2)} & \ldots & X_T^{(n)}
\end{array}$$
Can we still estimate the expectation function $\mu(t)$ and the autocovariance function $\gamma(s, t)$? Under which conditions?

Basics Estimation of moment functions

Usually, the expectation function $\mu(t)$ should be estimated by averaging over realizations,
$$\hat\mu(t) = \frac{1}{n} \sum_{i=1}^{n} X_t^{(i)}$$

Basics Estimation of moment functions

Under certain conditions, $\mu(t)$ can be estimated by averaging over time,
$$\hat\mu = \frac{1}{T} \sum_{t=1}^{T} X_t^{(1)}$$

Basics Estimation of moment functions

Usually, the autocovariance $\gamma(t, t+h)$ should be estimated by averaging over realizations,
$$\hat\gamma(t, t+h) = \frac{1}{n} \sum_{i=1}^{n} \left(X_t^{(i)} - \hat\mu(t)\right)\left(X_{t+h}^{(i)} - \hat\mu(t+h)\right)$$

Basics Estimation of moment functions

Under certain conditions, $\gamma(t, t+h)$ can be estimated by averaging over time,
$$\hat\gamma(t, t+h) = \frac{1}{T} \sum_{t=1}^{T-h} \left(X_t^{(1)} - \hat\mu\right)\left(X_{t+h}^{(1)} - \hat\mu\right)$$

Basics Definition

Moment functions cannot be estimated without additional assumptions since only one path is observed
There are restrictions that allow the moment functions to be estimated
Restriction of the time heterogeneity: The distribution of $(X_t(\omega))_{t \in T}$ must not be completely different for each $t \in T$
Restriction of the memory: If the values of the process are coupled too closely over time, the individual observations supply no (or only insufficient) information about the distribution

Basics Restriction of time heterogeneity: Stationarity

Definition: Strong stationarity
Let $(X_t)_{t \in T}$ be a stochastic process, and let $t_1, \ldots, t_n \in T$ be an arbitrary number $n \in \mathbb{N}$ of arbitrary time points.
$(X_t)_{t \in T}$ is called strongly stationary if for arbitrary $h \in \mathbb{Z}$
$$P(X_{t_1} \le x_1, \ldots, X_{t_n} \le x_n) = P(X_{t_1+h} \le x_1, \ldots, X_{t_n+h} \le x_n)$$
Implication: all univariate marginal distributions are identical

Basics Restriction of time heterogeneity: Stationarity

Definition: Weak stationarity
$(X_t)_{t \in T}$ is called weakly stationary if
1. the expectation exists and is constant: $E(X_t) = \mu < \infty$ for all $t \in T$
2. the variance exists and is constant: $Var(X_t) = \sigma^2 < \infty$ for all $t \in T$
3. for all $t, s, r \in \mathbb{Z}$ (in admissible range): $\gamma(t, s) = \gamma(t + r, s + r)$
Simplified notation for covariance and correlation functions
$$\gamma(h) = \gamma(t, t+h), \qquad \rho(h) = \rho(t, t+h)$$

Basics Restriction of time heterogeneity: Stationarity

Strong stationarity implies weak stationarity (but only if the first two moments exist)
A stochastic process is called Gaussian if the joint distribution of $X_{t_1}, \ldots, X_{t_n}$ is multivariate normal
For Gaussian processes, weak and strong stationarity coincide
Intuition: An observed time series can be regarded as a realization of a stationary process if a sliding window of "appropriate width" always displays "qualitatively the same" picture
stationary.R
Examples [2]

Basics Restriction of memory: Ergodicity

Definition: Ergodicity (I)
Let $(X_t)_{t \in T}$ be a weakly stationary stochastic process with expectation $\mu$ and autocovariance $\gamma(h)$; define
$$\hat\mu_T = \frac{1}{T} \sum_{t=1}^{T} X_t$$
$(X_t)_{t \in T}$ is called (expectation) ergodic if
$$\lim_{T \to \infty} E\left[(\hat\mu_T - \mu)^2\right] = 0$$

Basics Restriction of memory: Ergodicity

Definition: Ergodicity (II)
Let $(X_t)_{t \in T}$ be a weakly stationary stochastic process with expectation $\mu$ and autocovariance $\gamma(h)$; define
$$\hat\gamma(h) = \frac{1}{T} \sum_{t=1}^{T-h} (X_t - \mu)(X_{t+h} - \mu)$$
$(X_t)_{t \in T}$ is called (covariance) ergodic if for all $h \in \mathbb{Z}$
$$\lim_{T \to \infty} E\left[(\hat\gamma(h) - \gamma(h))^2\right] = 0$$

Basics Restriction of memory: Ergodicity

Ergodicity is consistency (in quadratic mean) of the estimators $\hat\mu$ of $\mu$ and $\hat\gamma(h)$ of $\gamma(h)$ for dependent observations
The process $(X_t)_{t \in T}$ is expectation ergodic if $(\gamma(h))_{h \in \mathbb{Z}}$ is absolutely summable, i.e.
$$\sum_{h=-\infty}^{\infty} |\gamma(h)| < \infty$$
The dependence between far away observations must be sufficiently small

Basics Restriction of memory: Ergodicity

Ergodicity condition (for autocovariance): A stationary Gaussian process $(X_t)_{t \in T}$ with absolutely summable autocovariance function $\gamma(h)$ is (autocovariance) ergodic
Under ergodicity, the law of large numbers holds even if the observations are dependent
If the dependence $\gamma(h)$ does not diminish fast enough, the estimators are no longer consistent
Examples [3]
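The role of ergodicity can be illustrated with a small simulation. The Python sketch below is not one of the course's R scripts, and its function names are made up for illustration. It compares the time average of a single path for Gaussian white noise and for the random constant $X_t = Z$. For the random constant, $\gamma(h) = \sigma^2$ for all $h$, so the autocovariances are not absolutely summable, and the time average reproduces the drawn value $Z$ instead of approaching $\mu = 0$.

```python
import random

def white_noise_path(T, sigma, rng):
    """One path of Gaussian white noise of length T."""
    return [rng.gauss(0.0, sigma) for _ in range(T)]

def random_constant_path(T, sigma, rng):
    """One path of X_t = Z with Z ~ N(0, sigma^2): a single draw repeated T times."""
    z = rng.gauss(0.0, sigma)
    return [z] * T

def time_average(path):
    """Estimate of mu obtained by averaging one path over time."""
    return sum(path) / len(path)

rng = random.Random(1)  # arbitrary fixed seed, only for reproducibility
wn = white_noise_path(10_000, 1.0, rng)
rc = random_constant_path(10_000, 1.0, rng)
print("white noise, time average:", time_average(wn))
print("random constant, time average:", time_average(rc), "value of Z:", rc[0])
```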

Basics Estimation of moment functions

Summary of estimators
$$\hat\mu = \bar X_T = \frac{1}{T} \sum_{t=1}^{T} X_t$$
$$\hat\gamma(h) = \frac{1}{T} \sum_{t=1}^{T-h} (X_t - \hat\mu)(X_{t+h} - \hat\mu)$$
$$\hat\rho(h) = \frac{\hat\gamma(h)}{\hat\gamma(0)}$$
Sometimes, $\hat\gamma(h)$ is defined with factor $1/(T - h)$
electricity.R
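The three estimators translate directly into code. The following Python sketch uses the factor $1/T$ variant; the function names and the data are made up for illustration and are not taken from electricity.R.

```python
def sample_mean(x):
    """mu_hat: average of the observed path."""
    return sum(x) / len(x)

def sample_autocov(x, h):
    """gamma_hat(h) with factor 1/T (not 1/(T-h))."""
    T = len(x)
    m = sample_mean(x)
    return sum((x[t] - m) * (x[t + h] - m) for t in range(T - h)) / T

def sample_autocorr(x, h):
    """rho_hat(h) = gamma_hat(h) / gamma_hat(0)."""
    return sample_autocov(x, h) / sample_autocov(x, 0)

x = [2.0, 4.0, 6.0, 4.0, 2.0, 4.0, 6.0, 4.0]  # made-up data
print(sample_mean(x), sample_autocov(x, 0), sample_autocorr(x, 1), sample_autocorr(x, 2))
```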

Basics Estimation of moment functions

A closer look at the expectation estimator $\hat\mu$
The estimator $\hat\mu$ is unbiased, i.e. $E(\hat\mu) = \mu$ [4]
The variance of $\hat\mu$ is [5]
$$Var(\hat\mu) = \frac{\gamma(0)}{T} + \frac{2}{T} \sum_{h=1}^{T-1} \left(1 - \frac{h}{T}\right) \gamma(h)$$
Under ergodicity, for $T \to \infty$
$$T \cdot Var(\hat\mu) \to \gamma(0) + 2 \sum_{h=1}^{\infty} \gamma(h) = \sum_{h=-\infty}^{\infty} \gamma(h)$$

Basics Estimation of moment functions

For Gaussian processes, $\hat\mu$ is normally distributed
$$\hat\mu \sim N(\mu, Var(\hat\mu))$$
and asymptotically
$$\sqrt{T}(\hat\mu - \mu) \to Z \sim N\left(0,\ \gamma(0) + 2 \sum_{h=1}^{\infty} \gamma(h)\right)$$
For non-Gaussian processes, $\hat\mu$ is (often) asymptotically normal
$$\sqrt{T}(\hat\mu - \mu) \to Z \sim N\left(0,\ \gamma(0) + 2 \sum_{h=1}^{\infty} \gamma(h)\right)$$

Basics Estimation of moment functions

A closer look at the autocovariance estimators $\hat\gamma(h)$
For Gaussian processes with absolutely summable covariance function, the random vector
$$\left(\sqrt{T}(\hat\gamma(0) - \gamma(0)), \ldots, \sqrt{T}(\hat\gamma(K) - \gamma(K))\right)'$$
is multivariate normal with expectation vector $(0, \ldots, 0)'$ and
$$T \cdot Cov(\hat\gamma(h_1), \hat\gamma(h_2)) = \sum_{r=-\infty}^{\infty} \left(\gamma(r)\gamma(r + h_1 + h_2) + \gamma(r - h_2)\gamma(r + h_1)\right)$$

Basics Estimation of moment functions

A closer look at the autocorrelation estimators $\hat\rho(h)$
For Gaussian processes with absolutely summable covariance function, the random vector
$$\left(\sqrt{T}(\hat\rho(0) - \rho(0)), \ldots, \sqrt{T}(\hat\rho(K) - \rho(K))\right)'$$
is multivariate normal with expectation vector $(0, \ldots, 0)'$ and a complicated covariance matrix
Be careful: For small to medium sample sizes the autocovariance and autocorrelation estimators are biased!
autocorr.R

Basics Estimation of moment functions

An important special case for autocorrelation estimators:
Let $(\varepsilon_t)$ be a white-noise process with $Var(\varepsilon_t) = \sigma^2 < \infty$, then
$$E(\hat\rho(h)) = -T^{-1} + O(T^{-2})$$
$$Cov(\hat\rho(h_1), \hat\rho(h_2)) = \begin{cases} T^{-1} + O(T^{-2}) & \text{for } h_1 = h_2 \\ O(T^{-2}) & \text{else} \end{cases}$$
For white-noise processes and long time series, the empirical autocorrelations are approximately independent normal random variables with expectation $-T^{-1}$ and variance $T^{-1}$

Mathematical digression (I) Complex numbers

Some quadratic equations do not have real solutions, e.g. $x^2 + 1 = 0$
Still it is possible (and sensible) to define solutions to such equations
The definition in common notation is
$$i = \sqrt{-1}$$
where $i$ is the number which, when squared, equals $-1$
The number $i$ is called imaginary (i.e. not real)

Mathematical digression (I) Complex numbers

Other imaginary numbers follow from this definition, e.g.
$$\sqrt{-16} = \sqrt{16}\,\sqrt{-1} = 4i$$
$$\sqrt{-5} = \sqrt{5}\,\sqrt{-1} = \sqrt{5}\,i$$
Further, it is possible to define numbers that contain both a real part and an imaginary part, e.g. $5 - 8i$ or $a + bi$
Such numbers are called complex and the set of complex numbers is denoted as $\mathbb{C}$
The pair $a + bi$ and $a - bi$ is called conjugate complex

Mathematical digression (I) Complex numbers

[Figure: Geometric interpretation of $a + bi$ in the complex plane, with real part $a$ on the real axis, imaginary part $b$ on the imaginary axis, absolute value $r$ and angle $\theta$]

Mathematical digression (I) Complex numbers

Polar coordinates and Cartesian coordinates
$$z = a + bi = r \cdot (\cos\theta + i \sin\theta) = r e^{i\theta}$$
$$a = r \cos\theta, \qquad b = r \sin\theta$$
$$r = \sqrt{a^2 + b^2}, \qquad \theta = \arctan\left(\frac{b}{a}\right)$$

Mathematical digression (I) Complex numbers

Calculation rules:
Addition
$$(a + bi) + (c + di) = (a + c) + (b + d)i$$
Multiplication (Cartesian coordinates)
$$(a + bi) \cdot (c + di) = (ac - bd) + (ad + bc)i$$
Multiplication (polar coordinates)
$$r_1 e^{i\theta_1} \cdot r_2 e^{i\theta_2} = r_1 r_2 e^{i(\theta_1 + \theta_2)}$$

Mathematical digression (I) Complex numbers

[Figure: Addition in the complex plane. The points $a + bi$ and $c + di$ add up to $(a + c) + (b + d)i$]

Mathematical digression (I) Complex numbers

[Figure: Multiplication in the complex plane. Factors with absolute values $r_1$, $r_2$ and angles $\theta_1$, $\theta_2$; the product has $r = r_1 \cdot r_2$ and $\theta = \theta_1 + \theta_2$]

Mathematical digression (I) Complex numbers

The quadratic equation
$$x^2 + px + q = 0$$
has the solutions
$$x = -\frac{p}{2} \pm \sqrt{\frac{p^2}{4} - q}$$
If $\frac{p^2}{4} - q < 0$ the solutions are complex (and conjugate)

Mathematical digression (I) Complex numbers

Example: The solutions of
$$x^2 - 2x + 5 = 0$$
are
$$x = -\frac{(-2)}{2} + \sqrt{\frac{(-2)^2}{4} - 5} = 1 + 2i$$
and
$$x = -\frac{(-2)}{2} - \sqrt{\frac{(-2)^2}{4} - 5} = 1 - 2i$$
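The rules of this digression can be tried out with Python's standard `cmath` module. The sketch below is only an illustration (the helper `solve_quadratic` is a made-up name); it solves the example equation with the formula $x = -p/2 \pm \sqrt{p^2/4 - q}$, converts a root to polar coordinates, and checks that absolute values multiply under multiplication.

```python
import cmath

def solve_quadratic(p, q):
    """Both solutions of x^2 + p*x + q = 0 via x = -p/2 ± sqrt(p^2/4 - q)."""
    root = cmath.sqrt(p * p / 4 - q)   # complex square root, works for negative arguments
    return (-p / 2 + root, -p / 2 - root)

x1, x2 = solve_quadratic(-2, 5)
print(x1, x2)                          # a conjugate pair

r1, theta1 = cmath.polar(x1)           # polar coordinates (r, theta) of the first root
r2, theta2 = cmath.polar(x2)
print(r1, theta1)
print(cmath.rect(r1, theta1))          # back to Cartesian coordinates

# Multiplication in polar form: absolute values multiply, angles add
print(abs(x1 * x2), r1 * r2, theta1 + theta2)
```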

Mathematical digression (II) Linear difference equations

First order difference equation with initial value $x_0$:
$$x_t = c + \phi_1 x_{t-1}$$
p-th order difference equation with initial value $x_0$:
$$x_t = c + \phi_1 x_{t-1} + \ldots + \phi_p x_{t-p}$$
A sequence $(x_t)_{t = 0, 1, \ldots}$ that satisfies the difference equation is called a solution of the difference equation
Examples (diffequation.R)

Mathematical digression (II) Linear difference equations

We only consider the homogeneous case, i.e. $c = 0$
The general solution of the first-order difference equation
$$x_t = \phi_1 x_{t-1}$$
is
$$x_t = A \cdot \phi_1^t$$
with arbitrary constant $A$, since $x_t = A\phi_1^t = \phi_1 A \phi_1^{t-1} = \phi_1 x_{t-1}$
The constant is determined by the initial condition, $A = x_0$
The sequence $x_t = A\phi_1^t$ is convergent if and only if $|\phi_1| < 1$

Mathematical digression (II) Linear difference equations

Solution of the p-th order difference equation
$$x_t = \phi_1 x_{t-1} + \ldots + \phi_p x_{t-p}$$
Let $x_t = A z^{-t}$, then
$$A z^{-t} = \phi_1 A z^{-(t-1)} + \ldots + \phi_p A z^{-(t-p)}$$
$$z^{-t} = \phi_1 z^{-(t-1)} + \ldots + \phi_p z^{-(t-p)}$$
and thus, after multiplying by $z^t$,
$$1 - \phi_1 z - \ldots - \phi_p z^p = 0$$
Characteristic polynomial, characteristic equation

Mathematical digression (II) Linear difference equations

There are p (possibly complex, possibly nondistinct) solutions of the characteristic equation
Denote the solutions (called roots) by $z_1, \ldots, z_p$
If all roots are real and distinct, then
$$x_t = A_1 z_1^{-t} + \ldots + A_p z_p^{-t}$$
is a solution of the homogeneous difference equation
If there are complex roots the solution is oscillating
The constants $A_1, \ldots, A_p$ can be determined with p initial conditions $(x_0, x_{-1}, \ldots, x_{-(p-1)})$

Mathematical digression (II) Linear difference equations

Stability condition: The linear difference equation
$$x_t = \phi_1 x_{t-1} + \ldots + \phi_p x_{t-p}$$
is stable (i.e. convergent) if and only if all roots of the characteristic polynomial
$$1 - \phi_1 z - \ldots - \phi_p z^p = 0$$
are outside the unit circle, i.e. $|z_i| > 1$ for all $i = 1, \ldots, p$
In R, the stability condition can be checked easily using the commands polyroot (base package) or ArmaRoots (fArma package)
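For a second-order equation the stability condition can be checked by hand, because the characteristic equation is quadratic. The Python sketch below is a hedged illustration for $p = 2$ only (it is not the R commands named above, and the function names are made up). It computes the two roots of $1 - \phi_1 z - \phi_2 z^2 = 0$, tests whether both lie outside the unit circle, and iterates the difference equation for comparison.

```python
import cmath

def char_roots_order2(phi1, phi2):
    """Roots of 1 - phi1*z - phi2*z^2 = 0 (assumes phi2 != 0)."""
    disc = cmath.sqrt(phi1 * phi1 + 4 * phi2)
    return ((-phi1 + disc) / (2 * phi2), (-phi1 - disc) / (2 * phi2))

def is_stable_order2(phi1, phi2):
    """True if both characteristic roots lie outside the unit circle."""
    return all(abs(z) > 1 for z in char_roots_order2(phi1, phi2))

def iterate(phi1, phi2, x0, x1, n):
    """Iterate x_t = phi1*x_{t-1} + phi2*x_{t-2} for n further steps."""
    xs = [x0, x1]
    for _ in range(n):
        xs.append(phi1 * xs[-1] + phi2 * xs[-2])
    return xs

for phi1, phi2 in [(0.5, 0.3), (0.5, 0.6), (1.0, -0.5)]:
    roots = char_roots_order2(phi1, phi2)
    print(phi1, phi2, [abs(z) for z in roots], is_stable_order2(phi1, phi2),
          iterate(phi1, phi2, 1.0, 1.0, 50)[-1])
```

The third parameter pair has complex roots, so its solution oscillates while it dies out.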

ARMA models Definition

Definition: ARMA process
Let $(\varepsilon_t)_{t \in T}$ be a white noise process; the stochastic process
$$X_t = \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \ldots + \theta_q \varepsilon_{t-q}$$
with $\phi_p, \theta_q \ne 0$ is called ARMA(p, q) process
AutoRegressive Moving Average process
ARMA processes are important since every stationary process can be approximated by an ARMA process

ARMA models Lag operator and lag polynomial

The lag operator is a convenient notational tool
The lag operator $L$ shifts the time index of a stochastic process
$$L (X_t)_{t \in T} = (X_{t-1})_{t \in T}$$
$$L X_t = X_{t-1}$$
Rules
$$L^2 X_t = L(L X_t) = X_{t-2}$$
$$L^n X_t = X_{t-n}$$
$$L^{-1} X_t = X_{t+1}$$
$$L^0 X_t = X_t$$

ARMA models Lag operator and lag polynomial

Lag polynomial
$$A(L) = a_0 + a_1 L + a_2 L^2 + \ldots + a_p L^p$$
Example: Let $A(L) = 1 - 0.5L$ and $B(L) = 1 + 4L^2$, then
$$C(L) = A(L)B(L) = (1 - 0.5L)(1 + 4L^2) = 1 - 0.5L + 4L^2 - 2L^3$$
Lag polynomials can be treated in the same way as ordinary polynomials
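Because lag polynomials behave like ordinary polynomials, their product is the convolution of the coefficient sequences. The short Python sketch below (the function name is made up for illustration) reproduces the example with $A(L) = 1 - 0.5L$ and $B(L) = 1 + 4L^2$.

```python
def multiply_lag_polynomials(a, b):
    """Coefficients of A(L)*B(L); a[i] and b[i] are the coefficients of L^i."""
    c = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj   # L^i * L^j = L^(i+j)
    return c

A = [1, -0.5]    # 1 - 0.5 L
B = [1, 0, 4]    # 1 + 4 L^2
print(multiply_lag_polynomials(A, B))
```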

ARMA models Lag operator and lag polynomial

Define the lag polynomials
$$\Phi(L) = 1 - \phi_1 L - \ldots - \phi_p L^p$$
$$\Theta(L) = 1 + \theta_1 L + \ldots + \theta_q L^q$$
The ARMA(p, q) process can be written compactly as
$$\Phi(L) X_t = \Theta(L) \varepsilon_t$$
Important special cases
  MA(q) process: $X_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \ldots + \theta_q \varepsilon_{t-q}$
  AR(1) process: $X_t = \phi_1 X_{t-1} + \varepsilon_t$
  AR(p) process: $X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + \varepsilon_t$

ARMA models MA(q) process

The MA(q) process is
$$X_t = \Theta(L) \varepsilon_t$$
$$X_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \ldots + \theta_q \varepsilon_{t-q}$$
with $\varepsilon_t \sim NID(0, \sigma_\varepsilon^2)$
Expectation function
$$E(X_t) = E(\varepsilon_t + \theta_1 \varepsilon_{t-1} + \ldots + \theta_q \varepsilon_{t-q}) = E(\varepsilon_t) + \theta_1 E(\varepsilon_{t-1}) + \ldots + \theta_q E(\varepsilon_{t-q}) = 0$$

ARMA models MA(q) process

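Autocovariance function
$$\begin{aligned}
\gamma(s, t) &= E\left[(\varepsilon_s + \theta_1 \varepsilon_{s-1} + \ldots + \theta_q \varepsilon_{s-q})(\varepsilon_t + \theta_1 \varepsilon_{t-1} + \ldots + \theta_q \varepsilon_{t-q})\right] \\
&= E\big[\varepsilon_s \varepsilon_t + \theta_1 \varepsilon_s \varepsilon_{t-1} + \theta_2 \varepsilon_s \varepsilon_{t-2} + \ldots + \theta_q \varepsilon_s \varepsilon_{t-q} \\
&\qquad + \theta_1 \varepsilon_{s-1} \varepsilon_t + \theta_1^2 \varepsilon_{s-1} \varepsilon_{t-1} + \theta_1 \theta_2 \varepsilon_{s-1} \varepsilon_{t-2} + \ldots + \theta_1 \theta_q \varepsilon_{s-1} \varepsilon_{t-q} \\
&\qquad + \ldots \\
&\qquad + \theta_q \varepsilon_{s-q} \varepsilon_t + \theta_1 \theta_q \varepsilon_{s-q} \varepsilon_{t-1} + \theta_2 \theta_q \varepsilon_{s-q} \varepsilon_{t-2} + \ldots + \theta_q^2 \varepsilon_{s-q} \varepsilon_{t-q}\big]
\end{aligned}$$
The expectations of the cross products are
$$E(\varepsilon_s \varepsilon_t) = \begin{cases} 0 & \text{for } s \ne t \\ \sigma_\varepsilon^2 & \text{for } s = t \end{cases}$$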

ARMA models MA(q) process

Define $\theta_0 = 1$, then
$$\gamma(t, t) = \sigma_\varepsilon^2 \sum_{i=0}^{q} \theta_i^2$$
$$\gamma(t-1, t) = \sigma_\varepsilon^2 \sum_{i=0}^{q-1} \theta_i \theta_{i+1}$$
$$\gamma(t-2, t) = \sigma_\varepsilon^2 \sum_{i=0}^{q-2} \theta_i \theta_{i+2}$$
$$\gamma(t-q, t) = \sigma_\varepsilon^2 \theta_0 \theta_q = \sigma_\varepsilon^2 \theta_q$$
$$\gamma(s, t) = 0 \quad \text{for } s < t - q$$
Hence, MA(q) processes are always stationary
Simulation of MA(q) processes (maqsim.R)
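The autocovariances of an MA(q) process follow the single pattern $\gamma(h) = \sigma_\varepsilon^2 \sum_{i=0}^{q-h} \theta_i \theta_{i+h}$ with $\theta_0 = 1$, which is easy to code. The Python sketch below is an illustration with a made-up function name; it is not the content of maqsim.R.

```python
def ma_autocovariance(theta, sigma2, h):
    """gamma(h) of X_t = eps_t + theta[0]*eps_{t-1} + ... + theta[q-1]*eps_{t-q}."""
    coeffs = [1.0] + list(theta)   # prepend theta_0 = 1
    q = len(theta)
    h = abs(h)                     # gamma(h) = gamma(-h)
    if h > q:
        return 0.0                 # no common innovations beyond lag q
    return sigma2 * sum(coeffs[i] * coeffs[i + h] for i in range(q - h + 1))

# MA(1) with theta_1 = 0.5 and unit innovation variance
print([ma_autocovariance([0.5], 1.0, h) for h in range(4)])
```

None of the values depends on $t$, in line with the statement that MA(q) processes are always stationary.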

ARMA models AR(1) process

The AR(1) process is
$$\Phi(L) X_t = \varepsilon_t$$
$$(1 - \phi_1 L) X_t = \varepsilon_t$$
$$X_t = \phi_1 X_{t-1} + \varepsilon_t$$
with $\varepsilon_t \sim NID(0, \sigma_\varepsilon^2)$
Expectation and variance function [6]
Stability condition: AR(1) processes are stable if $|\phi_1| < 1$

ARMA models AR(1) process

Stationarity: Stable AR(1) processes are weakly stationary if [7]
$$E(X_0) = 0, \qquad Var(X_0) = \frac{\sigma_\varepsilon^2}{1 - \phi_1^2}$$
Nonstationary stable processes converge towards stationarity [8]
It is common parlance to call stable processes stationary
Covariance function of stationary AR(1) process [9]
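The statements about the AR(1) process can be made concrete with a short derivation. The sketch below assumes that $X_0$ is independent of the innovations $\varepsilon_1, \varepsilon_2, \ldots$; it is a reconstruction for illustration and not the derivation used in class.

```latex
X_t = \phi_1^t X_0 + \sum_{i=0}^{t-1} \phi_1^i \varepsilon_{t-i}

E(X_t) = \phi_1^t E(X_0), \qquad
Var(X_t) = \phi_1^{2t}\, Var(X_0) + \sigma_\varepsilon^2\, \frac{1 - \phi_1^{2t}}{1 - \phi_1^2}
```

With $E(X_0) = 0$ and $Var(X_0) = \sigma_\varepsilon^2 / (1 - \phi_1^2)$ both moments are constant in $t$. For other starting values and $|\phi_1| < 1$ they converge to $0$ and $\sigma_\varepsilon^2 / (1 - \phi_1^2)$ as $t \to \infty$.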

ARMA models AR(p) process

The AR(p) process is
$$\Phi(L) X_t = \varepsilon_t$$
$$X_t = \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + \varepsilon_t$$
with $\varepsilon_t \sim NID(0, \sigma_\varepsilon^2)$
Assumption: $\varepsilon_t$ is independent from $X_{t-1}, X_{t-2}, \ldots$ (innovations)
Expectation function [10]
The covariance function is complicated (ar2autocov.R)

ARMA models AR(p) process

AR(p) processes are stable if all roots of the characteristic equation
$$\Phi(z) = 0$$
are larger than 1 in absolute value, $|z_i| > 1$ for $i = 1, \ldots, p$
An AR(p) process is weakly stationary if the joint distribution of the p initial values $(X_0, X_{-1}, \ldots, X_{-(p-1)})$ is "appropriate"
Stable AR(p) processes converge towards stationarity; they are often called stationary
Simulation of AR(p) processes (arpsim.R)

ARMA models Invertibility

AR and MA processes can be inverted (into each other)
Example: Consider the stable AR(1) process with $|\phi_1| < 1$
$$\begin{aligned}
X_t &= \phi_1 X_{t-1} + \varepsilon_t \\
&= \phi_1 (\phi_1 X_{t-2} + \varepsilon_{t-1}) + \varepsilon_t \\
&= \phi_1^2 X_{t-2} + \phi_1 \varepsilon_{t-1} + \varepsilon_t \\
&\;\;\vdots \\
&= \phi_1^n X_{t-n} + \phi_1^{n-1} \varepsilon_{t-(n-1)} + \ldots + \phi_1^2 \varepsilon_{t-2} + \phi_1 \varepsilon_{t-1} + \varepsilon_t
\end{aligned}$$

ARMA models Invertibility

Since $|\phi_1| < 1$
$$X_t = \sum_{i=0}^{\infty} \phi_1^i \varepsilon_{t-i} = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \ldots$$
with $\theta_i = \phi_1^i$
A stable AR(1) process can be written as an MA(∞) process (the same is true for stable AR(p) processes)

ARMA models Invertibility

Using lag polynomials this can be written as
$$(1 - \phi_1 L) X_t = \varepsilon_t$$
$$X_t = (1 - \phi_1 L)^{-1} \varepsilon_t$$
$$X_t = \sum_{i=0}^{\infty} (\phi_1 L)^i \varepsilon_t$$
General compact and elegant notation
$$\Phi(L) X_t = \varepsilon_t$$
$$X_t = (\Phi(L))^{-1} \varepsilon_t = \Theta(L) \varepsilon_t$$

ARMA models Invertibility

MA(q) can be written as AR(∞) if all roots of $\Theta(z) = 0$ are larger than 1 in absolute value (invertibility condition)
Example: MA(1) with $|\theta_1| < 1$; from
$$X_t = \varepsilon_t + \theta_1 \varepsilon_{t-1}$$
$$\theta_1 X_{t-1} = \theta_1 \varepsilon_{t-1} + \theta_1^2 \varepsilon_{t-2}$$
we find
$$X_t = \theta_1 X_{t-1} + \varepsilon_t - \theta_1^2 \varepsilon_{t-2}$$
Repeated substitution of the $\varepsilon_{t-i}$ terms yields
$$X_t = \sum_{i=1}^{\infty} \phi_i X_{t-i} + \varepsilon_t \qquad \text{with } \phi_i = (-1)^{i+1} \theta_1^i$$

ARMA models Invertibility

Summary
ARMA(p, q) processes are stable if all roots of $\Phi(z) = 0$ are larger than 1 in absolute value
ARMA(p, q) processes are invertible if all roots of $\Theta(z) = 0$ are larger than 1 in absolute value

ARMA models Invertibility

Sometimes (e.g. for proofs), it is useful to write an ARMA(p, q) process either as AR(∞) or as MA(∞)
ARMA(p, q) can be written as AR(∞) or MA(∞)
$$\Phi(L) X_t = \Theta(L) \varepsilon_t$$
$$X_t = (\Phi(L))^{-1} \Theta(L) \varepsilon_t$$
$$(\Theta(L))^{-1} \Phi(L) X_t = \varepsilon_t$$
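The MA(∞) coefficients of $(\Phi(L))^{-1} \Theta(L)$ can be computed by matching coefficients in $\Phi(L) \Psi(L) = \Theta(L)$, which gives the recursion $\psi_j = \theta_j + \sum_{i=1}^{\min(j,p)} \phi_i \psi_{j-i}$ with $\psi_0 = 1$ and $\theta_j = 0$ for $j > q$. The Python sketch below implements this recursion; the symbol $\psi_j$ and the function name are assumptions made here for illustration.

```python
def ma_infinity_weights(phi, theta, n):
    """First n coefficients psi_0, ..., psi_{n-1} of Phi(L)^{-1} Theta(L).

    phi = [phi_1, ..., phi_p], theta = [theta_1, ..., theta_q].
    """
    psi = []
    for j in range(n):
        if j == 0:
            value = 1.0                  # theta_0 = 1
        elif j <= len(theta):
            value = theta[j - 1]         # theta_j
        else:
            value = 0.0                  # theta_j = 0 for j > q
        for i in range(1, min(j, len(phi)) + 1):
            value += phi[i - 1] * psi[j - i]
        psi.append(value)
    return psi

print(ma_infinity_weights([0.5], [], 5))      # AR(1): powers of phi_1
print(ma_infinity_weights([0.5], [0.4], 5))   # ARMA(1, 1)
```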

ARMA models Deterministic components

Until now we only considered processes with zero expectation
Many processes have both a zero-expectation stochastic component $(Y_t)$ and a non-zero deterministic component $(D_t)$
Examples:
  linear trend $D_t = a + bt$
  exponential trend $D_t = ab^t$
  seasonal patterns
Let $(X_t)_{t \in \mathbb{Z}}$ be a stochastic process with deterministic component $D_t$ and define
$$Y_t = X_t - D_t$$

ARMA models Deterministic components

Then $E(Y_t) = 0$ and
$$\begin{aligned}
Cov(Y_t, Y_s) &= E\left[(Y_t - E(Y_t))(Y_s - E(Y_s))\right] \\
&= E\left[(X_t - D_t - E(X_t - D_t))(X_s - D_s - E(X_s - D_s))\right] \\
&= E\left[(X_t - E(X_t))(X_s - E(X_s))\right] \\
&= Cov(X_t, X_s)
\end{aligned}$$
The covariance function does not depend on the deterministic component
To derive the covariance function of a stochastic process, simply drop the deterministic component

ARMA models Deterministic components

Special case: $D_t = \mu_t = \mu$
ARMA(p, q) process with constant (non-zero) expectation
$$X_t - \mu = \phi_1 (X_{t-1} - \mu) + \ldots + \phi_p (X_{t-p} - \mu) + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \ldots + \theta_q \varepsilon_{t-q}$$
The process can also be written as
$$X_t = c + \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \ldots + \theta_q \varepsilon_{t-q}$$
where $c = \mu (1 - \phi_1 - \ldots - \phi_p)$

ARMA models Deterministic components

Wold's representation theorem: Every stationary stochastic process $(X_t)_{t \in T}$ can be represented as
$$X_t = \sum_{h=0}^{\infty} \psi_h \varepsilon_{t-h} + D_t$$
with $\psi_0 = 1$, $\sum_{h=0}^{\infty} \psi_h^2 < \infty$ and $\varepsilon_t$ white noise with variance $\sigma^2 > 0$
Stationary stochastic processes can be written as a sum of a deterministic process and an MA(∞) process
Often, low order ARMA(p, q) processes can approximate MA(∞) processes well

ARMA models Linear processes and filter

Definition: Linear process
Let $(\varepsilon_t)_{t \in \mathbb{Z}}$ be a white noise process; a stochastic process $(X_t)_{t \in \mathbb{Z}}$ is called linear if it can be written as
$$X_t = \sum_{h=-\infty}^{\infty} \psi_h \varepsilon_{t-h} = \Psi(L) \varepsilon_t$$
where the coefficients are absolutely summable, i.e. $\sum_{h=-\infty}^{\infty} |\psi_h| < \infty$.
The lag polynomial $\Psi(L)$ is called (linear) filter

ARMA models Linear processes and filter

Some special filters
Change from previous period (difference filter)
$$\Psi(L) = 1 - L$$
Change from last year (for quarterly or monthly data)
$$\Psi(L) = 1 - L^4, \qquad \Psi(L) = 1 - L^{12}$$
Elimination of seasonal influences (quarterly data)
$$\Psi(L) = (1 + L + L^2 + L^3)/4$$
$$\Psi(L) = 0.125L^2 + 0.25L + 0.25 + 0.25L^{-1} + 0.125L^{-2}$$
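Applying a filter $\Psi(L) = \sum_k w_k L^k$ means forming $\sum_k w_k x_{t-k}$ at every time point where all required observations exist. The Python sketch below is an illustration with made-up names and data; it applies the difference filter and the centered seasonal filter for quarterly data to short series.

```python
def apply_filter(x, weights):
    """Apply sum_k w_k L^k to x; weights maps lag k (negative k = lead) to w_k.

    Only time points where all required observations exist are returned.
    """
    max_lag = max(max(weights), 0)
    max_lead = max(-min(weights), 0)
    return [sum(w * x[t - k] for k, w in weights.items())
            for t in range(max_lag, len(x) - max_lead)]

difference = {0: 1.0, 1: -1.0}                                    # 1 - L
centered_ma = {2: 0.125, 1: 0.25, 0: 0.25, -1: 0.25, -2: 0.125}   # centered quarterly filter

print(apply_filter([1.0, 3.0, 6.0, 10.0], difference))

seasonal = [1.0, 2.0, 3.0, 4.0] * 3   # made-up series with a pure quarterly pattern
print(apply_filter(seasonal, centered_ma))
```

On the purely seasonal series the centered filter returns a constant, which is the sense in which it eliminates seasonal influences.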

ARMA models Linear processes and filter

Hodrick-Prescott filter (important tool in empirical macroeconomics)
Decompose a time series $(X_t)$ into a long-term growth component $(G_t)$ and a short-term cyclical component $(C_t)$
$$X_t = G_t + C_t$$
Trade-off between goodness-of-fit and smoothness of $G_t$
Minimize the criterion function
$$\sum_{t=1}^{T} (X_t - G_t)^2 + \lambda \sum_{t=2}^{T-1} \left[(G_{t+1} - G_t) - (G_t - G_{t-1})\right]^2$$
with respect to $G_t$ for given smoothness parameter $\lambda$

ARMA models Linear processes and filter

The FOCs of the minimization problem are
$$\begin{pmatrix} G_1 \\ \vdots \\ G_T \end{pmatrix} = A \begin{pmatrix} X_1 \\ \vdots \\ X_T \end{pmatrix}$$
where $A = (I + \lambda K'K)^{-1}$ with
$$K = \begin{pmatrix}
1 & -2 & 1 & 0 & 0 & \ldots & 0 & 0 & 0 \\
0 & 1 & -2 & 1 & 0 & \ldots & 0 & 0 & 0 \\
0 & 0 & 1 & -2 & 1 & \ldots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & 0 & \ldots & 1 & -2 & 1
\end{pmatrix}$$
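The first order conditions amount to the linear system $(I + \lambda K'K) G = X$. The Python sketch below solves it with plain Gaussian elimination so that it needs no external libraries; it is an illustration under that assumption, with a made-up function name, and is not the R implementation used in the course.

```python
def hp_filter(x, lam):
    """Growth component G solving (I + lam * K'K) G = x."""
    T = len(x)
    # Build M = I + lam * K'K; row r of K holds (1, -2, 1) in columns r, r+1, r+2
    M = [[1.0 if i == j else 0.0 for j in range(T)] for i in range(T)]
    stencil = (1.0, -2.0, 1.0)
    for r in range(T - 2):
        for a in range(3):
            for b in range(3):
                M[r + a][r + b] += lam * stencil[a] * stencil[b]
    # Solve M G = x by Gaussian elimination with partial pivoting
    A = [row[:] + [float(v)] for row, v in zip(M, x)]
    for col in range(T):
        pivot = max(range(col, T), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, T):
            factor = A[r][col] / A[col][col]
            for c in range(col, T + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution
    G = [0.0] * T
    for i in range(T - 1, -1, -1):
        s = A[i][T] - sum(A[i][j] * G[j] for j in range(i + 1, T))
        G[i] = s / A[i][i]
    return G

x = [1.0, 2.5, 2.0, 4.0, 3.5, 5.0, 6.5, 6.0]   # made-up data
growth = hp_filter(x, 10.0)
cycle = [xi - gi for xi, gi in zip(x, growth)]
print(growth)
print(cycle)
```

A series that is exactly linear has zero second differences, so the sketch returns it unchanged as its own growth component.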

ARMA models Linear processes and filter

The HP filter is a linear filter
Typical values for smoothing parameter λ
  λ = 10       annual data
  λ = 1600     quarterly data
  λ = 14400    monthly data
Implementation in R (code by Olaf Posch)
Empirical examples (hpfilter.R)

Estimation of ARMA models The estimation problem

Problem: The parameters $\phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q, \sigma_\varepsilon^2$ of an ARMA(p, q) process are usually unknown
They have to be estimated from an observed time series $X_1, \ldots, X_T$
Standard estimation methods:
  Least squares (OLS)
  Maximum likelihood (ML)
Assumption: the lag orders p and q are known

Estimation of ARMA models Least squares estimation of AR(p) models

The AR(p) model with non-zero constant expectation
$$X_t = c + \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + \varepsilon_t$$
can be written in matrix notation
$$\begin{pmatrix} X_{p+1} \\ X_{p+2} \\ \vdots \\ X_T \end{pmatrix}
= \begin{pmatrix}
1 & X_p & X_{p-1} & \ldots & X_1 \\
1 & X_{p+1} & X_p & \ldots & X_2 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & X_{T-1} & X_{T-2} & \ldots & X_{T-p}
\end{pmatrix}
\begin{pmatrix} c \\ \phi_1 \\ \vdots \\ \phi_p \end{pmatrix}
+ \begin{pmatrix} \varepsilon_{p+1} \\ \varepsilon_{p+2} \\ \vdots \\ \varepsilon_T \end{pmatrix}$$
Compact notation: $y = X\beta + u$

Estimation of ARMA models Least squares estimation of AR(p) models

The standard least squares estimator is
$$\hat\beta = (X'X)^{-1} X'y$$
The matrix of exogenous variables $X$ is stochastic → usual results for OLS regression do not hold
But: There is no contemporaneous correlation between the error term and the exogenous variables
Hence, the OLS estimators are consistent and asymptotically efficient
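For $p = 1$ the estimator $(X'X)^{-1}X'y$ reduces to the familiar simple regression formulas for a slope and an intercept. The Python sketch below is an illustration for that special case only, with made-up function names and simulation settings; it regresses $X_t$ on a constant and $X_{t-1}$.

```python
import random

def ols_ar1(x):
    """OLS estimates (c_hat, phi_hat) in x_t = c + phi * x_{t-1} + eps_t."""
    y = x[1:]      # X_2, ..., X_T (dependent variable)
    z = x[:-1]     # X_1, ..., X_{T-1} (lagged regressor)
    n = len(y)
    y_bar = sum(y) / n
    z_bar = sum(z) / n
    s_zy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    phi_hat = s_zy / s_zz
    c_hat = y_bar - phi_hat * z_bar
    return c_hat, phi_hat

def simulate_ar1(c, phi, sigma, T, rng):
    """Simulate an AR(1) path started at the stationary expectation c / (1 - phi)."""
    x = [c / (1 - phi)]
    for _ in range(T - 1):
        x.append(c + phi * x[-1] + rng.gauss(0.0, sigma))
    return x

rng = random.Random(42)   # arbitrary fixed seed
x = simulate_ar1(c=1.0, phi=0.6, sigma=1.0, T=5000, rng=rng)
print(ols_ar1(x))
```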

Estimation of ARMA models Least squares estimation of ARMA models

Solve the ARMA equation
$$X_t = c + \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \ldots + \theta_q \varepsilon_{t-q}$$
for $\varepsilon_t$,
$$\varepsilon_t = X_t - c - \phi_1 X_{t-1} - \ldots - \phi_p X_{t-p} - \theta_1 \varepsilon_{t-1} - \ldots - \theta_q \varepsilon_{t-q}$$
Define the residuals as functions of the unknown parameters
$$\hat\varepsilon_t(d, f_1, \ldots, f_p, g_1, \ldots, g_q) = X_t - d - f_1 X_{t-1} - \ldots - f_p X_{t-p} - g_1 \hat\varepsilon_{t-1} - \ldots - g_q \hat\varepsilon_{t-q}$$

Estimation of ARMA models Least squares estimation of ARMA models

Define the sum of squared residuals
$$S(d, f_1, \ldots, f_p, g_1, \ldots, g_q) = \sum_{t=1}^{T} \left(\hat\varepsilon_t(d, f_1, \ldots, f_p, g_1, \ldots, g_q)\right)^2$$
The least squares estimators are
$$(\hat c, \hat\phi_1, \ldots, \hat\phi_p, \hat\theta_1, \ldots, \hat\theta_q) = \arg\min S(d, f_1, \ldots, f_p, g_1, \ldots, g_q)$$
Since the residuals are defined recursively one needs starting values $\hat\varepsilon_0, \ldots, \hat\varepsilon_{-q+1}$ and $X_0, \ldots, X_{-p+1}$ to calculate $\hat\varepsilon_1$
Easiest way: Set all starting values to zero ("conditional estimation")

Estimation of ARMA models Least squares estimation of ARMA models

The first order conditions are a nonlinear equation system which cannot be solved easily
Minimization by standard numerical methods (implemented in all usual statistical packages)
Either solve the nonlinear first order conditions equation system or minimize S
Simple special case: ARMA(1, 1)
arma11.R
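For the ARMA(1, 1) special case the recursive residuals and the sum of squares $S$ fit in a few lines. The Python sketch below is an illustration, not the content of arma11.R: the function names are made up, all starting values are set to zero, and a crude grid search stands in for the numerical optimizers found in statistical packages.

```python
def arma11_residuals(x, d, f, g):
    """Residuals e_t = x_t - d - f*x_{t-1} - g*e_{t-1} with zero starting values."""
    residuals = []
    x_prev, e_prev = 0.0, 0.0          # conditional estimation: X_0 = 0, e_0 = 0
    for xt in x:
        e = xt - d - f * x_prev - g * e_prev
        residuals.append(e)
        x_prev, e_prev = xt, e
    return residuals

def sum_of_squares(x, d, f, g):
    """S(d, f, g): sum of squared recursive residuals."""
    return sum(e * e for e in arma11_residuals(x, d, f, g))

def grid_search(x, d_grid, f_grid, g_grid):
    """Crude minimization of S over a grid; returns (S, d, f, g) at the best grid point."""
    return min((sum_of_squares(x, d, f, g), d, f, g)
               for d in d_grid for f in f_grid for g in g_grid)

x = [0.8, 1.3, 0.9, 1.6, 1.1, 0.7, 1.4, 1.2]   # made-up data
grid = [i / 10 for i in range(-9, 10)]
print(grid_search(x, [0.0, 0.5, 1.0], grid, grid))
```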

Estimation of ARMA models Maximum likelihood estimation

Additional assumption: The innovations $\varepsilon_t$ are normally distributed
Implication: ARMA processes are Gaussian
The joint distribution of $X_1, \ldots, X_T$ is multivariate normal
$$X = \begin{pmatrix} X_1 \\ \vdots \\ X_T \end{pmatrix} \sim N(\mu, \Sigma)$$

Estimation of ARMA models Maximum likelihood estimation

Expectation vector
$$\mu = E\begin{pmatrix} X_1 \\ \vdots \\ X_T \end{pmatrix} = \begin{pmatrix} c/(1 - \phi_1 - \ldots - \phi_p) \\ \vdots \\ c/(1 - \phi_1 - \ldots - \phi_p) \end{pmatrix}$$
Covariance matrix
$$\Sigma = Cov\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_T \end{pmatrix} = \begin{pmatrix} \gamma(0) & \gamma(1) & \cdots & \gamma(T-1) \\ \gamma(1) & \gamma(0) & \cdots & \gamma(T-2) \\ \vdots & \vdots & \ddots & \vdots \\ \gamma(T-1) & \gamma(T-2) & \cdots & \gamma(0) \end{pmatrix}$$

Estimation of ARMA models Maximum likelihood estimation

The expectation vector and the covariance matrix contain all unknown parameters $\psi = (\phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q, c, \sigma_\varepsilon^2)$
The likelihood function is
$$L(\psi; X) = (2\pi)^{-T/2} (\det \Sigma)^{-1/2} \exp\left(-\frac{1}{2} (X - \mu)' \Sigma^{-1} (X - \mu)\right)$$
and the loglikelihood function is
$$\ln L(\psi; X) = -\frac{T}{2} \ln(2\pi) - \frac{1}{2} \ln(\det \Sigma) - \frac{1}{2} (X - \mu)' \Sigma^{-1} (X - \mu)$$
The ML estimators are
$$\hat\psi = \arg\max \ln L(\psi; X)$$
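As an illustration of how the loglikelihood can be evaluated, the following sketch builds $\mu$ and $\Sigma$ for a stationary AR(1) process, where $\gamma(h) = \sigma_\varepsilon^2 \phi^{|h|} / (1 - \phi^2)$, and computes $\ln L$ via a Cholesky factorization. The AR(1) choice, the function names and the hand-written Cholesky routine are assumptions made to keep the example self-contained; for general ARMA models the autocovariances $\gamma(h)$ would have to be computed differently.

```python
import math

def ar1_mean_cov(c, phi, sigma2, T):
    """Mean vector and covariance matrix of (X_1, ..., X_T) for a stationary AR(1)."""
    mu = [c / (1.0 - phi)] * T
    gamma0 = sigma2 / (1.0 - phi * phi)
    cov = [[gamma0 * phi ** abs(i - j) for j in range(T)] for i in range(T)]
    return mu, cov

def cholesky(a):
    """Lower-triangular matrix low with low * low' = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def gaussian_loglik(x, mu, cov):
    """ln L = -T/2 ln(2 pi) - 1/2 ln det(Sigma) - 1/2 (x - mu)' Sigma^{-1} (x - mu)."""
    T = len(x)
    low = cholesky(cov)
    # Forward substitution solves low * z = (x - mu); z'z is then the quadratic form.
    z = []
    for i in range(T):
        s = x[i] - mu[i] - sum(low[i][k] * z[k] for k in range(i))
        z.append(s / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(T))
    return -0.5 * T * math.log(2.0 * math.pi) - 0.5 * log_det - 0.5 * sum(v * v for v in z)
```

Maximizing `gaussian_loglik` over $(c, \phi, \sigma_\varepsilon^2)$ with a numerical optimizer would give the ML estimates for this special case.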

Estimation of ARMA models Maximum likelihood estimation

The loglikelihood function has to be maximized by numerical methods
Standard properties of ML estimators:
1. consistency
2. asymptotic efficiency
3. asymptotic joint normality
4. the covariance matrix of the estimators can be consistently estimated
Example: ML estimation of an ARMA(3, 3) model for the interest rate spread (arma33.R)

Estimation of ARMA models Hypothesis tests

Since the estimation method is maximum likelihood, the classical tests (Wald, LR, LM) are applicable
General null and alternative hypotheses
$$H_0: g(\psi) = 0 \qquad H_1: \text{not } H_0$$
where $g(\psi)$ is a function of the parameters with values in $\mathbb{R}^m$
Example: If $H_0: \phi_1 = 0$ then $m = 1$ and $g(\psi) = \phi_1$

Estimation of ARMA models Hypothesis tests

Likelihood ratio test statistic
$$LR = 2\left(\ln L(\hat\psi_{ML}) - \ln L(\hat\psi_R)\right)$$
where $\hat\psi_{ML}$ and $\hat\psi_R$ are the unrestricted and restricted estimators
Under the null hypothesis
$$LR \xrightarrow{d} U \sim \chi^2_m$$
and $H_0$ is rejected at significance level $\alpha$ if $LR > \chi^2_{m;1-\alpha}$
Disadvantage: Two models must be estimated
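A minimal numerical sketch of the LR test for a single restriction ($m = 1$). The two log-likelihood values are hypothetical numbers, not results from arma33.R; the p-value uses the fact that a $\chi^2_1$ variable is the square of a standard normal variable.

```python
import math

def lr_test_one_restriction(loglik_unrestricted, loglik_restricted):
    """LR statistic and asymptotic p-value for a single restriction (m = 1)."""
    lr = 2.0 * (loglik_unrestricted - loglik_restricted)
    # For m = 1: P(chi2_1 > lr) = P(|Z| > sqrt(lr)) = erfc(sqrt(lr / 2)).
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))
    return lr, p_value

lr, p = lr_test_one_restriction(-100.0, -102.0)   # hypothetical log-likelihood values
reject_at_5_percent = lr > 3.841                  # 0.95 quantile of chi2 with 1 df
```

For $m > 1$ the quantile of the $\chi^2_m$ distribution would be taken from a table or a statistics library.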

Estimation of ARMA models Hypothesis tests

For the Wald test we only consider $g(\psi) = \psi - \psi_0$, i.e.
$$H_0: \psi = \psi_0 \qquad H_1: \text{not } H_0$$
Test statistic
$$W = (\hat\psi - \psi_0)' \left[\widehat{Cov}(\hat\psi)\right]^{-1} (\hat\psi - \psi_0)$$
If the null hypothesis is true then $W \xrightarrow{d} U \sim \chi^2_m$
The asymptotic covariance matrix can be estimated consistently as $\widehat{Cov}(\hat\psi) = H^{-1}$ where $H$ is the Hessian matrix (of the negative loglikelihood) returned by the maximization procedure

Estimation of ARMA models Hypothesis tests

Test example 1:
$$H_0: \phi_1 = 0 \qquad H_1: \phi_1 \neq 0$$
Test example 2:
$$H_0: \psi = \psi_0 \qquad H_1: \text{not } H_0$$
Illustration (arma33.R)

Estimation of ARMA models Model selection

Usually, the lag orders $p$ and $q$ of an ARMA model are unknown
Trade-off: Goodness-of-fit against parsimony
Akaike's information criterion for the model with non-zero expectation
$$AIC = \underbrace{\ln \hat\sigma^2}_{\text{goodness-of-fit}} + \underbrace{2(p + q + 1)/T}_{\text{penalty}}$$
Choose the model with the smallest AIC

Estimation of ARMA models Model selection

Bayesian information criterion BIC (Schwarz information criterion)
$$BIC = \ln \hat\sigma^2 + (p + q + 1) \cdot \ln T / T$$
Hannan-Quinn information criterion
$$HQ = \ln \hat\sigma^2 + 2(p + q + 1) \cdot \ln(\ln T) / T$$
Both BIC and HQ are consistent while the AIC tends to overfit
Illustration (arma33.R)
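The three criteria are simple to compute once the residual variance $\hat\sigma^2$ of each candidate model is available. The sketch below uses the formulas as stated above; the residual variances in `candidates` are made-up numbers for illustration, not output of arma33.R.

```python
import math

def information_criteria(sigma2_hat, p, q, T):
    """AIC, BIC and HQ for an ARMA(p, q) model with non-zero expectation."""
    k = p + q + 1                       # number of estimated mean-equation parameters
    fit = math.log(sigma2_hat)          # goodness-of-fit term
    return {
        "AIC": fit + 2 * k / T,
        "BIC": fit + k * math.log(T) / T,
        "HQ": fit + 2 * k * math.log(math.log(T)) / T,
    }

# Hypothetical residual variances of three fitted models, T = 500 observations.
candidates = {(1, 0): 1.30, (2, 1): 1.00, (3, 3): 0.995}
best_by_bic = min(
    candidates,
    key=lambda pq: information_criteria(candidates[pq], pq[0], pq[1], 500)["BIC"],
)
```

The larger model (3, 3) fits slightly better, but the penalty term outweighs the small gain in $\ln \hat\sigma^2$.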

Estimation of ARMA models Model selection

Another illustration: The true model is ARMA(2, 1) with $X_t = 0.5 X_{t-1} + 0.3 X_{t-2} + \varepsilon_t + 0.7 \varepsilon_{t-1}$; 1000 samples of size $n = 500$ were generated; the tables show how often each pair of orders $p$ and $q$ was selected by AIC and BIC

Number of times the orders were selected by AIC
        q=0   q=1   q=2   q=3   q=4   q=5
p=0       0     0     0     0     0     0
p=1       0    18    64    23    14     6
p=2       0   171    21    16     5     7
p=3       0     7    35    58    80    45
p=4       9     2    12   139    37    44
p=5      11     6    12    56    46    56

Number of times the orders were selected by BIC
        q=0   q=1   q=2   q=3   q=4   q=5
p=0       0     0     0     0     0     0
p=1       0   310   167     4     0     0
p=2       0   503     3     1     0     0
p=3       1     0     2     1     0     0
p=4       6     1     0     0     0     0
p=5       1     0     0     0     0     0

Integrated processes Difference operator

Define the difference operator $\Delta = 1 - L$, then
$$\Delta X_t = X_t - X_{t-1}$$
Second order differences
$$\Delta^2 = \Delta(\Delta) = (1 - L)^2 = 1 - 2L + L^2$$
Higher orders $\Delta^n$ are defined in the same way; note that $\Delta^n \neq 1 - L^n$
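A small sketch showing the difference between $\Delta^2 = (1 - L)^2$ and $1 - L^2$ on a toy series. The function names are chosen for this example only.

```python
def diff(x, order=1):
    """Apply the difference operator (1 - L) to the list x `order` times."""
    for _ in range(order):
        x = [x[t] - x[t - 1] for t in range(1, len(x))]
    return x

def lag_diff(x, n):
    """Apply 1 - L^n, i.e. compute X_t - X_{t-n}."""
    return [x[t] - x[t - n] for t in range(n, len(x))]

squares = [1, 4, 9, 16, 25]
second = diff(squares, order=2)   # (1 - L)^2 X_t = X_t - 2 X_{t-1} + X_{t-2}
two_lag = lag_diff(squares, 2)    # (1 - L^2) X_t = X_t - X_{t-2}
```

Each application of $\Delta$ shortens the series by one observation, so `second` has two elements fewer than `squares`.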

Integrated processes Definition

Definition: Integrated process
A stochastic process is called integrated of order 1 if
$$\Delta X_t = \mu + \Psi(L)\varepsilon_t$$
where $\varepsilon_t$ is white noise, $\Psi(1) \neq 0$, and $\sum_{j=0}^{\infty} j|\psi_j| < \infty$
Common notation: $X_t \sim I(1)$
I(1) processes are also called difference stationary or unit root processes
Stochastic and deterministic trends
Trend stationary processes are not I(1) (since $\Psi(1) = 0$)

Integrated processes Definition

Stationary processes are sometimes called I(0)
Higher order integrations are possible, e.g. $X_t \sim I(2)$ if $\Delta^2 X_t \sim I(0)$
In general, $X_t \sim I(d)$ means that $\Delta^d X_t \sim I(0)$
Most economic time series are either I(0) or I(1)
Some economic time series may be I(2)

Integrated processes Definition

Example 1: The random walk with drift, $X_t = b + X_{t-1} + \varepsilon_t$, is I(1) because
$$\Delta X_t = X_t - X_{t-1} = b + \varepsilon_t = b + \Psi(L)\varepsilon_t$$
where $\psi_0 = 1$ and $\psi_j = 0$ for $j \neq 0$

Integrated processes Definition

Example 2: The trend stationary process, $X_t = a + bt + \varepsilon_t$, is not I(1) because
$$\Delta X_t = b + \varepsilon_t - \varepsilon_{t-1} = b + \Psi(L)\varepsilon_t$$
with $\psi_0 = 1$, $\psi_1 = -1$ and $\psi_j = 0$ for all other $j$, so that $\Psi(1) = 0$

Integrated processes Definition

Example 3: The "AR(2) process"
$$X_t = b + (1 + \phi) X_{t-1} - \phi X_{t-2} + \varepsilon_t$$
$$(1 - \phi L)(1 - L) X_t = b + \varepsilon_t$$
is I(1) if $|\phi| < 1$ because $\Delta X_t = \Psi(L)(b + \varepsilon_t)$ with
$$\Psi(L) = (1 - \phi L)^{-1} = 1 + \phi L + \phi^2 L^2 + \phi^3 L^3 + \phi^4 L^4 + \ldots$$
and thus $\Psi(1) = \sum_{i=0}^{\infty} \phi^i = \frac{1}{1 - \phi} \neq 0$
The roots of the characteristic equation are $z = 1$ and $z = 1/\phi$

Integrated processes Definition

Example 4: The process $X_t = 0.5 X_{t-1} - 0.4 X_{t-2} + \varepsilon_t$ is a stationary (stable) zero expectation AR(2) process; the process $Y_t = a + bt + X_t$ is trend stationary and I(0) since $\Delta Y_t = b + \Delta X_t$ with
$$\Delta X_t = \Psi(L)\varepsilon_t = (1 - L)\left(1 - 0.5L + 0.4L^2\right)^{-1} \varepsilon_t$$
and therefore $\Psi(1) = 0$ (i0andi1.R)

Integrated processes Definition

Definition: ARIMA process
Let $(\varepsilon_t)_{t \in T}$ be a white noise process; the stochastic process $(X_t)_{t \in Z}$ is called integrated autoregressive moving-average process of the orders $p$, $d$ and $q$, or ARIMA(p, d, q), if $\Delta^d X_t$ is an ARMA(p, q) process
$$\Phi(L)\Delta^d X_t = \Theta(L)\varepsilon_t$$
For $d > 0$ the process is nonstationary (I(d)) even if all roots of $\Phi(z) = 0$ are outside the unit circle
Simulation of an ARIMA(p, d, q) process (arimapdqsim.R)

Integrated processes Deterministic versus stochastic trends

Why is it important to distinguish deterministic and stochastic trends?
Reason 1: Long-term forecasts and forecasting errors
Deterministic trend: The forecasting error variance is bounded
Stochastic trend: The forecasting error variance is unbounded
Illustrations (i0andi1.R)
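The contrast in forecasting error variances can be made explicit for the two simplest cases from the earlier examples. The worked comparison below is a sketch that assumes known parameters $a$, $b$ and independent innovations with variance $\sigma_\varepsilon^2$.

```latex
% Deterministic trend: X_t = a + bt + \varepsilon_t
\begin{align*}
X_{t+h} - E(X_{t+h} \mid X_t, X_{t-1}, \ldots) &= \varepsilon_{t+h},
  & Var &= \sigma_\varepsilon^2 \quad \text{(bounded in } h\text{)} \\
\intertext{Stochastic trend (random walk with drift): $X_t = b + X_{t-1} + \varepsilon_t$, so $X_{t+h} = X_t + bh + \sum_{j=1}^{h} \varepsilon_{t+j}$}
X_{t+h} - E(X_{t+h} \mid X_t, X_{t-1}, \ldots) &= \sum_{j=1}^{h} \varepsilon_{t+j},
  & Var &= h\,\sigma_\varepsilon^2 \to \infty \quad (h \to \infty)
\end{align*}
```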

Integrated processes Deterministic versus stochastic trends

Why is it important to distinguish deterministic and stochastic trends?
Reason 2: Spurious regression
OLS regressions will show spurious relationships between time series with (deterministic or stochastic) trends
Detrending works if the series have deterministic trends, but it does not help if the series are integrated
Illustrations (spurious1.R)

Integrated processes Integrated processes and parameter estimation

OLS estimators (and ML estimators) are consistent and asymptotically normal for stationary processes
The asymptotic normality is lost if the processes are integrated
We only look at the very special case
$$X_t = \phi_1 X_{t-1} + \varepsilon_t$$
with $\varepsilon_t \sim NID(0, 1)$ and $X_0 = 0$
The AR(1) process is stationary if $|\phi_1| < 1$ and has a unit root if $|\phi_1| = 1$

Integrated processes Integrated processes and parameter estimation

The usual OLS estimator of $\phi_1$ is
$$\hat\phi_1 = \frac{\sum_{t=1}^{T} X_t X_{t-1}}{\sum_{t=1}^{T} X_{t-1}^2}$$
What does the distribution of $\hat\phi_1$ look like?
Influence of $\phi_1$ and $T$
Consistency? Asymptotic normality?
Illustration (phihat.R)
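The questions above can be explored with a small Monte Carlo experiment. The following sketch is an illustration in Python, not the course script phihat.R; the seed, the number of replications and the function names are arbitrary choices.

```python
import random

def phi_hat(x):
    """OLS estimator sum(X_t X_{t-1}) / sum(X_{t-1}^2); x[0] plays the role of X_0."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den

def simulate_ar1(phi, T, rng):
    """AR(1) path X_0, X_1, ..., X_T with X_0 = 0 and standard normal innovations."""
    x = [0.0]
    for _ in range(T):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x

def monte_carlo(phi, T, replications=1000, seed=42):
    """Draws from the sampling distribution of phi_hat."""
    rng = random.Random(seed)
    return [phi_hat(simulate_ar1(phi, T, rng)) for _ in range(replications)]

if __name__ == "__main__":
    for phi in (0.5, 1.0):
        draws = monte_carlo(phi, T=100)
        mean = sum(draws) / len(draws)
        below = sum(d < phi for d in draws) / len(draws)
        print(phi, round(mean, 3), below)   # mean and share of estimates below phi
```

Comparing the share of estimates below the true value for $\phi_1 = 0.5$ and $\phi_1 = 1$ gives a first impression of how asymmetric the distribution becomes in the unit root case.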

Integrated processes Integrated processes and parameter estimation

Consistency and asymptotic normality for I(0) processes ($|\phi_1| < 1$)
$$\text{plim}\, \hat\phi_1 = \phi_1$$
$$\sqrt{T}\left(\hat\phi_1 - \phi_1\right) \xrightarrow{d} Z \sim N\left(0, 1 - \phi_1^2\right)$$
Consistency and asymptotic distribution for I(1) processes ($\phi_1 = 1$)
$$\text{plim}\, \hat\phi_1 = 1$$
$$T\left(\hat\phi_1 - 1\right) \xrightarrow{d} V$$
where $V$ is a nondegenerate, nonnormal random variable
Root-T-consistency and superconsistency

Integrated processes Unit root tests

It is important to distinguish between trend stationarity and difference stationarity
Test of the hypothesis that a process has a unit root (i.e. is I(1))
Classical approaches: (Augmented) Dickey-Fuller test, Phillips-Perron test
Basic tool: Linear regression
$$X_t = \text{deterministics} + \phi X_{t-1} + \varepsilon_t$$
$$\Delta X_t = \text{deterministics} + \underbrace{(\phi - 1)}_{=:\beta} X_{t-1} + \varepsilon_t$$

Integrated processes Unit root tests

Null and alternative hypotheses
$$H_0: \phi = 1 \text{ (unit root)} \qquad H_1: |\phi| < 1 \text{ (no unit root)}$$
or, equivalently,
$$H_0: \beta = 0 \text{ (unit root)} \qquad H_1: \beta < 0 \text{ (no unit root)}$$
Unit root tests are one-sided; explosive processes are ruled out
Rejecting the null hypothesis is evidence in favour of stationarity
If the null hypothesis is not rejected, there could be a unit root

Integrated processes DF test and ADF test

Dickey-Fuller (DF) and Augmented Dickey-Fuller (ADF) tests
Possible regressions
$X_t = \phi X_{t-1} + \varepsilon_t$ or $\Delta X_t = \beta X_{t-1} + \varepsilon_t$
$X_t = a + \phi X_{t-1} + \varepsilon_t$ or $\Delta X_t = a + \beta X_{t-1} + \varepsilon_t$
$X_t = a + bt + \phi X_{t-1} + \varepsilon_t$ or $\Delta X_t = a + bt + \beta X_{t-1} + \varepsilon_t$
Assumption for Dickey-Fuller test: no autocorrelation in $\varepsilon_t$
If there is autocorrelation in $\varepsilon_t$, use the augmented DF test

Integrated processes DF test and ADF test

Dickey-Fuller regression, case 1: no constant, no trend
$$\Delta X_t = \beta X_{t-1} + \varepsilon_t$$
Null and alternative hypotheses
$$H_0: \beta = 0 \qquad H_1: \beta < 0$$
Null hypothesis: stochastic trend without drift
Alternative hypothesis: stationary process around zero

Integrated processes DF test and ADF test

Dickey-Fuller regression, case 2: constant, no trend
$$\Delta X_t = a + \beta X_{t-1} + \varepsilon_t$$
Null and alternative hypotheses
$$H_0: \beta = 0 \quad \text{or} \quad H_0: \beta = 0,\ a = 0$$
$$H_1: \beta < 0 \quad \text{or} \quad H_1: \beta < 0,\ a \neq 0$$
Null hypothesis: stochastic trend without drift
Alternative hypothesis: stationary process around a constant

Integrated processes DF test and ADF test

Dickey-Fuller regression, case 3: constant and trend
$$\Delta X_t = a + bt + \beta X_{t-1} + \varepsilon_t$$
Null and alternative hypotheses
$$H_0: \beta = 0 \quad \text{or} \quad H_0: \beta = 0,\ b = 0$$
$$H_1: \beta < 0 \quad \text{or} \quad H_1: \beta < 0,\ b \neq 0$$
Null hypothesis: stochastic trend with drift
Alternative hypothesis: trend stationary process

Integrated processes DF test and ADF test

Dickey-Fuller test statistics for single hypotheses
"ρ-test": $T \cdot \hat\beta$
"τ-test": $\hat\beta / \hat\sigma_{\hat\phi}$ (the standard errors of $\hat\beta$ and $\hat\phi$ coincide because $\beta = \phi - 1$)
The τ-test statistic is computed in the same way as the usual t-test statistic
Reject the null hypothesis if the test statistics are too small
The critical values are not the quantiles of the t-distribution
There are tables with the correct critical values (e.g. Hamilton, table B.6)
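For case 1 (no constant, no trend) the two statistics can be computed with a few lines of code. This is a sketch with invented function names, not the course script dftest.R; here the number of regression observations is used in place of $T$, and the toy data serve only to show the mechanics. The resulting τ value has to be compared with Dickey-Fuller critical values (roughly −1.95 at the 5% level in case 1), not with t-distribution quantiles.

```python
import math

def dickey_fuller_case1(x):
    """rho and tau statistics from the regression dX_t = beta * X_{t-1} + eps_t."""
    lagged = x[:-1]
    dx = [x[t] - x[t - 1] for t in range(1, len(x))]
    n = len(dx)                                   # number of regression observations
    sxx = sum(v * v for v in lagged)
    beta = sum(d * v for d, v in zip(dx, lagged)) / sxx
    rss = sum((d - beta * v) ** 2 for d, v in zip(dx, lagged))
    se = math.sqrt(rss / (n - 1) / sxx)           # usual OLS standard error, one regressor
    return n * beta, beta / se

rho, tau = dickey_fuller_case1([0.0, 1.0, 0.0, 1.0])   # toy data, for illustration only
```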

Integrated processes DF test and ADF test

The Dickey-Fuller test statistics for the joint hypotheses are computed in the same way as the usual F-test statistics
Reject the null hypothesis if the test statistic is too large
The critical values are not the quantiles of the F-distribution
There are tables with the correct critical values (e.g. Hamilton, table B.7)
Illustrations (dftest.R)

Integrated processes DF test and ADF test

If there is autocorrelation in $\varepsilon_t$ the DF test does not work (dftest.R)
Augmented Dickey-Fuller test (ADF test) regressions:
$$\Delta X_t = \gamma_1 \Delta X_{t-1} + \ldots + \gamma_p \Delta X_{t-p} + \beta X_{t-1} + \varepsilon_t$$
$$\Delta X_t = a + \gamma_1 \Delta X_{t-1} + \ldots + \gamma_p \Delta X_{t-p} + \beta X_{t-1} + \varepsilon_t$$
$$\Delta X_t = a + bt + \gamma_1 \Delta X_{t-1} + \ldots + \gamma_p \Delta X_{t-p} + \beta X_{t-1} + \varepsilon_t$$
The added lagged differences capture the autocorrelation
The number of lags $p$ must be large enough to make $\varepsilon_t$ white noise
The critical values remain the same as in the case without autocorrelation

Integrated processes DF test and ADF test

Further interesting topics (but we skip these)
Phillips-Perron test
Structural breaks and unit roots
KPSS test of stationarity
$$H_0: X_t \sim I(0) \qquad H_1: X_t \sim I(1)$$

Integrated processes Regression with integrated processes

Spurious regression: If $X_t$ and $Y_t$ are independent but both I(1) then the regression
$$Y_t = \alpha + \beta X_t + u_t$$
will result in an estimated coefficient $\hat\beta$ that is significantly different from 0 with probability 1 as $T \to \infty$
BUT: The regression $Y_t = \alpha + \beta X_t + u_t$ may be sensible even though $X_t$ and $Y_t$ are I(1)
Cointegration

Integrated processes Regression with integrated processes

Definition: Cointegration
Two stochastic processes $(X_t)_{t \in T}$ and $(Y_t)_{t \in T}$ are cointegrated if both processes are I(1) and there is a constant $\beta$ such that the process $(Y_t - \beta X_t)$ is I(0)
If $\beta$ is known, cointegration can be tested using a standard unit root test on the process $(Y_t - \beta X_t)$
If $\beta$ is unknown, it can be estimated from the linear regression
$$Y_t = \alpha + \beta X_t + u_t$$
and cointegration is tested using a modified unit root test on the residual process $(\hat u_t)_{t=1,\ldots,T}$

GARCH models Conditional expectation

Let $(X, Y)$ be a bivariate random variable with a joint density function, then
$$E(X|Y = y) = \int_{-\infty}^{\infty} x\, f_{X|Y=y}(x)\, dx$$
is the conditional expectation of $X$ given $Y = y$
$E(X|Y)$ denotes a random variable with realization $E(X|Y = y)$ if the random variable $Y$ realizes as $y$
Both $E(X|Y)$ and $E(X|Y = y)$ are called conditional expectation

GARCH models Conditional variance

Let $(X, Y)$ be a bivariate random variable with a joint density function, then
$$Var(X|Y = y) = \int_{-\infty}^{\infty} \left(x - E(X|Y = y)\right)^2 f_{X|Y=y}(x)\, dx$$
is the conditional variance of $X$ given $Y = y$
$Var(X|Y)$ denotes a random variable with realization $Var(X|Y = y)$ if the random variable $Y$ realizes as $y$
Both $Var(X|Y = y)$ and $Var(X|Y)$ are called conditional variance

GARCH models Rules for conditional expectations

1. Law of iterated expectations: $E(E(X|Y)) = E(X)$
2. If $X$ and $Y$ are independent, then $E(X|Y) = E(X)$
3. The condition can be treated like a constant, $E(XY|Y) = Y \cdot E(X|Y)$
4. The conditional expectation is a linear operator. For $a_1, \ldots, a_n \in \mathbb{R}$
$$E\left(\sum_{i=1}^{n} a_i X_i \,\Big|\, Y\right) = \sum_{i=1}^{n} a_i E(X_i|Y)$$

GARCH models Basics

Some economic time series show volatility clusters, e.g. stock returns, commodity price changes, inflation rates, ...
Simple autoregressive models cannot capture volatility clusters since their conditional variance is constant
Example: Stationary AR(1)-process, $X_t = \alpha X_{t-1} + \varepsilon_t$ with $|\alpha| < 1$; then
$$Var(X_t) = \sigma_X^2 = \frac{\sigma_\varepsilon^2}{1 - \alpha^2},$$
and the conditional variance is
$$Var(X_t|X_{t-1}) = \sigma_\varepsilon^2$$

GARCH models Basics

In the following, we will focus on stock returns
Empirical fact: squared (or absolute) returns are positively autocorrelated
Implication: Returns are not independent over time
The dependence is nonlinear
How can we model this kind of dependence?

GARCH models ARCH(1)-process

Definition: ARCH(1)-process
The stochastic process $(X_t)_{t \in Z}$ is called ARCH(1)-process if
$$E(X_t|X_{t-1}) = 0$$
$$Var(X_t|X_{t-1}) = \sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2$$
for all $t \in Z$, with $\alpha_0, \alpha_1 > 0$
Often, an additional assumption is
$$X_t | (X_{t-1} = x_{t-1}) \sim N(0, \alpha_0 + \alpha_1 x_{t-1}^2)$$

GARCH models ARCH(1)-process

The unconditional distribution of $X_t$ is a non-normal distribution
Leptokurtosis: The tails are heavier than the tails of the normal distribution
Example of an ARCH(1)-process
$$X_t = \varepsilon_t \sigma_t$$
where $(\varepsilon_t)_{t \in Z}$ is white noise with $\sigma_\varepsilon^2 = 1$ and
$$\sigma_t = \sqrt{\alpha_0 + \alpha_1 X_{t-1}^2}$$

GARCH models ARCH(1)-process

One can show that
$$E(X_t|X_{t-1}) = 0$$
$$E(X_t) = 0$$
$$Var(X_t|X_{t-1}) = \alpha_0 + \alpha_1 X_{t-1}^2$$
$$Var(X_t) = \alpha_0 / (1 - \alpha_1)$$
$$Cov(X_t, X_{t-i}) = 0 \quad \text{for } i > 0$$
Stationarity condition: $0 < \alpha_1 < 1$
The unconditional kurtosis is $3(1 - \alpha_1^2)/(1 - 3\alpha_1^2)$ if $\varepsilon_t \sim N(0, 1)$
If $\alpha_1 > \sqrt{1/3} = 0.57735$, the kurtosis does not exist

GARCH models ARCH(1)-process

Squared returns follow
$$X_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + v_t$$
with $v_t = \sigma_t^2(\varepsilon_t^2 - 1)$
Thus, squared returns of ARCH(1) are AR(1)
The process $(v_t)_{t \in Z}$ is white noise
$$E(v_t) = 0$$
$$Var(v_t) = E(v_t^2) = \text{const.}$$
$$Cov(v_t, v_{t-i}) = 0 \quad (i = 1, 2, \ldots)$$
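The AR(1) representation of the squared returns follows in a few lines from the example process $X_t = \varepsilon_t \sigma_t$. The derivation below is a sketch that assumes, in addition, that $\varepsilon_t$ is independent of the past with $E(\varepsilon_t^2) = 1$.

```latex
\begin{align*}
X_t^2 &= \sigma_t^2 \varepsilon_t^2 \\
      &= \sigma_t^2 + \sigma_t^2 (\varepsilon_t^2 - 1) \\
      &= \alpha_0 + \alpha_1 X_{t-1}^2 + v_t, \qquad v_t = \sigma_t^2 (\varepsilon_t^2 - 1) \\
E(v_t \mid X_{t-1}) &= \sigma_t^2 \left( E(\varepsilon_t^2) - 1 \right) = 0
\end{align*}
```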

GARCH models ARCH(1)-process

Simulation of an ARCH(1)-process for $t = 1, \ldots, 2500$
Parameters: $\alpha_0 = 0.05$, $\alpha_1 = 0.95$, start value $X_0 = 0$
Conditional distribution: $\varepsilon_t \sim N(0, 1)$
archsim.R
Check whether the simulated time series shows the typical stylized facts of return distributions
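A possible version of this simulation, sketched in Python rather than taken from archsim.R. The seed and the helper functions for the sample kurtosis and the lag-1 autocorrelation are choices made for this example. Note that with $\alpha_1 = 0.95$ the theoretical kurtosis does not exist, so the sample kurtosis will typically be large and unstable across seeds.

```python
import math
import random

def arch1_path(eps, alpha0, alpha1, x0=0.0):
    """Build X_t = eps_t * sigma_t with sigma_t^2 = alpha0 + alpha1 * X_{t-1}^2."""
    path, x_prev = [], x0
    for e in eps:
        sigma_t = math.sqrt(alpha0 + alpha1 * x_prev ** 2)
        x_prev = e * sigma_t
        path.append(x_prev)
    return path

def sample_kurtosis(x):
    """Fourth central moment divided by the squared second central moment."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m4 / m2 ** 2

def autocorr1(x):
    """Lag-1 sample autocorrelation."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x)
    c1 = sum((x[t] - m) * (x[t - 1] - m) for t in range(1, n))
    return c1 / c0

if __name__ == "__main__":
    rng = random.Random(2013)
    eps = [rng.gauss(0.0, 1.0) for _ in range(2500)]
    x = arch1_path(eps, alpha0=0.05, alpha1=0.95)
    print("kurtosis:", sample_kurtosis(x))
    print("acf(1) of returns:", autocorr1(x))
    print("acf(1) of squared returns:", autocorr1([v * v for v in x]))
```

The stylized facts to look for are a kurtosis well above 3, little autocorrelation in the returns and clearly positive autocorrelation in the squared returns.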

GARCH models Estimation of an ARCH(1)-process

Of course, we do not know the true values of the model parameters $\alpha_0$ and $\alpha_1$
How can we estimate the unknown parameters $\alpha_0$ and $\alpha_1$?
Observations $X_1, \ldots, X_T$
Because of
$$X_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + v_t$$
a possible estimation method is OLS

GARCH models Estimation of an ARCH(1)-process

OLS estimator of $\alpha_1$
$$\hat\alpha_1 = \frac{\sum_{t=2}^{T} \left(X_t^2 - \overline{X_t^2}\right)\left(X_{t-1}^2 - \overline{X_{t-1}^2}\right)}{\sum_{t=2}^{T} \left(X_{t-1}^2 - \overline{X_{t-1}^2}\right)^2} \approx \hat\rho(X_t^2, X_{t-1}^2)$$
Careful: These estimators are only consistent if the kurtosis exists (i.e. if $\alpha_1 < \sqrt{1/3}$)
Test of ARCH-effects
$$H_0: \alpha_1 = 0 \qquad H_1: \alpha_1 > 0$$

GARCH models Estimation of an ARCH(1)-process

For $T$ large, under $H_0$
$$\sqrt{T} \hat\alpha_1 \sim N(0, 1)$$
Reject $H_0$ if $\sqrt{T} \hat\alpha_1 > \Phi^{-1}(1 - \alpha)$, where $\alpha$ denotes the significance level
Second version of this test: Consider the $R^2$ of the regression
$$X_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + v_t,$$
then under $H_0$
$$T \hat\alpha_1^2 \approx T R^2 \overset{appr}{\sim} \chi^2_1$$
Reject $H_0$ if $T R^2 > F^{-1}_{\chi^2_1}(1 - \alpha)$
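The second version of the test needs only the regression of the squared observations on their first lag. The sketch below uses the number of regression observations $n = T - 1$ in place of $T$ and computes the p-value from the $\chi^2_1$ distribution via the complementary error function; the function name and the toy data in the test are invented for this example.

```python
import math

def arch1_lm_test(x):
    """Regress X_t^2 on a constant and X_{t-1}^2; return (alpha1_hat, n * R^2, p-value)."""
    sq = [v * v for v in x]
    y, z = sq[1:], sq[:-1]                      # dependent variable and its first lag
    n = len(y)
    y_bar, z_bar = sum(y) / n, sum(z) / n
    szz = sum((b - z_bar) ** 2 for b in z)
    szy = sum((a - y_bar) * (b - z_bar) for a, b in zip(y, z))
    syy = sum((a - y_bar) ** 2 for a in y)
    alpha1_hat = szy / szz                      # OLS slope
    r2 = szy * szy / (szz * syy)                # R^2 of a simple regression
    stat = n * r2
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi2 with 1 df
    return alpha1_hat, stat, p_value
```

For the ARCH(p) generalization the regression has $p$ lags and the statistic is compared with a $\chi^2_p$ quantile, which requires a table or a statistics library.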

GARCH models ARCH(p)-process

Definition: ARCH(p)-process
The stochastic process $(X_t)_{t \in Z}$ is called ARCH(p)-process if
$$E(X_t|X_{t-1}, \ldots, X_{t-p}) = 0$$
$$Var(X_t|X_{t-1}, \ldots, X_{t-p}) = \sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \ldots + \alpha_p X_{t-p}^2$$
for $t \in Z$, where $\alpha_i \geq 0$ for $i = 0, 1, \ldots, p - 1$ and $\alpha_p > 0$
Often, an additional assumption is that
$$X_t | (X_{t-1} = x_{t-1}, \ldots, X_{t-p} = x_{t-p}) \sim N(0, \sigma_t^2)$$

GARCH models ARCH(p)-process

Example of an ARCH(p)-process
$$X_t = \varepsilon_t \sigma_t$$
where $(\varepsilon_t)_{t \in Z}$ is white noise with $\sigma_\varepsilon^2 = 1$ and
$$\sigma_t = \sqrt{\alpha_0 + \alpha_1 X_{t-1}^2 + \ldots + \alpha_p X_{t-p}^2}$$
An ARCH(p) process is weakly stationary if all roots of $1 - \alpha_1 z - \alpha_2 z^2 - \ldots - \alpha_p z^p = 0$ are outside the unit circle
Then, for all $t \in Z$, $E(X_t) = 0$ and
$$Var(X_t) = \frac{\alpha_0}{1 - \sum_{i=1}^{p} \alpha_i}$$

GARCH models ARCH(p)-process

If $(X_t)_{t \in Z}$ is a stationary ARCH(p) process, then $(X_t^2)_{t \in Z}$ is a stationary AR(p) process
$$X_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \ldots + \alpha_p X_{t-p}^2 + v_t$$
As to the error term,
$$E(v_t) = 0$$
$$Var(v_t) = \text{const.}$$
$$Cov(v_t, v_{t-i}) = 0 \quad \text{for } i = 1, 2, \ldots$$
Simulating an ARCH(p) is easy

GARCH models Estimation of ARCH(p) models

OLS estimation of
$$X_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \ldots + \alpha_p X_{t-p}^2 + v_t$$
Test of ARCH-effects
$$H_0: \alpha_1 = \alpha_2 = \ldots = \alpha_p = 0 \quad \text{vs} \quad H_1: \text{not } H_0$$
Let $R^2$ denote the coefficient of determination of the regression
Under $H_0$, the test statistic $T R^2$ is approximately $\chi^2_p$ distributed; thus reject $H_0$ if $T R^2 > F^{-1}_{\chi^2_p}(1 - \alpha)$

GARCH models Maximum likelihood estimation

Basic idea of the maximum likelihood estimation method: Choose parameters such that the joint density of the observations
$$f_{X_1, \ldots, X_T}(x_1, \ldots, x_T)$$
is maximized
Let $X_1, \ldots, X_T$ denote a random sample from $X$
The density $f_X(x; \theta)$ depends on $R$ unknown parameters $\theta = (\theta_1, \ldots, \theta_R)$

GARCH models Maximum likelihood estimation

ML estimation of $\theta$: Maximize the (log)likelihood function
$$L(\theta) = f_{X_1, \ldots, X_T}(x_1, \ldots, x_T; \theta) = \prod_{t=1}^{T} f_X(x_t; \theta)$$
$$\ln L(\theta) = \sum_{t=1}^{T} \ln f_X(x_t; \theta)$$
ML estimate
$$\hat\theta = \arg\max \left[\ln L(\theta)\right]$$

GARCH models Maximum likelihood estimation

Since observations are independent in random samples
$$f_{X_1, \ldots, X_T}(x_1, \ldots, x_T) = \prod_{t=1}^{T} f_{X_t}(x_t)$$
or
$$\ln f_{X_1, \ldots, X_T}(x_1, \ldots, x_T) = \sum_{t=1}^{T} \ln f_{X_t}(x_t) = \sum_{t=1}^{T} \ln f_X(x_t)$$
But: ARCH-returns are not independent!

GARCH models Maximum likelihood estimation

Factorization with dependent observations
$$f_{X_1, \ldots, X_T}(x_1, \ldots, x_T) = \prod_{t=1}^{T} f_{X_t|X_{t-1}, \ldots, X_1}(x_t|x_{t-1}, \ldots, x_1)$$
or
$$\ln f_{X_1, \ldots, X_T}(x_1, \ldots, x_T) = \sum_{t=1}^{T} \ln f_{X_t|X_{t-1}, \ldots, X_1}(x_t|x_{t-1}, \ldots, x_1)$$
Hence, for an ARCH(1)-process
$$f_{X_1, \ldots, X_T}(x_1, \ldots, x_T) = f_{X_1}(x_1) \prod_{t=2}^{T} \frac{1}{\sqrt{2\pi}\sqrt{\sigma_t^2}} \exp\left(-\frac{1}{2}\left(\frac{x_t}{\sigma_t}\right)^2\right)$$

GARCH models Maximum likelihood estimation

The marginal density of $X_1$ is complicated but becomes negligible for large $T$ and, therefore, will be dropped from now on
Log-likelihood function (without initial marginal density)
$$\ln L(\alpha_0, \alpha_1|x_1, \ldots, x_T) = -\frac{T-1}{2} \ln 2\pi - \frac{1}{2}\sum_{t=2}^{T} \ln \sigma_t^2 - \frac{1}{2}\sum_{t=2}^{T} \left(\frac{x_t}{\sigma_t}\right)^2$$
where $\sigma_t^2 = \alpha_0 + \alpha_1 x_{t-1}^2$
ML-estimation of $\alpha_0$ and $\alpha_1$ by numerical maximization of $\ln L(\alpha_0, \alpha_1)$ with respect to $\alpha_0$ and $\alpha_1$
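The conditional log-likelihood is straightforward to code. The sketch below sums the terms for $t = 2, \ldots, T$ and uses a crude grid search in place of a numerical optimizer; both function names and the grid approach are assumptions made to keep the example free of external libraries.

```python
import math

def arch1_loglik(x, alpha0, alpha1):
    """Conditional Gaussian log-likelihood of an ARCH(1), terms t = 2, ..., T only."""
    ll = 0.0
    for t in range(1, len(x)):
        sigma2_t = alpha0 + alpha1 * x[t - 1] ** 2      # conditional variance
        ll += (-0.5 * math.log(2.0 * math.pi)
               - 0.5 * math.log(sigma2_t)
               - 0.5 * x[t] ** 2 / sigma2_t)
    return ll

def grid_mle(x, grid0, grid1):
    """Return (max log-likelihood, alpha0, alpha1) over all pairs on the two grids."""
    return max((arch1_loglik(x, a0, a1), a0, a1) for a0 in grid0 for a1 in grid1)
```

With real data one would replace `grid_mle` by a constrained numerical optimizer that keeps $\alpha_0 > 0$ and $\alpha_1 \geq 0$.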

GARCH models GARCH(p,q)-process

Definition: GARCH(p,q)-process
The stochastic process $(X_t)_{t \in Z}$ is called GARCH(p, q)-process if
$$E(X_t|X_{t-1}, X_{t-2}, \ldots) = 0$$
$$Var(X_t|X_{t-1}, X_{t-2}, \ldots) = \sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \ldots + \alpha_p X_{t-p}^2 + \beta_1 \sigma_{t-1}^2 + \ldots + \beta_q \sigma_{t-q}^2$$
for $t \in Z$ with $\alpha_i, \beta_i \geq 0$
Often, an additional assumption is that
$$(X_t|X_{t-1} = x_{t-1}, X_{t-2} = x_{t-2}, \ldots) \sim N(0, \sigma_t^2)$$

GARCH models GARCH(p,q)-process

Conditional variance of GARCH(1, 1)
$$Var(X_t|X_{t-1}, X_{t-2}, \ldots) = \sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \beta_1 \sigma_{t-1}^2 = \frac{\alpha_0}{1 - \beta_1} + \alpha_1 \sum_{i=1}^{\infty} \beta_1^{i-1} X_{t-i}^2$$
Unconditional variance
$$Var(X_t) = \frac{\alpha_0}{1 - \sum_{i=1}^{p} \alpha_i - \sum_{j=1}^{q} \beta_j}$$

GARCH models GARCH(p,q)-process

Necessary condition for weak stationarity
$$\sum_{i=1}^{p} \alpha_i + \sum_{j=1}^{q} \beta_j < 1$$
$(X_t)_{t \in Z}$ has no autocorrelation
GARCH-processes can be written as ARMA(max(p, q), q)-processes in the squared returns
Example: GARCH(1, 1)-process with $X_t = \varepsilon_t \sigma_t$ and
$$\sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \beta_1 \sigma_{t-1}^2$$

GARCH models Estimation of GARCH(p,q)-processes

Estimation of the ARMA(max(p, q), q)-process in the squared returns
Alternative (and better) method: Maximum likelihood
For a GARCH(1, 1)-process
$$f_{X_1, \ldots, X_T}(x_1, \ldots, x_T) = f_{X_1}(x_1) \prod_{t=2}^{T} \frac{1}{\sqrt{2\pi}\sqrt{\sigma_t^2}} \exp\left(-\frac{1}{2}\left(\frac{x_t}{\sigma_t}\right)^2\right)$$

GARCH models Estimation of GARCH(p,q)-processes

Again, the density of $X_1$ can be neglected
Log-likelihood function
$$\ln L(\alpha_0, \alpha_1, \beta_1|x_1, \ldots, x_T) = -\frac{T-1}{2} \ln 2\pi - \frac{1}{2}\sum_{t=2}^{T} \ln \sigma_t^2 - \frac{1}{2}\sum_{t=2}^{T} \left(\frac{x_t}{\sigma_t}\right)^2$$
with $\sigma_t^2 = \alpha_0 + \alpha_1 x_{t-1}^2 + \beta_1 \sigma_{t-1}^2$ and $\sigma_1^2 = 0$
ML-estimation of $\alpha_0$, $\alpha_1$ and $\beta_1$ by numerical maximization

GARCH models Estimation of GARCH(p,q)-processes

Conditional h-step forecast of the volatility $\sigma_{t+h}^2$ in a GARCH(1, 1) model
$$E\left(\sigma_{t+h}^2|X_t, X_{t-1}, \ldots\right) = (\alpha_1 + \beta_1)^h \left(\sigma_t^2 - \frac{\alpha_0}{1 - \alpha_1 - \beta_1}\right) + \frac{\alpha_0}{1 - \alpha_1 - \beta_1}$$
If the process is stationary
$$\lim_{h \to \infty} E(\sigma_{t+h}^2|X_t, X_{t-1}, \ldots) = \frac{\alpha_0}{1 - \alpha_1 - \beta_1}$$
Simulation of GARCH-processes is easy; the estimation can be computer intensive
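The forecast formula can be evaluated directly. The sketch below implements the closed form exactly as stated on the slide; the parameter values in the usage lines are hypothetical, and the function name is chosen for this example.

```python
def garch11_variance_forecast(sigma2_t, h, alpha0, alpha1, beta1):
    """h-step forecast of the conditional variance, closed form as stated above."""
    long_run = alpha0 / (1.0 - alpha1 - beta1)      # unconditional variance
    return (alpha1 + beta1) ** h * (sigma2_t - long_run) + long_run

# Hypothetical parameters with alpha1 + beta1 = 0.9 and long-run variance 1.
one_step = garch11_variance_forecast(2.0, 1, 0.1, 0.1, 0.8)
far_ahead = garch11_variance_forecast(2.0, 1000, 0.1, 0.1, 0.8)
```

The closer $\alpha_1 + \beta_1$ is to 1, the more slowly the forecast reverts to the unconditional variance.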

GARCH models Residuals of an estimated GARCH(1,1) model

Careful: Residuals are slightly different from what you know from OLS regressions
Estimates: $\hat\alpha_0, \hat\alpha_1, \hat\beta_1, \hat\mu$
From $\sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \beta_1 \sigma_{t-1}^2$ and $X_t = \mu + \sigma_t \varepsilon_t$ we calculate the standardized residuals
$$\hat\varepsilon_t = \frac{X_t - \hat\mu}{\hat\sigma_t} = \frac{X_t - \hat\mu}{\sqrt{\hat\alpha_0 + \hat\alpha_1 X_{t-1}^2 + \hat\beta_1 \hat\sigma_{t-1}^2}}$$
Histogram of the standardized residuals

GARCH models AR(p)-ARCH(q)-models

Definition: $(X_t)_{t \in Z}$ is called AR(p)-ARCH(q)-process if
$$X_t = \mu + \phi_1 X_{t-1} + \varepsilon_t$$
$$\sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2$$
where $\varepsilon_t \sim N(0, \sigma_t^2)$
mean equation / variance equation
Maximum likelihood estimation

GARCH models Extensions of the GARCH model

There are a number of possible extensions to the GARCH model:
Empirical fact: Negative shocks have a larger impact on volatility than positive shocks (leverage effect)
News impact curve
Nonnormal innovations, e.g. $\varepsilon_t \sim t_\nu$