ANALYSIS OF NETWORK TIME SERIES

Download Analysis of Network Time Series. Matt Nunes. Lancaster University with. Marina Knight and Guy Nason. Newton Institute, January 2014. Matt N...

1 downloads 710 Views 1MB Size
Analysis of Network Time Series Matt Nunes Lancaster University with Marina Knight and Guy Nason

Newton Institute, January 2014

Matt Nunes

Analysis of Network Time Series

1 / 22

Time series on networks Many multivariate time series we encounter in practice will be observed on a network / graph, e.g. transport data computer traffic social media

Matt Nunes

Analysis of Network Time Series

2 / 22

Time series on networks Many multivariate time series we encounter in practice will be observed on a network / graph, e.g. transport data computer traffic social media Data summary: time series observed on (large) network network is real, or can be constructed aim is efficient analysis e.g. forecasting the series

Matt Nunes

Analysis of Network Time Series

2 / 22

Toy example: Cases of Mumps 2005

Weekly cases of Mumps at county level for 2005 (T = 1, . . . , 52. Number of Counties: P = 47). data source: Health Protection Agency (thanks to D. Harding & D. DeAngelis). Matt Nunes

Analysis of Network Time Series

3 / 22

What is the data like?

0.2

80 20

0.0

60

Counties

0.4

40

0.6

Cases (Somerset=solid, Avon=dashed)

0.8

100

120

1.0 Wilt West West West Warw Wale Tyne Surr Suff Staf Sout Some Shro Oxfo Nott Nort Nort Nort Norf Mers Lond Linc Leic Lanc Kent Isle Humb Hert Here Hamp Grea Glou Esse East Durh Dors Devo Derb Cumb Corn Clev Ches Camb Buck Berk Bedf Avon

0

−0.2

Avon Bedf Berk Buck Camb Ches Clev Corn Cumb Derb Devo Dors Durh East Esse Glou Grea Hamp Here Hert Humb Isle Kent Lanc Leic Linc Lond Mers Norf Nort Nort Nort Nott Oxfo Shro Some Sout Staf Suff Surr Tyne Wale Warw West West West Wilt

−0.4 0

10

Counties

Matt Nunes

Analysis of Network Time Series

20

30

40

Time

4 / 22

Example: Wind speed data (t=1)

HUMBERSIDE

WINTER HILL ●







ROCHDALE EMLEY MOOR ● MANCHESTER

CROSBY ●

RINGWAY SPEKE









RHYL

VALLEY

HAWARDEN AIRP.







CRANWELL

EAST MIDS. ●





NOTTINGHAM ●

SHAWBURY



ABERDARON





WADDINGTON ● CONINGSBY

LEEK

CAPEL CURIG

DONNA NOOK SCAMPTON



WOODFORD



● ●



WAINFLEET





● ●



TRAWSGOED ●

SHOBDON AIRF. ●



ELMDON ●COVENTRY ● CHURCH LAWFORD









WATTISHAM

SENNYBRIDGE CREDENHILL ●





PEMBREY SANDS

● BRIZE NORTON ● FAIRFORD AIRF.



ANDREWSFIELD

LUTON

LITTLE RISSINGTON

MILFORD HAVEN



MILDENHALL

BEDFORD





LAKENHEATH

MONKS WOOD ●

PERSHORE

ABERPORTH

MARHAM

WITTERING

COLESHILL



WEYBOURNE



HOLBEACH

SUTTON BONINGTONCOTTESMORE

LAKE VYRNWY





STANSTED HIGH WYCOMBE ●

SOUTHEND AIRP. ● FILTON LONDON CITY ● ● BENSON NORTHOLT ● ● KEW ● MUMBLES HEAD ● SHOEBURYNESS ETON ● ● ● ● ST. ATHAN LYNEHAM HEATHROW ● GRAVESEND ● MANSTON ● AIRF. BRISTOL S. FARNBOROUGH ● KENLEY ● ● ● WISLEY LARKHILL ● EAST MALLING ● GATWICK ODIHAM ● ● ●● ● CHIVENORLISCOMBE BOSCOMBE DOWN MIDDLE WALLOP CHARLWOOD ● ● LANGDON BAY YEOVILTON ● SOTON:EASTLEIGH ● HERSTMONCEUX DUNKESWELL AEROD. ● ● SOTON:OCEANOG. ● THORNEY ISLAND ● HURN ● ● NORTH WYKE ● SHOREHAM AIRP. ● SOLENT ● EXETER AIRP. ● WIGHT:NEEDLES ● BODMIN ● WIGHT:ST. CATHERINES PT ● ISLE OF PORTLAND ●

● ●

CAMBORNE

PLYMOUTH

BERRY HEAD





CULDROSE

Hourly wind speeds collected at UK Met. Office weather stations (T = 721 records, P = 102 stations). Data source: British Atmospheric Data Centre (BADC) / Centre for Environmental Data Archival (CEDA). Matt Nunes

Analysis of Network Time Series

5 / 22

Correlation: Scampton and Rochdale

0.8 0.6 0.0

0.2

0.4

0.6 0.4 0.0

0.2

ACF

0.8

1.0

381 & 1125

1.0

381

10

15

20

25

5

15

1125 & 381

1125

20

25

20

25

0.8 0.0

0.2

0.4

0.6

0.8 0.6 0.4 0.0 −25

−20

−15

−10

−5

0

0

5

Lag

Matt Nunes

10 Lag

0.2

ACF

0

Lag

1.0

5

1.0

0

10

15 Lag

Analysis of Network Time Series

6 / 22

Where is the Network?

Mumps / wind data does not come with a “natural” network.

Matt Nunes

Analysis of Network Time Series

7 / 22

Where is the Network?

Mumps / wind data does not come with a “natural” network. Can use a tree structure to link weather stations by geography, or county borders to link data.

Matt Nunes

Analysis of Network Time Series

7 / 22

Where is the Network?

Mumps / wind data does not come with a “natural” network. Can use a tree structure to link weather stations by geography, or county borders to link data. Other possible ways to construct networks: link with transport corridors, e.g. rail connections, roads etc “friends” / followers in social media academic collaborations (common papers, number of citations)

Matt Nunes

Analysis of Network Time Series

7 / 22

County Network

● Morpeth ● Carlisle

● Newcastle ● Durham ● Middlesborough

● Lancaster ● York ● Leeds Kingston ●Upon Hull ● Manchester ● Liverpool ● Chester

● Sheffield

● Shrewsbury ● Stafford

● Rhayader

● Lincoln ● Derby ● Nottingham ● Leicester ● Norwich

● Birmingham ● Worcester ● Northampton ● Warwick ● Gloucester ● Oxford

● Bristol

● Devizes ● Taunton ● Exeter

● Winchester ● Dorchester

● Cambridge ● Bedford ● Ipswich ● Aylesbury ● Hertford ● Chelmsford ● London ● Reading

● Guildford

● Maidstone

● Lewes

● Newport ● Chichester

● Truro

Network structure using distances between county towns. Matt Nunes

Analysis of Network Time Series

8 / 22

Weather station network

HUMBERSIDE

WINTER HILL ●







ROCHDALE EMLEY MOOR ● MANCHESTER

CROSBY ●

RINGWAY

HAWARDEN AIRP.



WADDINGTON ● CONINGSBY



LEEK

CAPEL CURIG









NOTTINGHAM EAST MIDS.

SHAWBURY









ABERDARON

DONNA NOOK SCAMPTON



WOODFORD

SPEKE









RHYL

VALLEY



CRANWELL

● ●



WAINFLEET





● ●





SHOBDON AIRF. ●



ELMDON ●COVENTRY ● CHURCH LAWFORD

LAKENHEATH

MONKS WOOD ●



BEDFORD







WATTISHAM

SENNYBRIDGE CREDENHILL LUTON

LITTLE RISSINGTON







PEMBREY SANDS

● BRIZE NORTON ● FAIRFORD AIRF.



MILFORD HAVEN



MILDENHALL ●

PERSHORE



MARHAM

WITTERING

COLESHILL TRAWSGOED ABERPORTH ●

WEYBOURNE



HOLBEACH

SUTTON BONINGTONCOTTESMORE

LAKE VYRNWY

ANDREWSFIELD ●



STANSTED HIGH WYCOMBE ●

● SOUTHEND AIRP. LONDON CITY ● ● BENSON NORTHOLT ● KEW ● SHOEBURYNESS ETON ● ● ● ● HEATHROW GRAVESEND MANSTON ● AIRF. S. FARNBOROUGH ● KENLEY ● ● ● WISLEY LARKHILL ● EAST MALLING ● GATWICK ODIHAM ● ● ●● ● CHIVENORLISCOMBE BOSCOMBE DOWN ● MIDDLE WALLOP CHARLWOOD ● LANGDON BAY YEOVILTON ● SOTON:EASTLEIGH ● HERSTMONCEUX DUNKESWELL AEROD. ● ● SOTON:OCEANOG. ● THORNEY ISLAND ● HURN ● ● NORTH WYKE ● SHOREHAM AIRP. ● SOLENT ● EXETER AIRP. ● WIGHT:NEEDLES ● BODMIN ● WIGHT:ST. CATHERINES PT ● ISLE OF PORTLAND ●

MUMBLES HEAD ST. ATHAN ●

FILTON ●



LYNEHAM



BRISTOL

● ●

CAMBORNE

PLYMOUTH

BERRY HEAD





CULDROSE

Weather stations network created by minimal spanning tree. Matt Nunes

Analysis of Network Time Series

9 / 22

Analysis question Characteristics of network data: complex structure over time lots of dependence between series possible redundancy in the system

Matt Nunes

Analysis of Network Time Series

10 / 22

Analysis question Characteristics of network data: complex structure over time lots of dependence between series possible redundancy in the system Can we capture the network structure? Can we reduce the dimension of the analysis problem?

Matt Nunes

Analysis of Network Time Series

10 / 22

Analysis question Characteristics of network data: complex structure over time lots of dependence between series possible redundancy in the system Can we capture the network structure? Can we reduce the dimension of the analysis problem? Trick: transform complex data into simpler time series for data analysis (dimension reduction) with lifting schemes

Matt Nunes

Analysis of Network Time Series

10 / 22

Dimension reduction with lifting Lifting schemes (Sweldens 1995) can provide efficient representations for data arising on complex data domains e.g. graphs / networks (“wavelet transform on a network”).

Matt Nunes

Analysis of Network Time Series

11 / 22

Dimension reduction with lifting Lifting schemes (Sweldens 1995) can provide efficient representations for data arising on complex data domains e.g. graphs / networks (“wavelet transform on a network”).

Why lifting? Wavelets are well-known for their ability to decorrelate systems (for decorrelation properties of lifting for time series, see KNN2014 in preparation)

Lifting schemes can naturally account for network structure They are computationally fast and invertible

Matt Nunes

Analysis of Network Time Series

11 / 22

Network lifting algorithm: NetTree1

7 3

2

3

4

1

3 2

6

4

5 4

4

Split: Pick network node to be ‘lifted’.

1

5

Jansen, M., Nason, G.P. and Silverman, B.W. (2009) Multiscale methods for data on graphs and irregular

multidimensional situations. J. Roy. Statist. Soc. Series B, 71, 97–126. Matt Nunes

Analysis of Network Time Series

12 / 22

Network lifting algorithm: NetTree1

7

Predict: Use function values at neigbouring nodes Ji to predict ci using distance-weighted average. The lifted coefficient is the residual P di = ci − cˆi = ci − j∈Ji ai,j cj .

1

3

2

3

4

1

−5/3 2

6

4

5 4

4 5

Jansen, M., Nason, G.P. and Silverman, B.W. (2009) Multiscale methods for data on graphs and irregular

multidimensional situations. J. Roy. Statist. Soc. Series B, 71, 97–126. Matt Nunes

Analysis of Network Time Series

12 / 22

Network lifting algorithm: NetTree1

6.63 3

1.59 1

−5/3 2

Update: Remove node i and redistribute lost signal content to neighbours using the lifted coefficient di .

1

4.58 4

3.52 5

Jansen, M., Nason, G.P. and Silverman, B.W. (2009) Multiscale methods for data on graphs and irregular

multidimensional situations. J. Roy. Statist. Soc. Series B, 71, 97–126. Matt Nunes

Analysis of Network Time Series

12 / 22

Network lifting algorithm: NetTree1

6.63 3

1.59

Repeat 1–3 until no network nodes are left.

1

We get

−5/3 2

4.58 4

3.52 5

Y = WX .

1

Jansen, M., Nason, G.P. and Silverman, B.W. (2009) Multiscale methods for data on graphs and irregular

multidimensional situations. J. Roy. Statist. Soc. Series B, 71, 97–126. Matt Nunes

Analysis of Network Time Series

12 / 22

Dimension reduction for network time series Proposed dimension reduction procedure: For each timepoint t = 1, . . . , T , perform NetTree lifting algorithm

Matt Nunes

Analysis of Network Time Series

13 / 22

Dimension reduction for network time series Proposed dimension reduction procedure: For each timepoint t = 1, . . . , T , perform NetTree lifting algorithm Get new time series in the wavelet domain: Yt = W(Xt ), wavelet coefficients series: P − Nc series; scaling coefficients series: Nc

Matt Nunes

Analysis of Network Time Series

13 / 22

Dimension reduction for network time series Proposed dimension reduction procedure: For each timepoint t = 1, . . . , T , perform NetTree lifting algorithm Get new time series in the wavelet domain: Yt = W(Xt ), wavelet coefficients series: P − Nc series; scaling coefficients series: Nc the transform takes into account network structure the resulting time series object is sparse

Matt Nunes

Analysis of Network Time Series

13 / 22

Dimension reduction for network time series Proposed dimension reduction procedure: For each timepoint t = 1, . . . , T , perform NetTree lifting algorithm Get new time series in the wavelet domain: Yt = W(Xt ), wavelet coefficients series: P − Nc series; scaling coefficients series: Nc the transform takes into account network structure the resulting time series object is sparse What does this sparsity give us? Matt Nunes

Analysis of Network Time Series

13 / 22

Mumps: Correlation before lifting

0.6 0.2 −0.2

−0.2

0.2

ACF

0.6

1.0

Durham & Tyne and Wear

1.0

Durham

2

4

6

8

10

12

14

0

2

4

6

8

10

Lag

Tyne and Wear & Durham

Tyne and Wear

12

14

12

14

0.6 0.2 −0.2

−0.2

0.2

ACF

0.6

1.0

Lag

1.0

0

−14

−10 −8

−6

−4

−2

0

0

2

Lag

Matt Nunes

4

6

8

10

Lag

Analysis of Network Time Series

14 / 22

Mumps: Correlation after lifting

0.6 0.2 −0.2

−0.2

0.2

ACF

0.6

1.0

Durham & Tyne and Wear

1.0

Durham

2

4

6

8

10

12

14

0

2

4

6

8

10

Lag

Tyne and Wear & Durham

Tyne and Wear

12

14

12

14

0.6 0.2 −0.2

−0.2

0.2

ACF

0.6

1.0

Lag

1.0

0

−14

−10 −8

−6

−4

−2

0

0

2

Lag

Matt Nunes

4

6

8

10

Lag

Analysis of Network Time Series

15 / 22

Shobdon Airf. and Coleshill: ACF (before lifting)

0.8 0.6 0.0

0.2

0.4

0.6 0.4 0.0

0.2

ACF

0.8

1.0

669 & 19187

1.0

669

10

15

20

25

5

15

19187 & 669

19187

20

25

20

25

0.8 0.0

0.2

0.4

0.6

0.8 0.6 0.4 0.0 −25

−20

−15

−10

−5

0

0

5

Lag

Matt Nunes

10 Lag

0.2

ACF

0

Lag

1.0

5

1.0

0

10

15 Lag

Analysis of Network Time Series

16 / 22

Shobdon Airf. and Coleshill: ACF (after lifting)

ACF

0.0 0.2 0.4 0.6 0.8 1.0 10

15

20

25

0

5

−25

15 Lag

19187 & 669

19187

−20

−15

−10

−5

0

0

5

Lag

Matt Nunes

10

Lag

20

25

20

25

0.0 0.2 0.4 0.6 0.8 1.0

5

0.0 0.2 0.4 0.6 0.8 1.0

0

ACF

669 & 19187

0.0 0.2 0.4 0.6 0.8 1.0

669

10

15 Lag

Analysis of Network Time Series

17 / 22

A Model

Massive decorrelation across network and in time...

Matt Nunes

Analysis of Network Time Series

18 / 22

A Model

Massive decorrelation across network and in time... Let G be the graph (network), and let S(G) be set of scaling function coefficient nodes.

Matt Nunes

Analysis of Network Time Series

18 / 22

A Model

Massive decorrelation across network and in time... Let G be the graph (network), and let S(G) be set of scaling function coefficient nodes. Then divide the wavelet coefficients into TWO sets:

Matt Nunes

Analysis of Network Time Series

18 / 22

A Model

Massive decorrelation across network and in time... Let G be the graph (network), and let S(G) be set of scaling function coefficient nodes. Then divide the wavelet coefficients into TWO sets: Z(G) set of nodes whose series are white noise. Q(G) set of nodes whose series are not white noise.

Matt Nunes

Analysis of Network Time Series

18 / 22

A Model

Massive decorrelation across network and in time... Let G be the graph (network), and let S(G) be set of scaling function coefficient nodes. Then divide the wavelet coefficients into TWO sets: Z(G) set of nodes whose series are white noise. Q(G) set of nodes whose series are not white noise. Can model Q(G) as appropriate, e.g. low-order ARMA.

Matt Nunes

Analysis of Network Time Series

18 / 22

A Model

For the Mumps Data: Scaling coefficients: S(G) = {30, 47} = {“NorthYorkshire00 , “Wiltshire00 }.

Matt Nunes

Analysis of Network Time Series

19 / 22

A Model

For the Mumps Data: Scaling coefficients: S(G) = {30, 47} = {“NorthYorkshire00 , “Wiltshire00 }. Of the remaining 45 time series:

Matt Nunes

Analysis of Network Time Series

19 / 22

A Model

For the Mumps Data: Scaling coefficients: S(G) = {30, 47} = {“NorthYorkshire00 , “Wiltshire00 }. Of the remaining 45 time series: 28 are deemed to be white noise Z(G); 17 are low order ARMA, Q(G).

Matt Nunes

Analysis of Network Time Series

19 / 22

A Model

For the Mumps Data: Scaling coefficients: S(G) = {30, 47} = {“NorthYorkshire00 , “Wiltshire00 }. Of the remaining 45 time series: 28 are deemed to be white noise Z(G); 17 are low order ARMA, Q(G). That is over 60% of the series are white noise.

Matt Nunes

Analysis of Network Time Series

19 / 22

Potential benefit for forecasting

Decorrelation across network means can treat series separately.

Matt Nunes

Analysis of Network Time Series

20 / 22

Potential benefit for forecasting

Decorrelation across network means can treat series separately. Forecasting procedure: 1

Perform network lifting dimension reduction procedure.

Matt Nunes

Analysis of Network Time Series

20 / 22

Potential benefit for forecasting

Decorrelation across network means can treat series separately. Forecasting procedure: 1 2

Perform network lifting dimension reduction procedure. (Few) scaling coefficient series in S(G) can be extrapolated forward. White noise processes in Z(G): extremely easy to predict (i.e. the mean). Q(G): use favourite forecasting method, e.g. Box-Jenkins for ARMA processes.

Matt Nunes

Analysis of Network Time Series

20 / 22

Potential benefit for forecasting

Decorrelation across network means can treat series separately. Forecasting procedure: 1 2

3

Perform network lifting dimension reduction procedure. (Few) scaling coefficient series in S(G) can be extrapolated forward. White noise processes in Z(G): extremely easy to predict (i.e. the mean). Q(G): use favourite forecasting method, e.g. Box-Jenkins for ARMA processes.

Then invert the lifting transform to obtain forecasts for the original series.

Matt Nunes

Analysis of Network Time Series

20 / 22

Improved Forecasting Performance

HUMBERSIDE

WINTER HILL ●







ROCHDALE EMLEY MOOR ● MANCHESTER

CROSBY ●

RINGWAY SPEKE









RHYL

VALLEY



LEEK

CAPEL CURIG









NOTTINGHAM EAST MIDS.

SHAWBURY



ABERDARON





WADDINGTON ● CONINGSBY



HAWARDEN AIRP.



DONNA NOOK SCAMPTON



WOODFORD



CRANWELL

● ●



WAINFLEET





● ●



TRAWSGOED ●

SHOBDON AIRF. ●



ELMDON ●COVENTRY ● CHURCH LAWFORD

LAKENHEATH









WATTISHAM

SENNYBRIDGE CREDENHILL ●



PEMBREY SANDS

● BRIZE NORTON ● FAIRFORD AIRF.



MILFORD HAVEN

ANDREWSFIELD

LUTON

LITTLE RISSINGTON ●



MILDENHALL

BEDFORD





MONKS WOOD ●

PERSHORE

ABERPORTH

MARHAM

WITTERING

COLESHILL



WEYBOURNE



HOLBEACH

SUTTON BONINGTONCOTTESMORE

LAKE VYRNWY





STANSTED HIGH WYCOMBE ●

● SOUTHEND AIRP. LONDON CITY ● ● BENSON NORTHOLT ● ● KEW ● ● SHOEBURYNESS ETON ● ● ● ● LYNEHAM HEATHROW GRAVESEND MANSTON ● AIRF. S. FARNBOROUGH ● KENLEY ● ● ● WISLEY LARKHILL ● EAST MALLING ● GATWICK ODIHAM ● ● ●● ● BOSCOMBE DOWN ● MIDDLE WALLOP CHARLWOOD ● LANGDON BAY YEOVILTON ● SOTON:EASTLEIGH ● HERSTMONCEUX DUNKESWELL AEROD. ● ● SOTON:OCEANOG. ● THORNEY ISLAND ● HURN ● ● NORTH WYKE ● SHOREHAM AIRP. ● SOLENT ● EXETER AIRP. ● WIGHT:NEEDLES ● ● WIGHT:ST. CATHERINES PT ISLE OF PORTLAND ●

MUMBLES HEAD ST. ATHAN ●

FILTON ●

BRISTOL

CHIVENORLISCOMBE

BODMIN ●

● ●

CAMBORNE

PLYMOUTH

BERRY HEAD





CULDROSE

Forecasting performance improves when compared to time domain forecasting (ignoring network structure) ∼ 20%. Additional improvement gained by bootstrapping over lifting orderings (“nondecimation”) and aggregating forecasts. Matt Nunes

Analysis of Network Time Series

21 / 22

Summary

Developed a new technique for network time series data reduction, which takes into account network structure decorrelation much simpler modelling and forecasting in the transform domain Preliminary results encouraging.

Matt Nunes

Analysis of Network Time Series

22 / 22

Summary

Developed a new technique for network time series data reduction, which takes into account network structure decorrelation much simpler modelling and forecasting in the transform domain Preliminary results encouraging. Open questions: Choice of network Optimal / adaptive lifting strategies for maximal decorrelation effect Other analysis tasks, ...e.g. changepoints?

Matt Nunes

Analysis of Network Time Series

22 / 22

References

Sweldens, W. (1995). The lifting scheme: A new philosophy in biorthogonal wavelet construction. In A. Laine and M. Unser (Eds.), Wavelet Applications in Signal and Image Processing III, pp. 68–79. Proc. SPIE 2569. Jansen, M., Nason, G.P. and Silverman, B.W. (2009). Multiscale methods for data on graphs and irregular multidimensional situations. J. Roy. Statist. Soc. B 71(2), 97–126. Knight, M. I., Nason, G. P. and Nunes, M. A. (2014). Hurst exponent estimation for long memory processes using wavelet lifting (in preparation).

Matt Nunes

Analysis of Network Time Series

22 / 22