Analysis of Network Time Series Matt Nunes Lancaster University with Marina Knight and Guy Nason
Newton Institute, January 2014
Matt Nunes
Analysis of Network Time Series
1 / 22
Time series on networks Many multivariate time series we encounter in practice will be observed on a network / graph, e.g. transport data computer traffic social media
Matt Nunes
Analysis of Network Time Series
2 / 22
Time series on networks Many multivariate time series we encounter in practice will be observed on a network / graph, e.g. transport data computer traffic social media Data summary: time series observed on (large) network network is real, or can be constructed aim is efficient analysis e.g. forecasting the series
Matt Nunes
Analysis of Network Time Series
2 / 22
Toy example: Cases of Mumps 2005
Weekly cases of Mumps at county level for 2005 (T = 1, . . . , 52. Number of Counties: P = 47). data source: Health Protection Agency (thanks to D. Harding & D. DeAngelis). Matt Nunes
Analysis of Network Time Series
3 / 22
What is the data like?
0.2
80 20
0.0
60
Counties
0.4
40
0.6
Cases (Somerset=solid, Avon=dashed)
0.8
100
120
1.0 Wilt West West West Warw Wale Tyne Surr Suff Staf Sout Some Shro Oxfo Nott Nort Nort Nort Norf Mers Lond Linc Leic Lanc Kent Isle Humb Hert Here Hamp Grea Glou Esse East Durh Dors Devo Derb Cumb Corn Clev Ches Camb Buck Berk Bedf Avon
0
−0.2
Avon Bedf Berk Buck Camb Ches Clev Corn Cumb Derb Devo Dors Durh East Esse Glou Grea Hamp Here Hert Humb Isle Kent Lanc Leic Linc Lond Mers Norf Nort Nort Nort Nott Oxfo Shro Some Sout Staf Suff Surr Tyne Wale Warw West West West Wilt
−0.4 0
10
Counties
Matt Nunes
Analysis of Network Time Series
20
30
40
Time
4 / 22
Example: Wind speed data (t=1)
HUMBERSIDE
WINTER HILL ●
●
●
●
ROCHDALE EMLEY MOOR ● MANCHESTER
CROSBY ●
RINGWAY SPEKE
●
●
●
●
RHYL
VALLEY
HAWARDEN AIRP.
●
●
●
CRANWELL
EAST MIDS. ●
●
●
NOTTINGHAM ●
SHAWBURY
●
ABERDARON
●
●
WADDINGTON ● CONINGSBY
LEEK
CAPEL CURIG
DONNA NOOK SCAMPTON
●
WOODFORD
●
● ●
●
WAINFLEET
●
●
● ●
●
TRAWSGOED ●
SHOBDON AIRF. ●
●
ELMDON ●COVENTRY ● CHURCH LAWFORD
●
●
●
●
WATTISHAM
SENNYBRIDGE CREDENHILL ●
●
●
PEMBREY SANDS
● BRIZE NORTON ● FAIRFORD AIRF.
●
ANDREWSFIELD
LUTON
LITTLE RISSINGTON
MILFORD HAVEN
●
MILDENHALL
BEDFORD
●
●
LAKENHEATH
MONKS WOOD ●
PERSHORE
ABERPORTH
MARHAM
WITTERING
COLESHILL
●
WEYBOURNE
●
HOLBEACH
SUTTON BONINGTONCOTTESMORE
LAKE VYRNWY
●
●
STANSTED HIGH WYCOMBE ●
SOUTHEND AIRP. ● FILTON LONDON CITY ● ● BENSON NORTHOLT ● ● KEW ● MUMBLES HEAD ● SHOEBURYNESS ETON ● ● ● ● ST. ATHAN LYNEHAM HEATHROW ● GRAVESEND ● MANSTON ● AIRF. BRISTOL S. FARNBOROUGH ● KENLEY ● ● ● WISLEY LARKHILL ● EAST MALLING ● GATWICK ODIHAM ● ● ●● ● CHIVENORLISCOMBE BOSCOMBE DOWN MIDDLE WALLOP CHARLWOOD ● ● LANGDON BAY YEOVILTON ● SOTON:EASTLEIGH ● HERSTMONCEUX DUNKESWELL AEROD. ● ● SOTON:OCEANOG. ● THORNEY ISLAND ● HURN ● ● NORTH WYKE ● SHOREHAM AIRP. ● SOLENT ● EXETER AIRP. ● WIGHT:NEEDLES ● BODMIN ● WIGHT:ST. CATHERINES PT ● ISLE OF PORTLAND ●
● ●
CAMBORNE
PLYMOUTH
BERRY HEAD
●
●
CULDROSE
Hourly wind speeds collected at UK Met. Office weather stations (T = 721 records, P = 102 stations). Data source: British Atmospheric Data Centre (BADC) / Centre for Environmental Data Archival (CEDA). Matt Nunes
Analysis of Network Time Series
5 / 22
Correlation: Scampton and Rochdale
0.8 0.6 0.0
0.2
0.4
0.6 0.4 0.0
0.2
ACF
0.8
1.0
381 & 1125
1.0
381
10
15
20
25
5
15
1125 & 381
1125
20
25
20
25
0.8 0.0
0.2
0.4
0.6
0.8 0.6 0.4 0.0 −25
−20
−15
−10
−5
0
0
5
Lag
Matt Nunes
10 Lag
0.2
ACF
0
Lag
1.0
5
1.0
0
10
15 Lag
Analysis of Network Time Series
6 / 22
Where is the Network?
Mumps / wind data does not come with a “natural” network.
Matt Nunes
Analysis of Network Time Series
7 / 22
Where is the Network?
Mumps / wind data does not come with a “natural” network. Can use a tree structure to link weather stations by geography, or county borders to link data.
Matt Nunes
Analysis of Network Time Series
7 / 22
Where is the Network?
Mumps / wind data does not come with a “natural” network. Can use a tree structure to link weather stations by geography, or county borders to link data. Other possible ways to construct networks: link with transport corridors, e.g. rail connections, roads etc “friends” / followers in social media academic collaborations (common papers, number of citations)
Matt Nunes
Analysis of Network Time Series
7 / 22
County Network
● Morpeth ● Carlisle
● Newcastle ● Durham ● Middlesborough
● Lancaster ● York ● Leeds Kingston ●Upon Hull ● Manchester ● Liverpool ● Chester
● Sheffield
● Shrewsbury ● Stafford
● Rhayader
● Lincoln ● Derby ● Nottingham ● Leicester ● Norwich
● Birmingham ● Worcester ● Northampton ● Warwick ● Gloucester ● Oxford
● Bristol
● Devizes ● Taunton ● Exeter
● Winchester ● Dorchester
● Cambridge ● Bedford ● Ipswich ● Aylesbury ● Hertford ● Chelmsford ● London ● Reading
● Guildford
● Maidstone
● Lewes
● Newport ● Chichester
● Truro
Network structure using distances between county towns. Matt Nunes
Analysis of Network Time Series
8 / 22
Weather station network
HUMBERSIDE
WINTER HILL ●
●
●
●
ROCHDALE EMLEY MOOR ● MANCHESTER
CROSBY ●
RINGWAY
HAWARDEN AIRP.
●
WADDINGTON ● CONINGSBY
●
LEEK
CAPEL CURIG
●
●
●
●
NOTTINGHAM EAST MIDS.
SHAWBURY
●
●
●
●
ABERDARON
DONNA NOOK SCAMPTON
●
WOODFORD
SPEKE
●
●
●
●
RHYL
VALLEY
●
CRANWELL
● ●
●
WAINFLEET
●
●
● ●
●
●
SHOBDON AIRF. ●
●
ELMDON ●COVENTRY ● CHURCH LAWFORD
LAKENHEATH
MONKS WOOD ●
●
BEDFORD
●
●
●
WATTISHAM
SENNYBRIDGE CREDENHILL LUTON
LITTLE RISSINGTON
●
●
●
PEMBREY SANDS
● BRIZE NORTON ● FAIRFORD AIRF.
●
MILFORD HAVEN
●
MILDENHALL ●
PERSHORE
●
MARHAM
WITTERING
COLESHILL TRAWSGOED ABERPORTH ●
WEYBOURNE
●
HOLBEACH
SUTTON BONINGTONCOTTESMORE
LAKE VYRNWY
ANDREWSFIELD ●
●
STANSTED HIGH WYCOMBE ●
● SOUTHEND AIRP. LONDON CITY ● ● BENSON NORTHOLT ● KEW ● SHOEBURYNESS ETON ● ● ● ● HEATHROW GRAVESEND MANSTON ● AIRF. S. FARNBOROUGH ● KENLEY ● ● ● WISLEY LARKHILL ● EAST MALLING ● GATWICK ODIHAM ● ● ●● ● CHIVENORLISCOMBE BOSCOMBE DOWN ● MIDDLE WALLOP CHARLWOOD ● LANGDON BAY YEOVILTON ● SOTON:EASTLEIGH ● HERSTMONCEUX DUNKESWELL AEROD. ● ● SOTON:OCEANOG. ● THORNEY ISLAND ● HURN ● ● NORTH WYKE ● SHOREHAM AIRP. ● SOLENT ● EXETER AIRP. ● WIGHT:NEEDLES ● BODMIN ● WIGHT:ST. CATHERINES PT ● ISLE OF PORTLAND ●
MUMBLES HEAD ST. ATHAN ●
FILTON ●
●
LYNEHAM
●
BRISTOL
● ●
CAMBORNE
PLYMOUTH
BERRY HEAD
●
●
CULDROSE
Weather stations network created by minimal spanning tree. Matt Nunes
Analysis of Network Time Series
9 / 22
Analysis question Characteristics of network data: complex structure over time lots of dependence between series possible redundancy in the system
Matt Nunes
Analysis of Network Time Series
10 / 22
Analysis question Characteristics of network data: complex structure over time lots of dependence between series possible redundancy in the system Can we capture the network structure? Can we reduce the dimension of the analysis problem?
Matt Nunes
Analysis of Network Time Series
10 / 22
Analysis question Characteristics of network data: complex structure over time lots of dependence between series possible redundancy in the system Can we capture the network structure? Can we reduce the dimension of the analysis problem? Trick: transform complex data into simpler time series for data analysis (dimension reduction) with lifting schemes
Matt Nunes
Analysis of Network Time Series
10 / 22
Dimension reduction with lifting Lifting schemes (Sweldens 1995) can provide efficient representations for data arising on complex data domains e.g. graphs / networks (“wavelet transform on a network”).
Matt Nunes
Analysis of Network Time Series
11 / 22
Dimension reduction with lifting Lifting schemes (Sweldens 1995) can provide efficient representations for data arising on complex data domains e.g. graphs / networks (“wavelet transform on a network”).
Why lifting? Wavelets are well-known for their ability to decorrelate systems (for decorrelation properties of lifting for time series, see KNN2014 in preparation)
Lifting schemes can naturally account for network structure They are computationally fast and invertible
Matt Nunes
Analysis of Network Time Series
11 / 22
Network lifting algorithm: NetTree1
7 3
2
3
4
1
3 2
6
4
5 4
4
Split: Pick network node to be ‘lifted’.
1
5
Jansen, M., Nason, G.P. and Silverman, B.W. (2009) Multiscale methods for data on graphs and irregular
multidimensional situations. J. Roy. Statist. Soc. Series B, 71, 97–126. Matt Nunes
Analysis of Network Time Series
12 / 22
Network lifting algorithm: NetTree1
7
Predict: Use function values at neigbouring nodes Ji to predict ci using distance-weighted average. The lifted coefficient is the residual P di = ci − cˆi = ci − j∈Ji ai,j cj .
1
3
2
3
4
1
−5/3 2
6
4
5 4
4 5
Jansen, M., Nason, G.P. and Silverman, B.W. (2009) Multiscale methods for data on graphs and irregular
multidimensional situations. J. Roy. Statist. Soc. Series B, 71, 97–126. Matt Nunes
Analysis of Network Time Series
12 / 22
Network lifting algorithm: NetTree1
6.63 3
1.59 1
−5/3 2
Update: Remove node i and redistribute lost signal content to neighbours using the lifted coefficient di .
1
4.58 4
3.52 5
Jansen, M., Nason, G.P. and Silverman, B.W. (2009) Multiscale methods for data on graphs and irregular
multidimensional situations. J. Roy. Statist. Soc. Series B, 71, 97–126. Matt Nunes
Analysis of Network Time Series
12 / 22
Network lifting algorithm: NetTree1
6.63 3
1.59
Repeat 1–3 until no network nodes are left.
1
We get
−5/3 2
4.58 4
3.52 5
Y = WX .
1
Jansen, M., Nason, G.P. and Silverman, B.W. (2009) Multiscale methods for data on graphs and irregular
multidimensional situations. J. Roy. Statist. Soc. Series B, 71, 97–126. Matt Nunes
Analysis of Network Time Series
12 / 22
Dimension reduction for network time series Proposed dimension reduction procedure: For each timepoint t = 1, . . . , T , perform NetTree lifting algorithm
Matt Nunes
Analysis of Network Time Series
13 / 22
Dimension reduction for network time series Proposed dimension reduction procedure: For each timepoint t = 1, . . . , T , perform NetTree lifting algorithm Get new time series in the wavelet domain: Yt = W(Xt ), wavelet coefficients series: P − Nc series; scaling coefficients series: Nc
Matt Nunes
Analysis of Network Time Series
13 / 22
Dimension reduction for network time series Proposed dimension reduction procedure: For each timepoint t = 1, . . . , T , perform NetTree lifting algorithm Get new time series in the wavelet domain: Yt = W(Xt ), wavelet coefficients series: P − Nc series; scaling coefficients series: Nc the transform takes into account network structure the resulting time series object is sparse
Matt Nunes
Analysis of Network Time Series
13 / 22
Dimension reduction for network time series Proposed dimension reduction procedure: For each timepoint t = 1, . . . , T , perform NetTree lifting algorithm Get new time series in the wavelet domain: Yt = W(Xt ), wavelet coefficients series: P − Nc series; scaling coefficients series: Nc the transform takes into account network structure the resulting time series object is sparse What does this sparsity give us? Matt Nunes
Analysis of Network Time Series
13 / 22
Mumps: Correlation before lifting
0.6 0.2 −0.2
−0.2
0.2
ACF
0.6
1.0
Durham & Tyne and Wear
1.0
Durham
2
4
6
8
10
12
14
0
2
4
6
8
10
Lag
Tyne and Wear & Durham
Tyne and Wear
12
14
12
14
0.6 0.2 −0.2
−0.2
0.2
ACF
0.6
1.0
Lag
1.0
0
−14
−10 −8
−6
−4
−2
0
0
2
Lag
Matt Nunes
4
6
8
10
Lag
Analysis of Network Time Series
14 / 22
Mumps: Correlation after lifting
0.6 0.2 −0.2
−0.2
0.2
ACF
0.6
1.0
Durham & Tyne and Wear
1.0
Durham
2
4
6
8
10
12
14
0
2
4
6
8
10
Lag
Tyne and Wear & Durham
Tyne and Wear
12
14
12
14
0.6 0.2 −0.2
−0.2
0.2
ACF
0.6
1.0
Lag
1.0
0
−14
−10 −8
−6
−4
−2
0
0
2
Lag
Matt Nunes
4
6
8
10
Lag
Analysis of Network Time Series
15 / 22
Shobdon Airf. and Coleshill: ACF (before lifting)
0.8 0.6 0.0
0.2
0.4
0.6 0.4 0.0
0.2
ACF
0.8
1.0
669 & 19187
1.0
669
10
15
20
25
5
15
19187 & 669
19187
20
25
20
25
0.8 0.0
0.2
0.4
0.6
0.8 0.6 0.4 0.0 −25
−20
−15
−10
−5
0
0
5
Lag
Matt Nunes
10 Lag
0.2
ACF
0
Lag
1.0
5
1.0
0
10
15 Lag
Analysis of Network Time Series
16 / 22
Shobdon Airf. and Coleshill: ACF (after lifting)
ACF
0.0 0.2 0.4 0.6 0.8 1.0 10
15
20
25
0
5
−25
15 Lag
19187 & 669
19187
−20
−15
−10
−5
0
0
5
Lag
Matt Nunes
10
Lag
20
25
20
25
0.0 0.2 0.4 0.6 0.8 1.0
5
0.0 0.2 0.4 0.6 0.8 1.0
0
ACF
669 & 19187
0.0 0.2 0.4 0.6 0.8 1.0
669
10
15 Lag
Analysis of Network Time Series
17 / 22
A Model
Massive decorrelation across network and in time...
Matt Nunes
Analysis of Network Time Series
18 / 22
A Model
Massive decorrelation across network and in time... Let G be the graph (network), and let S(G) be set of scaling function coefficient nodes.
Matt Nunes
Analysis of Network Time Series
18 / 22
A Model
Massive decorrelation across network and in time... Let G be the graph (network), and let S(G) be set of scaling function coefficient nodes. Then divide the wavelet coefficients into TWO sets:
Matt Nunes
Analysis of Network Time Series
18 / 22
A Model
Massive decorrelation across network and in time... Let G be the graph (network), and let S(G) be set of scaling function coefficient nodes. Then divide the wavelet coefficients into TWO sets: Z(G) set of nodes whose series are white noise. Q(G) set of nodes whose series are not white noise.
Matt Nunes
Analysis of Network Time Series
18 / 22
A Model
Massive decorrelation across network and in time... Let G be the graph (network), and let S(G) be set of scaling function coefficient nodes. Then divide the wavelet coefficients into TWO sets: Z(G) set of nodes whose series are white noise. Q(G) set of nodes whose series are not white noise. Can model Q(G) as appropriate, e.g. low-order ARMA.
Matt Nunes
Analysis of Network Time Series
18 / 22
A Model
For the Mumps Data: Scaling coefficients: S(G) = {30, 47} = {“NorthYorkshire00 , “Wiltshire00 }.
Matt Nunes
Analysis of Network Time Series
19 / 22
A Model
For the Mumps Data: Scaling coefficients: S(G) = {30, 47} = {“NorthYorkshire00 , “Wiltshire00 }. Of the remaining 45 time series:
Matt Nunes
Analysis of Network Time Series
19 / 22
A Model
For the Mumps Data: Scaling coefficients: S(G) = {30, 47} = {“NorthYorkshire00 , “Wiltshire00 }. Of the remaining 45 time series: 28 are deemed to be white noise Z(G); 17 are low order ARMA, Q(G).
Matt Nunes
Analysis of Network Time Series
19 / 22
A Model
For the Mumps Data: Scaling coefficients: S(G) = {30, 47} = {“NorthYorkshire00 , “Wiltshire00 }. Of the remaining 45 time series: 28 are deemed to be white noise Z(G); 17 are low order ARMA, Q(G). That is over 60% of the series are white noise.
Matt Nunes
Analysis of Network Time Series
19 / 22
Potential benefit for forecasting
Decorrelation across network means can treat series separately.
Matt Nunes
Analysis of Network Time Series
20 / 22
Potential benefit for forecasting
Decorrelation across network means can treat series separately. Forecasting procedure: 1
Perform network lifting dimension reduction procedure.
Matt Nunes
Analysis of Network Time Series
20 / 22
Potential benefit for forecasting
Decorrelation across network means can treat series separately. Forecasting procedure: 1 2
Perform network lifting dimension reduction procedure. (Few) scaling coefficient series in S(G) can be extrapolated forward. White noise processes in Z(G): extremely easy to predict (i.e. the mean). Q(G): use favourite forecasting method, e.g. Box-Jenkins for ARMA processes.
Matt Nunes
Analysis of Network Time Series
20 / 22
Potential benefit for forecasting
Decorrelation across network means can treat series separately. Forecasting procedure: 1 2
3
Perform network lifting dimension reduction procedure. (Few) scaling coefficient series in S(G) can be extrapolated forward. White noise processes in Z(G): extremely easy to predict (i.e. the mean). Q(G): use favourite forecasting method, e.g. Box-Jenkins for ARMA processes.
Then invert the lifting transform to obtain forecasts for the original series.
Matt Nunes
Analysis of Network Time Series
20 / 22
Improved Forecasting Performance
HUMBERSIDE
WINTER HILL ●
●
●
●
ROCHDALE EMLEY MOOR ● MANCHESTER
CROSBY ●
RINGWAY SPEKE
●
●
●
●
RHYL
VALLEY
●
LEEK
CAPEL CURIG
●
●
●
●
NOTTINGHAM EAST MIDS.
SHAWBURY
●
ABERDARON
●
●
WADDINGTON ● CONINGSBY
●
HAWARDEN AIRP.
●
DONNA NOOK SCAMPTON
●
WOODFORD
●
CRANWELL
● ●
●
WAINFLEET
●
●
● ●
●
TRAWSGOED ●
SHOBDON AIRF. ●
●
ELMDON ●COVENTRY ● CHURCH LAWFORD
LAKENHEATH
●
●
●
●
WATTISHAM
SENNYBRIDGE CREDENHILL ●
●
PEMBREY SANDS
● BRIZE NORTON ● FAIRFORD AIRF.
●
MILFORD HAVEN
ANDREWSFIELD
LUTON
LITTLE RISSINGTON ●
●
MILDENHALL
BEDFORD
●
●
MONKS WOOD ●
PERSHORE
ABERPORTH
MARHAM
WITTERING
COLESHILL
●
WEYBOURNE
●
HOLBEACH
SUTTON BONINGTONCOTTESMORE
LAKE VYRNWY
●
●
STANSTED HIGH WYCOMBE ●
● SOUTHEND AIRP. LONDON CITY ● ● BENSON NORTHOLT ● ● KEW ● ● SHOEBURYNESS ETON ● ● ● ● LYNEHAM HEATHROW GRAVESEND MANSTON ● AIRF. S. FARNBOROUGH ● KENLEY ● ● ● WISLEY LARKHILL ● EAST MALLING ● GATWICK ODIHAM ● ● ●● ● BOSCOMBE DOWN ● MIDDLE WALLOP CHARLWOOD ● LANGDON BAY YEOVILTON ● SOTON:EASTLEIGH ● HERSTMONCEUX DUNKESWELL AEROD. ● ● SOTON:OCEANOG. ● THORNEY ISLAND ● HURN ● ● NORTH WYKE ● SHOREHAM AIRP. ● SOLENT ● EXETER AIRP. ● WIGHT:NEEDLES ● ● WIGHT:ST. CATHERINES PT ISLE OF PORTLAND ●
MUMBLES HEAD ST. ATHAN ●
FILTON ●
BRISTOL
CHIVENORLISCOMBE
BODMIN ●
● ●
CAMBORNE
PLYMOUTH
BERRY HEAD
●
●
CULDROSE
Forecasting performance improves when compared to time domain forecasting (ignoring network structure) ∼ 20%. Additional improvement gained by bootstrapping over lifting orderings (“nondecimation”) and aggregating forecasts. Matt Nunes
Analysis of Network Time Series
21 / 22
Summary
Developed a new technique for network time series data reduction, which takes into account network structure decorrelation much simpler modelling and forecasting in the transform domain Preliminary results encouraging.
Matt Nunes
Analysis of Network Time Series
22 / 22
Summary
Developed a new technique for network time series data reduction, which takes into account network structure decorrelation much simpler modelling and forecasting in the transform domain Preliminary results encouraging. Open questions: Choice of network Optimal / adaptive lifting strategies for maximal decorrelation effect Other analysis tasks, ...e.g. changepoints?
Matt Nunes
Analysis of Network Time Series
22 / 22
References
Sweldens, W. (1995). The lifting scheme: A new philosophy in biorthogonal wavelet construction. In A. Laine and M. Unser (Eds.), Wavelet Applications in Signal and Image Processing III, pp. 68–79. Proc. SPIE 2569. Jansen, M., Nason, G.P. and Silverman, B.W. (2009). Multiscale methods for data on graphs and irregular multidimensional situations. J. Roy. Statist. Soc. B 71(2), 97–126. Knight, M. I., Nason, G. P. and Nunes, M. A. (2014). Hurst exponent estimation for long memory processes using wavelet lifting (in preparation).
Matt Nunes
Analysis of Network Time Series
22 / 22