 

Mario D Hair
Independent Statistics Consultant

Propensity Score Matching in SPSS: How to turn an Audit into an RCT

Outline
• What is Propensity score matching?
• Propensity Score Matching in SPSS
• Example: Comparing patients with both Gout & diabetes to those with diabetes only
• Dealing with missing data

What is Propensity score matching?

Developed by Rosenbaum & Rubin (1983). Two aspects:
1. Generate the propensity score
2. Apply it to balance the data.

[Figure: Search hits using 'Propensity score matching' by year. Slide provided by Beng So, ST6, Queen Elizabeth Hospital, Glasgow]

What is Propensity score matching?
1. Generate the propensity score

The propensity score is the probability (from 0 to 1) of a case being in a particular group based on a given set of covariates. It is generally calculated using logistic regression with group (Treatment/Control) as the dependent variable and the covariates as independent variables.

Caveats & Limitations
• There can only be two groups. If there are more groups, they need to be analysed pairwise.
• The propensity score is only as good as the predictors used to generate it.
• A propensity score is not generated for any case with any missing data.
• We are not interested in any aspect of the logistic model other than the probabilities.

The propensity score is a balancing score: the differences between groups on the covariates are condensed down into a single score, so if two groups are balanced on the propensity score then they are balanced on all the covariates used to generate it.
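Step 1 can be sketched in a few lines. This is a minimal illustration only: it assumes a tiny made-up dataset and a hand-rolled logistic regression fitted by gradient ascent, whereas in SPSS the built-in logistic regression procedure would do this job. The function names, learning rate and data are hypothetical and not part of the talk.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit intercept + coefficients by batch gradient ascent on the log-likelihood."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)  # beta[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))
            err = yi - p
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity_scores(X, beta):
    """Predicted probability of being in the treatment group for each case."""
    return [sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi))) for xi in X]

# Hypothetical covariates: (centred age in decades, male flag). Not the audit data.
X = [(-1.0, 0), (-0.5, 0), (-0.2, 1), (0.0, 0), (0.3, 1), (0.5, 1), (0.8, 0), (1.0, 1)]
y = [0, 0, 0, 0, 1, 0, 1, 1]  # 1 = treatment group, 0 = control

beta = fit_logistic(X, y)
scores = propensity_scores(X, beta)
```

Only the predicted probabilities in `scores` are carried forward; the coefficients themselves are not interpreted, in line with the last caveat above.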

[Figure slide provided by Beng So, ST6, Queen Elizabeth Hospital, Glasgow]

What is Propensity score matching?
2. Apply propensity score to balance the data. Four main applications.

• Propensity score matching: Match one or more control cases with a propensity score that is (nearly) equal to the propensity score for each treatment case.
• Stratification: Divide the sample into strata based on rank-ordered propensity scores. Comparisons between groups are then performed within each stratum.
• Regression adjustment: Include propensity scores as a covariate in a regression model used to estimate the treatment effect.
• Weighting: Inverse probability of treatment weighting (IPTW) weights cases by the inverse of the propensity score. Similar to the use of survey sampling weights to ensure samples are representative of specific populations. Often used in survival analyses.

Austin (2011) reports that propensity score matching is better than stratification or regression adjustment and is at least as good as IPTW. It is increasingly the most widely used method.
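Two of the four applications are simple enough to sketch. The example below assumes greedy 1:1 nearest-neighbour matching without replacement and a caliper of 0.05, plus the usual IPTW weights (1/ps for treated cases, 1/(1-ps) for controls). The function names, caliper value and scores are illustrative assumptions; this is not how either SPSS tool is implemented internally.

```python
def greedy_match(treated, controls, caliper=0.05):
    """treated/controls: dict of case id -> propensity score.
    Returns a list of (treated_id, control_id) pairs."""
    available = dict(controls)
    pairs = []
    # Processing order (highest score first) is an arbitrary choice for this sketch.
    for t_id, t_ps in sorted(treated.items(), key=lambda kv: -kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_ps))
        if abs(available[c_id] - t_ps) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # matching without replacement
    return pairs

def iptw_weight(ps, is_treated):
    """Inverse probability of treatment weight for one case."""
    return 1.0 / ps if is_treated else 1.0 / (1.0 - ps)

# Hypothetical propensity scores.
treated = {"T1": 0.30, "T2": 0.12, "T3": 0.80}
controls = {"C1": 0.10, "C2": 0.29, "C3": 0.33, "C4": 0.05}
pairs = greedy_match(treated, controls)
```

Here T3 stays unmatched because no control lies within the caliper. A tighter caliper gives closer matches at the cost of discarding more treated cases.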

Propensity Score Matching in SPSS

Propensity score matching is available in SPSS V22. Prior to that it was only available as 'PS matching', an extension command that requires both R and the R plug-in. PS matching was developed by Felix Thoemmes at Cornell University.

PS matching: http://sourceforge.net/projects/psmspss/ contains
• Latest version of the software, psmatching 3.04, June 2015 (this talk uses 3.03)
• Installation instructions (in a file called 'readme.txt')
• Thoemmes 2012 paper describing the software (called 'arxiv preprint.pdf')

Comparison of PS matching & SPSS Propensity score matching

Loading
• PS matching: Can be tricky. Requires R plug-in & R, but available for V18 onwards
• SPSS Propensity score matching: Pre-loaded in V22

Generate propensity score
• PS matching: SPSS logistic regression. GAM logit??
• SPSS Propensity score matching: SPSS logistic regression

Score matching
• PS matching: Uses R packages: MatchIt, RItools, cem
• SPSS Propensity score matching: Uses Python essentials FUZZY extension command

Speed
• PS matching: Can be slow for large files
• SPSS Propensity score matching: Also slow, but speed can be increased by sacrificing precision

Precision
• PS matching: Very good
• SPSS Propensity score matching: Can be poor unless match tolerance set very low

Diagnostics
• PS matching: Very good
• SPSS Propensity score matching: Poor

Missing data
• PS matching: Cannot handle any missing data, covariate or not
• SPSS Propensity score matching: No problem, but missing data in the covariates will result in omission of cases

[Screenshots: Propensity score matching (SPSS V22) and PS Matching]

Example: Comparing 1714 patients with BOTH Gout & diabetes to 15,224 patients with ONLY diabetes

[Screenshot: Covariates]

Univariate stats: Comparing BOTH Gout & diabetes to those with ONLY diabetes

Variable           Group          N      Mean    Std. Dev  Mean diff / Odds ratio (95% CI)  Effect size (d)†
Total Cholesterol  Type 2 only    14332  4.24    1.02      -0.16* (-0.21, -0.11)            0.10
                   Gout & type 2  1633   4.08    1.01
HDL Cholesterol    Type 2 only    14971  1.28    .38       -0.05* (-0.07, -0.03)            0.08
                   Gout & type 2  1691   1.23    .38
LDL Cholesterol    Type 2 only    14274  2.06    .87       -0.17* (-0.21, -0.13)            0.12
                   Gout & type 2  1593   1.89    .84
Triglycerides      Type 2 only    14906  2.06    1.23      0.16* (0.10, 0.22)               0.08
                   Gout & type 2  1678   2.21    1.31
BMI at risk        Type 2 only    14670  54.6%             1.12* (1.01, 1.24)               0.06
                   Gout & type 2  1652   57.4%

Variable           Group          N      Mean    Std. Dev  Mean diff / Odds ratio (95% CI)  Effect size (d)†
Age                Type 2 only    15224  65.51   12.42     4.42* (3.81, 5.03)               0.22
                   Gout & type 2  1714   69.93   10.70
Gender (%Male)     Type 2 only    15224  54.2%             1.70* (1.53, 1.89)               0.29
                   Gout & type 2  1714   66.7%
Smoker (%Current)  Type 2 only    15224  18.9%             0.53* (0.45, 0.62)               0.35
                   Gout & type 2  1714   11.0%
Thiazide           Type 2 only    15224  16.5%             0.81* (0.70, 0.94)               0.11
                   Gout & type 2  1714   13.8%
Diuretic           Type 2 only    15224  30.7%             1.92* (1.73, 2.12)               0.36
                   Gout & type 2  1714   46.0%

Mean diff is (both - type 2); odds ratios are given for the percentage rows.
* p < 0.05
† using t to d conversions d = 2t/sqrt(df) & d = ln(OR)*(√3/π)
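The two effect-size conversions in the footnote can be checked directly against the table. In this sketch the odds ratios are taken from the slide, but the t value for Age is not reported there: t ≈ 14.2 is back-calculated from the confidence interval (4.42 divided by a standard error of about 0.31) and should be read as an assumption. Function names are illustrative.

```python
import math

def d_from_t(t, df):
    """Cohen's d from an independent-samples t statistic: d = 2t / sqrt(df)."""
    return 2.0 * t / math.sqrt(df)

def d_from_odds_ratio(odds_ratio):
    """Cohen's d from an odds ratio: d = ln(OR) * sqrt(3) / pi."""
    return math.log(odds_ratio) * math.sqrt(3.0) / math.pi

# Odds ratios as reported in the covariate table.
d_gender = d_from_odds_ratio(1.70)       # slide reports 0.29
d_diuretic = d_from_odds_ratio(1.92)     # slide reports 0.36
d_smoker = abs(d_from_odds_ratio(0.53))  # slide reports 0.35 (sign dropped)

# Age: t is an assumed, back-calculated value; df = 15224 + 1714 - 2.
d_age = d_from_t(14.2, 15224 + 1714 - 2)  # slide reports 0.22
```

Because d = 2t/sqrt(df) assumes roughly equal group sizes, it gives a smaller value for Age here than dividing the mean difference by a standard deviation would.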

PS Matching: Using a file with only the covariates

Warning: PS Matching will not work if there are missing values on any variable.

Propensity score matching SPSS V22

[Screenshots: Propensity score matching (SPSS V22) and PS Matching]

However, SPSS V22 Propensity Score Matching does work if there are missing values on any variable.

PS Matching has more options & diagnostics

PS Matching Outputs: Datasets
• Matched cases
• Paired cases wide format

Propensity score Matching SPSS V22 Output

PS Matching Outputs: Diagnostics

Sample sizes of matched data

            Control  Treated
All         15224    1714
Matched     1714     1714
Unmatched   13510    0
Discarded   0        0

Overall balance test (Hansen & Bowers, 2010)

          chisquare  df     p.value
Overall   .883       5.000  .971

Relative multivariate imbalance L1 (Iacus, King, & Porro, 2010)

                                    Before matching  After matching
Multivariate imbalance measure L1   .290             .167

Detailed balance

Covariate    Means Treated     Means Control     SD Control        Std. Mean Diff.
             Before   After    Before   After    Before   After    Before   After
propensity   .136     .136     .097     .135     .058     .073     .524     .005
Age          69.928   69.928   65.511   69.938   12.416   11.049   .409     -.001
sex0         .667     .667     .542     .661     .498     .473     .266     .014
sex1         .333     .333     .458     .339     .498     .473     -.266    -.014
curr_smok1   .110     .110     .189     .118     .392     .323     -.252    -.026
Thiazide11   .138     .138     .165     .134     .371     .341     -.077    .012
Diuretic11   .460     .460     .307     .461     .461     .499     .306     -.002

Summary of any unbalanced covariate terms, including interactions

Summary of unbalanced covariates (|d| > .25): No covariate exhibits a large imbalance (|d| > .25).
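The standardized mean differences in the detailed balance output can be reproduced by hand. The slide does not state the denominator; the figures are consistent with dividing by the treated-group standard deviation, which is assumed here. For Age that is the 10.79 reported in the after-matching covariate table, and for a binary covariate it is sqrt(p(1-p)). The function name is illustrative.

```python
import math

def std_mean_diff(mean_treated, mean_control, sd_treated):
    """Standardized mean difference, scaled by the treated-group SD (assumed)."""
    return (mean_treated - mean_control) / sd_treated

# Age: means from the detailed balance output, treated SD of 10.79.
age_before = std_mean_diff(69.928, 65.511, 10.79)  # slide reports .409
age_after = std_mean_diff(69.928, 69.938, 10.79)   # slide reports -.001

# sex0 is binary, so the treated SD is sqrt(p * (1 - p)) with p = .667.
sd_sex0 = math.sqrt(0.667 * (1.0 - 0.667))
sex0_before = std_mean_diff(0.667, 0.542, sd_sex0)  # slide reports .266
```

The small discrepancy for sex0 (about .265 versus .266) comes from working with the rounded proportions shown on the slide.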

PS Matching Outputs: Diagnostic plots

[Figure: Histogram of propensity scores]

[Figure: Jitter plot]

PS Matching Outputs: Diagnostic plots

[Figure: Dotplot of standardized mean differences]

Graphical representation of data from detailed balance stats

Covariate    Std. Mean Diff.
             Before   After
propensity   .524     .005
Age          .409     -.001
sex0         .266     .014
sex1         -.266    -.014
curr_smok1   -.252    -.026
Thiazide11   -.077    .012
Diuretic11   .306     -.002

Adding the lipid data to the matched file using merge, where the original (non-active) file is the keyed file
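The merge step amounts to a keyed lookup: each matched case pulls its lipid values from the original file by case id. The sketch below illustrates the idea with plain dictionaries; the ids, field names and values are made up, and in SPSS this would be done with the merge (add variables) dialog rather than code like this.

```python
def add_outcomes(matched_rows, original_by_id, fields):
    """Copy the requested fields from the keyed original table onto each matched row."""
    merged = []
    for row in matched_rows:
        source = original_by_id.get(row["id"], {})
        new_row = dict(row)
        for field in fields:
            new_row[field] = source.get(field)  # None marks a missing value
        merged.append(new_row)
    return merged

# Hypothetical keyed original file: case id -> lipid values.
original_by_id = {
    101: {"total_chol": 4.1, "hdl": 1.2},
    102: {"total_chol": None, "hdl": 1.3},
    103: {"total_chol": 4.4, "hdl": 1.1},
}
# Hypothetical matched file: only matched cases, with group and propensity score.
matched_rows = [
    {"id": 101, "group": 1, "ps": 0.14},
    {"id": 103, "group": 0, "ps": 0.15},
]
merged = add_outcomes(matched_rows, original_by_id, ["total_chol", "hdl"])
```

Only cases present in the matched file appear in the result; unmatched cases such as 102 are left behind in the original file.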

Univariate stats: Comparing BOTH Gout & diabetes to those with ONLY diabetes
Covariates after matching

Variable           Group          N      Mean    Std. Dev  Mean diff / Odds ratio (95% CI)  Effect size (d)
Age                Type 2 only    15224  65.51   12.42     4.42* (3.81, 5.03)               0.22
                   Matched type 2 1714   69.94   11.05     0.01 (-0.72, 0.74)               0.001
                   Gout & type 2  1714   69.93   10.79
Gender (%Male)     Type 2 only    15224  54.2%             1.70* (1.53, 1.89)               0.29
                   Matched type 2 1714   66.1%             1.03 (0.89, 1.19)                0.02
                   Gout & type 2  1714   66.7%
Smoker (%Current)  Type 2 only    15224  18.9%             0.53* (0.45, 0.62)               0.35
                   Matched type 2 1714   11.8%             0.92 (0.75, 1.14)                0.05
                   Gout & type 2  1714   11.0%
Thiazide           Type 2 only    15224  16.5%             0.81* (0.70, 0.94)               0.11
                   Matched type 2 1714   13.4%             1.04 (0.85, 1.26)                0.02
                   Gout & type 2  1714   13.8%
Diuretic           Type 2 only    15224  30.7%             1.92* (1.73, 2.12)               0.36
                   Matched type 2 1714   46.1%             0.99 (0.87, 1.14)                0.01
                   Gout & type 2  1714   46.0%

[Figures: Age before matching; Age after matching]

Univariate stats: Comparing BOTH Gout & diabetes to those with ONLY diabetes
Lipids after matching

Variable           Group          N      Mean    Std. Dev  Mean diff / Odds ratio (95% CI)  Effect size
Total Cholesterol  Type 2 only    14332  4.24    1.02      -0.16* (-0.21, -0.11)            0.10
                   Matched type 2 1619   4.09    0.97      -0.01 (-0.06, 0.08) NS           0.01
                   Gout & type 2  1633   4.08    1.01
HDL Cholesterol    Type 2 only    14971  1.28    .38       -0.05* (-0.07, -0.03)            0.08
                   Matched type 2 1696   1.26    .37       -0.035* (-0.06, -0.01)           0.09
                   Gout & type 2  1691   1.23    .38
LDL Cholesterol    Type 2 only    14274  2.06    .87       -0.17* (-0.21, -0.13)            0.12
                   Matched type 2 1625   1.94    .84       -0.05 (-0.01, 0.11) NS           0.06
                   Gout & type 2  1593   1.89    .84
Triglycerides      Type 2 only    14906  2.06    1.23      0.16* (0.10, 0.22)               0.08
                   Matched type 2 1682   2.01    1.16      0.20* (0.12, 0.29)               0.17
                   Gout & type 2  1678   2.21    1.31
BMI at risk        Type 2 only    14670  54.6%             1.12* (1.01, 1.24)               0.06
                   Matched type 2 1655   51.5%             1.27* (1.11, 1.46)               0.13
                   Gout & type 2  1652   57.4%

Dealing with missing data 1
Missing data in non-covariate data: Use the paired data format

Green is paired comparison, red is matched. There are no substantive changes.

Variable           Group          N      Mean    Std. Dev  Mean diff / Odds ratio (95% CI)  Effect size
Total Cholesterol  Matched type 2 1619   4.09    0.97      -0.01 (-0.06, 0.08) NS           0.01
                   Paired type 2  1543   4.09    0.97      -0.01 (-0.06, 0.08) NS           0.01
                   Gout & type 2  1543   4.08    1.02
HDL Cholesterol    Matched type 2 1696   1.26    .37       -0.035* (-0.06, -0.01)           0.09
                   Paired type 2  1674   1.27    .37       -0.035* (-0.06, -0.01)           0.09
                   Gout & type 2  1674   1.23    .38
LDL Cholesterol    Matched type 2 1625   1.94    .84       -0.05 (-0.01, 0.11) NS           0.06
                   Paired type 2  1513   1.93    .83       -0.04 (-0.02, 0.10) NS           0.05
                   Gout & type 2  1513   1.89    .84
Triglycerides      Matched type 2 1682   2.01    1.16      0.20* (0.12, 0.29)               0.17
                   Paired type 2  1647   2.01    1.16      0.20* (0.12, 0.29)               0.16
                   Gout & type 2  1647   2.21    1.31
BMI at risk        Matched type 2 1655   51.5%             1.27* (1.11, 1.46)               0.13
                   Paired type 2  1598   51.3%             1.28* (1.12, 1.48)               0.14
                   Gout & type 2  1598   57.5%
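One reading of the paired approach is that a pair contributes to an outcome comparison only when both of its members have a value for that outcome, which would explain why the paired N is smaller than the matched N. The sketch below illustrates that filtering under this assumption; the ids, values and function names are hypothetical.

```python
def complete_pairs(pairs, outcome):
    """Keep only pairs where both the treated and the control case have a value."""
    return [(t, c) for t, c in pairs
            if outcome.get(t) is not None and outcome.get(c) is not None]

def mean_paired_difference(pairs, outcome):
    """Mean of (treated - control) across the supplied pairs."""
    diffs = [outcome[t] - outcome[c] for t, c in pairs]
    return sum(diffs) / len(diffs)

# Hypothetical matched pairs and total cholesterol values (None = missing).
pairs = [("T1", "C1"), ("T2", "C2"), ("T3", "C3")]
total_chol = {"T1": 4.0, "C1": 4.5, "T2": None, "C2": 4.2, "T3": 3.5, "C3": 4.0}

usable = complete_pairs(pairs, total_chol)
diff = mean_paired_difference(usable, total_chol)
```

The pair (T2, C2) is dropped as a whole, so both groups keep the same N for each outcome, as in the Paired and Gout rows of the table.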

Dealing with missing data 2
Missing data in covariates: Use multiple imputation
• Separate creation of propensity scores from the matching
• Run logistic regression on imputed datasets
• Aggregate to get mean (median) propensity score
• Use the aggregate file to do the matching
• Load in the other variables
• Use imputation again if missing data in non-covariates
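The aggregation step in the list above can be sketched as follows, assuming each imputed dataset has already produced one propensity score per case. The data structure (one dict of id to score per imputation), the function name and the scores are illustrative assumptions.

```python
from statistics import mean, median

def pool_scores(imputed_scores, how="mean"):
    """imputed_scores: list of dicts, one per imputed dataset, mapping id -> score.
    Returns one pooled propensity score per case id."""
    aggregate = mean if how == "mean" else median
    ids = imputed_scores[0].keys()
    return {case_id: aggregate([ds[case_id] for ds in imputed_scores])
            for case_id in ids}

# Hypothetical propensity scores from three imputed datasets.
imputed_scores = [
    {"A": 0.20, "B": 0.50},
    {"A": 0.30, "B": 0.55},
    {"A": 0.40, "B": 0.90},
]
pooled_mean = pool_scores(imputed_scores, how="mean")
pooled_median = pool_scores(imputed_scores, how="median")
```

The pooled score per case is what gets passed to the matching step. The median is less sensitive to a single imputation that produces an extreme score, as for case B here.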

References

Austin, P. C. (2011). An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behavioral Research, 46, 399-424. doi:10.1080/00273171.2011.568786. One of the foremost authors on the subject.

Beal, S. J., & Kupzyk, K. A. (2014). An introduction to propensity scores: What, when, and how. The Journal of Early Adolescence, 34(1), 66-92. doi:10.1177/0272431613503215. Easy to read introduction.

Iacus, S. M., King, G., & Porro, G. (2009). CEM: Coarsened exact matching software. Journal of Statistical Software, 30, 1-27. Reference for 'relative multivariate imbalance test'.

Mitra, R., & Reiter, J. P. (2012). A comparison of two methods of estimating propensity scores after multiple imputation. Statistical Methods in Medical Research, 0962280212445945.

Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41-55. doi:10.1093/biomet/70.1.41. Seminal paper.

Rubin, D. B. (1997). Estimating causal effects from large data sets using propensity scores. Annals of Internal Medicine, 127(8_Part_2), 757-763. Example of stratification.

Thoemmes, F. (2012). Propensity score matching in SPSS. arXiv preprint arXiv:1201.6385. Explains use of 'ps matching'.

 


Thank  you:  Questions?
