Jurnal Karya Asli Lorekan Ahli Matematik Vol. 8 No.1 (2015) Page 023-028


MODELING BODY MASS INDEX USING MULTIPLE LINEAR REGRESSION AND ROBUST REGRESSION

Nor Azlida Aleng¹, Nyi Nyi Naing², Zurkurnai Yusof³ and Norizan Mohamed⁴

¹,⁴ School of Informatics and Applied Mathematics, Universiti Malaysia Terengganu, 21030 Kuala Terengganu, Terengganu, Malaysia.
²,³ School of Medical Sciences, Health Campus, Universiti Sains Malaysia, 16150 Kota Bharu, Kelantan, Malaysia.

Abstract: Regression analysis is a statistical technique for investigating and modeling the relationship between two or more variables. Its fundamental idea is to use a data set to fit a prediction equation that relates the dependent variable to the independent variables. The prediction equation is then used to estimate future values of the dependent variable. Practically all regression analyses rely on ordinary least squares (OLS) to estimate the model parameters. However, OLS performs poorly and gives biased estimates when the data contain outliers. To remedy this problem, robust M-estimation is proposed as an alternative approach for dealing with outliers in regression analysis. The results show that robust regression performs better than multiple linear regression in modeling body mass index.

Keywords: Multiple linear regression, robust regression, outliers, body mass index.

1. Introduction

The body mass index (BMI) is a measure of body fat based on height and weight. It is calculated by dividing weight in kilograms (kg) by the square of height in meters (m²). Healthcare professionals worldwide use BMI as a reliable indicator of whether a person is overweight or clinically obese [1]. Differences in BMI between people of the same age and sex are usually due to body fat. However, BMI can sometimes be misleading. For example, a muscular person may have a high BMI but much less fat than an unfit person whose BMI is lower. In general, though, BMI is useful for screening for weight categories that can lead to serious health problems, such as heart disease and diabetes [2]. Table 1 gives the body mass index classifications.

Table 1: Body Mass Index Classifications (Source: WHO Expert Committee, 1995)

BMI (kg/m²)   Classification          Description
< 18.5        Underweight             Thin
18.5-24.9     Healthy                 Normal
25.0-29.9     Overweight (Grade 1)    Overweight
30.0-39.9     Overweight (Grade 2)    Obesity
≥ 40.0        Overweight (Grade 3)    Morbid obesity
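The BMI formula and the Table 1 cut-offs can be sketched in a few lines of Python. This is illustrative only. The names `bmi` and `classify_bmi` are hypothetical helpers and are not part of the study. The sketch also assumes half-open intervals between the printed cut-offs. Under that assumption, a value such as 24.95 counts as healthy and exactly 40.0 falls in Grade 3.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def classify_bmi(value: float) -> str:
    """Map a BMI value to the Table 1 classification.

    Cut-offs follow Table 1; treating each band as a half-open interval
    [lower, upper) is an assumption made for this sketch.
    """
    if value < 18.5:
        return "Underweight"
    if value < 25.0:
        return "Healthy"
    if value < 30.0:
        return "Overweight (Grade 1)"
    if value < 40.0:
        return "Overweight (Grade 2)"
    return "Overweight (Grade 3)"


# Hypothetical adult: 70 kg, 1.75 m, so BMI = 70 / 1.75^2
value = bmi(70.0, 1.75)
print(round(value, 2), classify_bmi(value))  # 22.86 Healthy
```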

Generally speaking, the recommended healthy BMI range for adults is 18.5-24.9. Children, however, are constantly growing, which makes it difficult to set fixed BMI cut-offs for them. In Australia, general health status may matter more than being mildly overweight for people over the age of 70 years. Some researchers have suggested that a BMI range of 22-26 is desirable for older Australians. In recent years, different BMI cut-off points for overweight and obesity have been proposed, in particular for the Asia-Pacific region [3]. In the USA, over half (53%) of all deaths among women with a BMI > 29 kg/m² could be directly attributed to their obesity [4]. Eating behaviors that have been linked to overweight and obesity include snacking/eating frequency, binge-eating patterns, eating out, and exclusive breastfeeding. Physical activity is also an important determinant of body weight.

© 2015 Jurnal Karya Asli Lorekan Ahli Matematik Published by Pustaka Aman Press Sdn. Bhd.

BMI is usually related to body fat measurement. A higher BMI usually indicates higher body fat, and a high body fat percentage puts a person at risk of many serious diseases [1]. According to previous research, overweight people are at risk of many diseases and health conditions. These include heart disease, stroke, diabetes, cancer, high blood pressure, high cholesterol and high blood lipids (LDL). When overweight or obese people lose weight, they also lower their blood pressure, total cholesterol and LDL cholesterol. They also increase their HDL cholesterol, improve their blood sugar levels and reduce their amount of abdominal fat [2].

The objective of the current study is to show that robust M-estimation performs better than ordinary least squares (OLS) when outliers are present in the data. A total of 300 respondents were selected and diagnosed with a BMI problem based on WHO criteria. The material of this study is a hypothetical sample composed of ten independent variables. The variables are explained in Table 2, and the data were collected from a health centre in Malaysia.

Table 2: Explanation of the Variables

Code   Variable   Explanation
Y      BMI        Body mass index
X1     SBP        Systolic blood pressure
X2     DBP        Diastolic blood pressure
X3     TOTCHOL    Total cholesterol (mmol/L)
X4     GLUCOSE    Serum fasting glucose
X5     HDL        HDL cholesterol
X6     HEIGHT     Height of the patient (cm)
X7     WEIGHT     Weight of the patient (kg)
X8     AGE        Age (years)
X9     TRIG       Triglycerides
X10    WAIST      Waist length (cm)

2. Materials and Methods

This study focuses on the relationship between BMI as the dependent variable and the independent variables listed in Table 2. We analyze the medical data using two models: multiple linear regression (MLR) and robust regression (RR).

2.1 Multiple Linear Regression (MLR)

Multiple linear regression models the relationship between independent and dependent variables by fitting a linear equation to observed data. Consider a sample of $n$ observations on a dependent (response) variable and ten independent (regressor) variables. The relationship between the variables is formulated as the linear model

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_{10} x_{i,10} + \varepsilon_i, \qquad i = 1, \dots, n. \tag{1}$$

The linear regression model can be expressed in matrix form as

$$\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \tag{2}$$

where $\mathbf{y}$ is the $n \times 1$ vector of observed responses and $X$ is the $n \times p$ design matrix of regressors, including a column of ones for the intercept. $\boldsymbol{\beta}$ is the $p \times 1$ vector of regression coefficients and $\boldsymbol{\varepsilon}$ is the $n \times 1$ vector of error terms. The objective of regression analysis is to estimate the unknown regression coefficients $\boldsymbol{\beta}$ from the observed sample. The ordinary least squares (OLS) method is generally used to find the best estimate of $\boldsymbol{\beta}$.

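Estimation with OLS

OLS minimizes the sum of squared distances between the observed and the predicted values of the dependent variable $y$:

$$S(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left(y_i - \mathbf{x}_i'\boldsymbol{\beta}\right)^2 = (\mathbf{y} - X\boldsymbol{\beta})'(\mathbf{y} - X\boldsymbol{\beta}) \rightarrow \min, \tag{3}$$

where $\mathbf{x}_i'$ is the $i$-th row of $X$.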

The resulting OLS estimator of $\boldsymbol{\beta}$ is

$$\hat{\boldsymbol{\beta}} = (X'X)^{-1}X'\mathbf{y}. \tag{4}$$

Given the OLS estimator, the dependent variable is predicted by $\hat{y}_i = \mathbf{x}_i'\hat{\boldsymbol{\beta}}$, and $\hat{\varepsilon}_i = y_i - \mathbf{x}_i'\hat{\boldsymbol{\beta}}$ is called the residual. A stepwise procedure is applied in this study to select the significant variables in the MLR model.
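Equation (4) can be evaluated directly. The Python sketch below is illustrative only and uses NumPy on hypothetical simulated data, not the study's BMI data. It solves the normal equations (X'X)β̂ = X'y instead of forming the inverse explicitly. This is algebraically equivalent to Eq. (4) but numerically safer. The stepwise variable selection used in the study is not reproduced here.

```python
import numpy as np


def ols_fit(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """OLS estimate beta_hat = (X'X)^{-1} X'y from Eq. (4).

    Solves the normal equations (X'X) beta = X'y rather than inverting X'X.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)


# Hypothetical data (not the study's): intercept plus two regressors.
rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(150.0, 180.0, n)
x2 = rng.uniform(50.0, 90.0, n)
X = np.column_stack([np.ones(n), x1, x2])  # design matrix with a column of ones
beta_true = np.array([50.0, -0.3, 0.3])     # hypothetical "true" coefficients
y = X @ beta_true + rng.normal(0.0, 0.5, n)

beta_hat = ols_fit(X, y)
y_hat = X @ beta_hat     # predicted values, y_hat_i = x_i' beta_hat
residuals = y - y_hat    # residuals, e_hat_i = y_i - x_i' beta_hat
print(beta_hat)
```

The OLS residuals are orthogonal to every column of X (X'ê = 0 up to rounding), which gives a quick check that the normal equations were solved correctly.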

2.2 Robust Regression (RR)

Generally, regression analysis relies on ordinary least squares (OLS) to estimate the model parameters. However, OLS performs poorly and gives biased estimates when outliers exist in the data. Outliers can lead to model misspecification and incorrect analysis results, and can make all estimation procedures meaningless (Rousseeuw and Leroy, 1987; Barnett and Lewis, 1994; Alma, 2011). Robust regression has been developed to improve on least squares estimation in the presence of outliers. It is an important tool for analyzing data contaminated with outliers. We use M-estimation to model the BMI data because the normal Q-Q plot shows outliers in the data set. In such cases, robust M-estimators can minimize the influence of extreme observations.

OLS chooses $\hat{\boldsymbol{\beta}}$ to minimize the sum of squared residuals. Robust M-estimation generalizes this idea. It chooses $\hat{\boldsymbol{\beta}}$ so that $\sum_{i=1}^{n} \rho(\hat{e}_i)$ is as small as possible, where $\rho(e)$ is some function of $e$. Least squares and least absolute deviation estimation are therefore special cases of M-estimation, with $\rho(e) = e^2$ and $\rho(e) = |e|$, respectively. The Huber M-estimates of $\boldsymbol{\beta}$ are the values $b$ that minimize

$$\sum_{i=1}^{n} \rho\big(y_i - (b_0 + b_1 x_{i1} + \cdots + b_p x_{ip})\big). \tag{5}$$

Huber (1973) defined the objective function $\rho(e)$ as follows:

$$\rho(e) = \begin{cases} e^2, & |e| \le k, \\ 2k|e| - k^2, & |e| > k. \end{cases} \tag{6}$$
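The objective functions above can be compared numerically. The Python sketch below evaluates the least squares, least absolute deviation and Huber functions of Eq. (6) on a hypothetical set of residuals. It sums each over the residuals as in Eq. (5). The residual values and function names are illustrative assumptions. The choice k = 1.345 corresponds to assuming an error scale of 1.

```python
def rho_ls(e: float) -> float:
    """Least squares special case: rho(e) = e^2."""
    return e * e


def rho_lad(e: float) -> float:
    """Least absolute deviation special case: rho(e) = |e|."""
    return abs(e)


def rho_huber(e: float, k: float) -> float:
    """Huber objective function of Eq. (6): quadratic inside [-k, k], linear outside."""
    if abs(e) <= k:
        return e * e
    return 2.0 * k * abs(e) - k * k


def m_objective(residuals, rho) -> float:
    """Sum of rho(e_i) over the residuals, the quantity minimized in Eq. (5)."""
    return sum(rho(e) for e in residuals)


residuals = [0.5, -1.0, 0.2, 8.0]  # hypothetical residuals; 8.0 plays the outlier
k = 1.345  # tuning constant, assuming an error scale of 1

print(m_objective(residuals, rho_ls))
print(m_objective(residuals, rho_lad))
print(m_objective(residuals, lambda e: rho_huber(e, k)))
```

With these values, the outlying residual of 8 contributes 64 to the least squares sum. Its contribution to the Huber sum is only 2(1.345)(8) − 1.345² ≈ 19.71. This is how the linear tail of the Huber function limits the influence of outliers.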

The value $k$ is called a tuning constant. Following Huber's suggestion, $k$ is generally taken as $k = 1.345\hat{\sigma}$, where $\hat{\sigma}$ is an estimate of the standard deviation of the errors. This choice produces 95% efficiency when the errors are normal while still offering protection against outliers (Fox, 2002).

The performances of the two models are compared using the coefficient of determination, $R^2$, which can be measured as

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},$$

where $y_i$ is the actual observation, $\hat{y}_i$ is the predicted value and $\bar{y}$ is the mean of the observations. For an OLS fit with an intercept, $R^2$ equals the square of the correlation coefficient $r$ between the observed and predicted values. The correlation coefficient satisfies $-1 \le r \le +1$, and the + and − signs indicate positive and negative linear correlation, respectively.
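The $R^2$ formula translates directly into code. The Python sketch below uses hypothetical observed and predicted values. The same function can be applied to fitted values from either the OLS or the M-estimation model, which is one way to carry out the comparison described above.

```python
def r_squared(y, y_hat) -> float:
    """Coefficient of determination R^2 = 1 - SSE / SST."""
    if len(y) != len(y_hat) or len(y) < 2:
        raise ValueError("y and y_hat must have the same length (at least 2)")
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    if sst == 0:
        raise ValueError("R^2 is undefined when all observations are equal")
    return 1.0 - sse / sst


# Hypothetical observed and predicted BMI values.
observed = [20.0, 22.0, 25.0, 30.0]
predicted = [21.0, 22.0, 24.0, 29.0]
print(round(r_squared(observed, predicted), 3))  # 0.947
```

Here the mean is 24.25, SST = 56.75 and SSE = 3, so R² = 1 − 3/56.75 ≈ 0.947.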

3. Results and Discussion

Multiple regression analysis was used to analyze the direct relationship between all the predictor (independent) variables and body mass index (dependent variable).

Table 3: Model parameter estimates for OLS estimation
Dependent variable: Body Mass Index (R² = 0.955)

Independent Variable   Std. Coefficient Beta (β)   Sig.
(Constant)             56.135                      0.000**
X1                     -0.002                      0.029*
X6                     -0.312                      0.000**
X7                     0.324                       0.000**

Note: Significance levels: **p < 0.01, *p < 0.05

From the output above, the fitted BMI model is $\text{BMI} = 56.135 - 0.002X_1 - 0.312X_6 + 0.324X_7 + e$, and the $R^2$ value is 0.955. Therefore, about 95.5% of the variation in BMI is explained by the selected predictors, indicating that the model fits the data well. Residual analysis plays an important role in determining the adequacy of the model and ensuring that its interpretation is valid. Figure 1 presents the scatter plot of the errors, their histogram and the normal probability plot for the BMI model. The scatter plot of errors against predicted values shows no clear relationship between the two, which is consistent with the linearity assumption. The normal probability plot is approximately a straight line, indicating that the errors are approximately normally distributed. The histogram also suggests normality.
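Using the fitted equation is plain arithmetic. The Python sketch below plugs a hypothetical patient into the OLS equation reported in Table 3. The patient values (SBP 120, height 160 cm, weight 60 kg) are chosen only for illustration. The function and constant names are hypothetical.

```python
# OLS equation from Table 3: BMI = 56.135 - 0.002*X1 - 0.312*X6 + 0.324*X7
# (X1 = SBP, X6 = height in cm, X7 = weight in kg, as coded in Table 2).
INTERCEPT = 56.135
COEF_SBP, COEF_HEIGHT, COEF_WEIGHT = -0.002, -0.312, 0.324


def predict_bmi_ols(sbp: float, height_cm: float, weight_kg: float) -> float:
    """Plug predictor values into the fitted OLS equation reported in Table 3."""
    return (INTERCEPT
            + COEF_SBP * sbp
            + COEF_HEIGHT * height_cm
            + COEF_WEIGHT * weight_kg)


# Hypothetical patient, values chosen only for illustration.
print(round(predict_bmi_ols(sbp=120.0, height_cm=160.0, weight_kg=60.0), 3))  # 25.415
```

Term by term, the prediction is 56.135 − 0.24 − 49.92 + 19.44 = 25.415.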


Figure 1: The error distribution, histogram and normal P-P plot for the MLR model.

Robust regression analysis provides an alternative model that can improve the fit. M-estimation is usually used for outlier detection and robust regression when contamination is mainly in the response direction.
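Iteratively reweighted least squares (IRLS) is one common way to compute Huber M-estimates. The paper does not state which algorithm or software produced Table 4, so the Python sketch below is illustrative only and uses NumPy. It makes several assumptions. It starts from the OLS estimates. It uses a normalized median absolute deviation (MAD) as the scale estimate σ̂ and re-estimates σ̂ at every iteration. It uses weights proportional to ψ(e)/e for the ρ of Eq. (6), with c = 1.345 applied to scaled residuals, which is equivalent to k = 1.345σ̂. The data are hypothetical.

```python
import numpy as np


def huber_weights(u: np.ndarray, c: float = 1.345) -> np.ndarray:
    """IRLS weights for the Huber function, evaluated on scaled residuals u.

    For |u| <= c the weight is 1 (quadratic part of rho); beyond c it is
    c / |u|, which down-weights large residuals (linear part of rho).
    """
    abs_u = np.abs(u)
    return np.where(abs_u <= c, 1.0, c / np.maximum(abs_u, 1e-12))


def huber_m_estimate(X, y, c=1.345, max_iter=50, tol=1e-8):
    """Huber M-estimate of beta by iteratively reweighted least squares.

    Assumptions (not taken from the study): OLS starting values and a
    normalized MAD scale estimate re-computed at every iteration.
    """
    beta = np.linalg.solve(X.T @ X, X.T @ y)  # OLS start, Eq. (4)
    for _ in range(max_iter):
        resid = y - X @ beta
        sigma_hat = np.median(np.abs(resid - np.median(resid))) / 0.6745
        if sigma_hat == 0.0:
            break  # MAD is zero, so the scale cannot be estimated
        w = huber_weights(resid / sigma_hat, c)
        Xw = X * w[:, None]  # rows of X scaled by their weights, i.e. W X
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)  # (X'WX) beta = X'Wy
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


# Hypothetical data: straight line plus noise, with five responses shifted by +15.
rng = np.random.default_rng(1)
n = 100
x = rng.uniform(0.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.3, n)
y[:5] += 15.0  # contamination in the response direction

print("OLS:  ", np.linalg.solve(X.T @ X, X.T @ y))
print("Huber:", huber_m_estimate(X, y))
```

Established implementations exist if a sketch is not enough. Examples are `RLM` with the `HuberT` norm in statsmodels and `rlm` in R's MASS package. Both use 1.345 as the default Huber tuning constant.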

Table 4: Model parameter estimates for M-estimation
Dependent variable: Body Mass Index (R² = 0.965)

Independent Variable   Std. Coefficient Beta (β)   Sig.
(Constant)             64.3073                     0.000**
X1                     0.0016                      0.010**
X6                     0.3100                      0.000**
X7                     0.3250                      0.000**

Note: Significance levels: **p < 0.01, *p < 0.05

For the BMI data, M-estimation yields a fitted linear model in $X_1$, $X_6$ and $X_7$ with an intercept of 64.3073. The coefficient estimates are given in Table 4. The $R^2$ value is 0.965, indicating a good fit. Figure 2 confirms the presence of outlying observations in the data set: 25 observations are considered outliers. Although the BMI data contain outliers, the results remain robust. As for the MLR model, the error distribution was also investigated, and the histogram indicates that the errors are approximately normally distributed. Thus, we conclude that the model is valid.


Figure 2: The leverage diagnostics, Q-Q plot and histogram for the RR model.

4. Conclusion

The performances of the two models are compared using the coefficient of determination, R². The R² values for the multiple linear regression and robust regression models are 0.955 and 0.965, respectively. The results show that robust M-estimation gives better results than OLS in modeling body mass index. We therefore conclude that robust M-estimation is an alternative approach for dealing with outliers in regression analysis, and that it improves modeling performance for the BMI data.

Acknowledgement

The authors would like to thank the Research Management Centre, Universiti Malaysia Terengganu and the Ministry of Science, Technology and Innovation (MOSTI), Malaysia for financial and moral support in the form of FRGS Grant no. 59266, which supported this paper.

References

[1] Cosmetic Surgery Consultants. http://www.cosmeticsurgeryconsultants.co.uk/obesity-what-is-bmi.htm [16 April 2014].
[2] Walker, S. P., Rimm, E. B., Ascherio, A., Kawachi, I., Stampfer, M. J., & Willett, W. C. (1996). Body size and fat distribution as predictors of stroke among US men. Am J Epidemiol, 144, 1143-1150.
[3] WHO (1990). Diet, nutrition, and the prevention of chronic disease. Technical Report Series no. 797. Geneva: World Health Organization.
[4] Manson, J. E., Willett, W. C., Stampfer, M. J., Colditz, G. A., Hunter, D. J., Hankinson, S. E., Hennekens, C. H., & Speizer, F. E. (1995). Body weight and mortality among women. New England Journal of Medicine, 333, 677-685.
[5] WHO Expert Committee (1995). Physical status: the use and interpretation of anthropometry. WHO Technical Report Series no. 854. Geneva.
[6] Rousseeuw, P. J., & Leroy, A. (1987). Robust regression and outlier detection. New York: John Wiley and Sons.
[7] Barnett, V., & Lewis, T. (1994). Outliers in statistical data. New York: John Wiley and Sons.
[8] Alma, O. G. (2011). Comparison of robust regression methods in linear regression. Int. Journal Contemp. Math. Sciences, 6(9), 409-421.
[9] Fox, J. (2002). Robust Regression. http://cran.r-project.org.
