Generalized Estimating Equations in Longitudinal Studies:
A Non-Parametric Alternative for Two-Way Repeated Measures Mixed ANOVA
Kalesh M Karun1*, Deepthy M S2
1Scientist C, ICMR- National Institute of Traditional Medicine, Dept. of Health Research (Govt. of India),
Nehru Nagar, Belagavi, Karnataka, Pin- 590010, India.
2Department of Biostatistics, Jawaharlal Institute of Postgraduate Medical Education and Research,
Puducherry, Pin - 605006, India.
*Corresponding Author E-mail: karunkmk@gmail.com, deepthyms27@gmail.com
ABSTRACT:
Two-group pre-post designs are very commonly used in medical research to study the effect of interventions on numerical outcome variables. Sometimes these measurements do not follow the fundamental statistical assumption of normality, and two-way repeated measures mixed ANOVA cannot be used. A Generalized Estimating Equation (GEE) with a Gamma distribution and log link function is a non-parametric analogue that can be used when the data are skewed. Compared with other methods, GEE has fewer assumptions and provides precise estimates. In the present study, the application of GEE is demonstrated using simulated data. The steps involved in GEE analysis using SPSS software are also provided as an easy guide for researchers. This study could help medical researchers understand, perform and interpret GEE in a better way.
KEYWORDS: Generalized Estimating Equation; Non-parametric; Continuous variable; Repeated measures; Longitudinal.
INTRODUCTION:
In longitudinal studies, assessments from study subjects are routinely gathered to assess the change in various clinical outcomes over a period of time. This is critical for monitoring a patient's health state and deciding on future treatment choices. Many parametric and non-parametric techniques are available for analysing this type of repeated measurements,1,2 and these tests are frequently used in healthcare research.3–10 McNemar’s test is used to compare two dependent proportions, i.e., when the outcome of interest is binary. Cochran’s Q test is used to study the change in a proportion (binary outcome) across more than two time points. For continuous measurements obtained at two time points from the same group of subjects (i.e., baseline and post intervention), the paired t test or the Wilcoxon matched-pairs signed-rank test can be used, depending on whether the normality assumption holds.
When continuous measurements are obtained from the same set of subjects at more than two time points, one-way repeated measures ANOVA or Friedman’s ANOVA by ranks is used. For longitudinal quantitative measurements obtained from two or more groups (change in the average value of the numerical variable over time as well as between groups), two-way repeated measures mixed ANOVA (RMANOVA) is used. However, RMANOVA is not preferred when the data are not normal, since it requires balanced, complete data sets and normally distributed response variables (Keselman HJ et al., 2001).11 In addition, RMANOVA does not allow for the analysis of categorical covariates that change over time.
Generalized Estimating Equations (GEE) is one of the convenient non-parametric equivalents of two-way repeated measures mixed ANOVA (Liang KY and Zeger SL, 1986; Zeger SL and Liang KY, 1986; Pekár S and Brabec M, 2018; Hanley JA et al., 2003).12–15 GEE is comparatively easy to perform and is available in many well-known software packages such as SPSS, STATA and R. However, the use of and familiarity with GEE in medical research are limited. The main advantages of GEE are:
· It can be used on numerical data even if the data are skewed.
· It gives unbiased estimates of population-averaged regression coefficients despite possible misspecification of the correlation structure.
· It uses all available data for each subject.
· It allows for specification of both time-varying and individual difference variables.
· It allows adjustment for continuous and categorical covariates in the model.
· It is applicable to binary, count and ordinal repeated measurements.
In medical research, researchers usually perform two-way repeated measures mixed ANOVA directly even when the data violate the normality assumption, as no straightforward non-parametric analogue is available. The present study aimed to provide an overview of the use of GEE as a non-parametric analogue of two-way repeated measures mixed ANOVA, based on simulated data.
MATERIALS AND METHODS:
Data description:
A dummy data set was created to demonstrate GEE. The outcome variable is knowledge regarding Covid 19 infection among ASHA workers, with scores ranging from 0 to 100. Data were simulated for two groups, an intervention group (who received education on Covid 19) and a control group (who did not receive any education on Covid 19 infection), at two time points: pre-test and post-test (after one month). The variables in the dummy data set were named ID, Group, Time and Knowledge.
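A long-format data set of this shape could be simulated along the following lines. This is only a sketch under stated assumptions: the original simulation procedure is not described, so the Gamma shape, the cell means, the seed and the helper name `simulate_long_data` are hypothetical choices made for illustration. Only the variable names (ID, Group, Time, Knowledge), the 0-100 score range and the group size of 70 come from the text. For simplicity the pre-test and post-test scores are drawn independently, so no within-subject correlation is induced.

```python
import random

def simulate_long_data(n_per_group=70, seed=1):
    """Return long-format rows: one row per subject per time point."""
    rng = random.Random(seed)
    # Hypothetical cell means: only the intervention group improves at post-test.
    cell_mean = {
        ("Intervention", "Pre"): 12.0,
        ("Intervention", "Post"): 18.0,
        ("Control", "Pre"): 12.0,
        ("Control", "Post"): 12.0,
    }
    shape = 4.0  # hypothetical Gamma shape; smaller values give a stronger right skew
    rows = []
    subject_id = 0
    for group in ("Intervention", "Control"):
        for _ in range(n_per_group):
            subject_id += 1
            for time in ("Pre", "Post"):
                mean = cell_mean[(group, time)]
                # gammavariate(alpha, beta) has mean alpha * beta
                score = rng.gammavariate(shape, mean / shape)
                # Keep scores strictly positive (needed for a Gamma model)
                # and inside the 0-100 range of the knowledge scale.
                score = min(max(score, 0.1), 100.0)
                rows.append({"ID": subject_id, "Group": group,
                             "Time": time, "Knowledge": round(score, 1)})
    return rows

data = simulate_long_data()
print(data[:2])  # the two rows (Pre, Post) of the first subject
```

Each subject contributes two rows that share the same ID, which is the layout SPSS expects for a repeated measures analysis in long format.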
Data analysis:
In the simulated data, the knowledge score did not follow the normality assumption, and hence a non-parametric alternative to two-way repeated measures mixed ANOVA was more appropriate. The median and quartiles [Q1, Q3] were used to summarize the knowledge score because the data violated normality. A generalized estimating equation (Gamma distribution with log link) was fitted to test for a significant difference in the average knowledge score regarding Covid 19 infection across time points as well as between the intervention and control groups. The Gamma distribution with log link is suited to outcomes that are positive and positively skewed (not normally distributed). Conclusions were based on the Wald statistic and its p-value from the GEE. A p-value less than 0.05 was considered significant. In addition, box plots were generated using SPSS to show graphically the change in knowledge scores between the two time points as well as between groups.
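The descriptive summary (median with [Q1, Q3] for each group and time point) can be sketched as follows. The helper name `summarize_by_cell` and the toy scores are hypothetical, and the row layout (dictionaries with Group, Time and Knowledge keys) is an assumption. Quartile definitions differ between software packages, so values from SPSS may differ slightly from the "inclusive" method used here.

```python
from collections import defaultdict
from statistics import median, quantiles

def summarize_by_cell(rows):
    """Median and quartiles of Knowledge for every Group x Time cell."""
    cells = defaultdict(list)
    for row in rows:
        cells[(row["Group"], row["Time"])].append(row["Knowledge"])
    summary = {}
    for cell, scores in cells.items():
        # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3
        q1, _, q3 = quantiles(scores, n=4, method="inclusive")
        summary[cell] = (median(scores), q1, q3)
    return summary

# Toy, right-skewed scores for a single cell (illustrative values only)
toy_rows = [{"Group": "Control", "Time": "Pre", "Knowledge": s}
            for s in (8, 10, 12, 14, 30)]
summary = summarize_by_cell(toy_rows)
print(summary)
```

In the toy cell the single large score (30) would pull the mean upwards, while the median and quartiles are unaffected, which is why they are preferred for skewed data.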
Procedure to perform GEE in SPSS:
· Open the long-format data in SPSS.
· Go to Analyze > Generalized Linear Models > Generalized Estimating Equations...
· On the Repeated tab, select ID as the subject variable.
· On the Type of Model tab, select Gamma with log link (the options for model selection are given in Table 1).
· On the Response tab, select the dependent variable.
· On the Predictors tab, select the factors and covariates used to predict the dependent variable (categorical variables under Factors and quantitative variables under Covariates).
· On the Model tab, specify the model effects using the selected factors and covariates.
· On the EM Means tab, specify the pairwise comparisons if required.
The syntax provided below can be used to estimate the interaction between group and time point [between-group comparison over time] as well as Bonferroni-adjusted pairwise comparisons of the outcome variable between time points within each group.
Syntax:
GENLIN Knowledge BY Group Time (ORDER=ASCENDING)
  /MODEL Group Time Group*Time INTERCEPT=YES
    DISTRIBUTION=GAMMA LINK=LOG
  /CRITERIA SCALE=1 PCONVERGE=1E-006(ABSOLUTE) SINGULAR=1E-012 ANALYSISTYPE=3(WALD) CILEVEL=95
    LIKELIHOOD=FULL
  /EMMEANS TABLES=Group SCALE=ORIGINAL COMPARE=Group CONTRAST=PAIRWISE PADJUST=BONFERRONI
  /EMMEANS TABLES=Time SCALE=ORIGINAL COMPARE=Time CONTRAST=PAIRWISE PADJUST=BONFERRONI
  /EMMEANS TABLES=Group*Time SCALE=ORIGINAL COMPARE=Group*Time CONTRAST=PAIRWISE PADJUST=BONFERRONI
  /REPEATED SUBJECT=ID SORT=YES CORRTYPE=INDEPENDENT ADJUSTCORR=YES COVB=ROBUST MAXITERATIONS=100
    PCONVERGE=1E-006(ABSOLUTE) UPDATECORR=1
  /MISSING CLASSMISSING=EXCLUDE
  /PRINT CPS DESCRIPTIVES MODELINFO FIT SUMMARY SOLUTION.
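To make the Group*Time test more concrete, the sketch below works through the interaction Wald statistic by hand in Python. It is not the SPSS implementation. It relies on two properties of the model specified above: with an independence working correlation and a full Group × Time factorial model, the fitted means equal the four cell means, and the robust (sandwich) variance can be accumulated from per-subject residual sums. The function name `interaction_wald`, the 0/1 coding (group 1 = intervention, time 1 = post-test) and the toy data are hypothetical, and the uncorrected sandwich estimator is assumed, so results need not match SPSS output exactly.

```python
import math
from collections import defaultdict

# Contrast that picks out the Group x Time interaction on the log scale.
CONTRAST = {(1, 1): 1.0, (1, 0): -1.0, (0, 1): -1.0, (0, 0): 1.0}

def interaction_wald(rows):
    """rows: list of (subject_id, group, time, score); group/time coded 0 or 1."""
    totals, counts = defaultdict(float), defaultdict(int)
    for _, group, time, score in rows:
        totals[(group, time)] += score
        counts[(group, time)] += 1
    means = {cell: totals[cell] / counts[cell] for cell in totals}

    # Interaction coefficient: difference of log fold-changes between groups.
    estimate = sum(sign * math.log(means[cell]) for cell, sign in CONTRAST.items())

    # Robust variance: squared per-subject sums of scaled residuals, so that
    # the two measurements of one subject are treated as one cluster.
    influence = defaultdict(float)
    for subject, group, time, score in rows:
        cell = (group, time)
        influence[subject] += (CONTRAST[cell] * (score - means[cell])
                               / (means[cell] * counts[cell]))
    robust_var = sum(v * v for v in influence.values())

    wald = estimate ** 2 / robust_var
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return estimate, wald, p_value

# Toy data: control scores unchanged, intervention scores double on average.
toy = [
    (1, 0, 0, 10.0), (1, 0, 1, 10.0),
    (2, 0, 0, 12.0), (2, 0, 1, 12.0),
    (3, 0, 0, 14.0), (3, 0, 1, 14.0),
    (4, 1, 0, 10.0), (4, 1, 1, 20.0),
    (5, 1, 0, 12.0), (5, 1, 1, 26.0),
    (6, 1, 0, 14.0), (6, 1, 1, 26.0),
]
estimate, wald, p_value = interaction_wald(toy)
print(estimate, math.exp(estimate), wald, p_value)
```

Because of the log link, exponentiating the interaction estimate gives the ratio of the post/pre fold-change in the intervention group to that in the control group, which is often easier to report than the coefficient itself.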
Type of model:
Depending on the outcome measure, an appropriate model can be selected. For positively skewed numerical data, a Gamma distribution with log link function can be used. For binary and ordinal variables, logit or probit link functions can be used, and for count data, a log link function can be used.16 Details are given in Table 1.
Table 1: Selection criteria for the appropriate model [distribution and link function]
| Type of model | Situation to use |
|---|---|
| Linear | The outcome measure is numerical and normally distributed |
| Gamma with log link | The outcome measure is numerical and skewed toward larger positive values (observations should be greater than zero) |
| Binary logistic or binary probit | The outcome measure is binary, such as disease present and absent |
| Ordinal logistic or ordinal probit | The outcome measure is ordinal, such as a ranking from 1 to 50 |
| Poisson log linear | The outcome measure is a count, such as number of deliveries |
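The link functions named in Table 1 can be written out explicitly, as in the sketch below. A link maps the mean of the outcome to the scale of the linear predictor, and its inverse maps back. The dictionary names `LINKS` and `MODEL_LINK` are hypothetical, and the ordinal models, which apply the logit or probit link to cumulative probabilities, are left out for brevity.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the probit link

# Each entry is (link, inverse link): eta = link(mu), mu = inverse(eta)
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "probit": (_STD_NORMAL.inv_cdf, _STD_NORMAL.cdf),
}

# Link used by each (non-ordinal) model type listed in Table 1
MODEL_LINK = {
    "Linear": "identity",
    "Gamma with log link": "log",
    "Binary logistic": "logit",
    "Binary probit": "probit",
    "Poisson log linear": "log",
}

link, inverse = LINKS[MODEL_LINK["Gamma with log link"]]
# On the log scale, a difference between two means is the log of their ratio.
log_ratio = link(18.0) - link(12.0)
print(log_ratio, inverse(log_ratio))
```

With a log link, model coefficients are therefore interpreted as ratios of means after exponentiation rather than as differences of means.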
RESULT:
A generalized estimating equation with Gamma distribution and log link function is used to estimate the change in knowledge score across time points within each group and to compare the groups (interaction term in the model: TIME*GROUP). In the SPSS output, the table under ‘Pairwise Comparisons’ provides the p-value for the change in the outcome variable over time in each group. The table under ‘Tests of Model Effects’ provides the between-group comparison results (interaction effect).
For the simulated data, GEE revealed a significant increase in the knowledge score from pre-test to post-test in the experimental group following the teaching programme/intervention (p<0.001). In the control group, no significant difference was observed in the average knowledge score between the two time points (p=0.75). In addition, the between-group comparison showed a significant difference in the change in knowledge score between the experimental and control groups (p<0.001) [Table 2].
Table 2: Effectiveness of the educational intervention on knowledge score regarding Covid 19 infection
| Groups | Pre test knowledge score: Median [Q1, Q3] | Post test knowledge score: Median [Q1, Q3] | Within group comparison: Wald statistic (p-value) | Between group comparison: Wald statistic (p-value) |
|---|---|---|---|---|
| Experimental group [n=70] | 12 [8, 14] | 18 [15, 20] | 371.05 (<0.001*) | 160.67 (<0.001*) |
| Control group [n=70] | 12 [9, 14] | 12 [9, 14] | 0.09 (0.75) | |
*significant (p<0.05)
Hence, it can be interpreted that the intervention is effective in increasing knowledge regarding Covid 19 infection among ASHA workers. The distribution of the knowledge score is presented using a box plot [Figure 1].
Figure 1. Box plot showing the distribution of knowledge score at two time points between groups.
DISCUSSION:
The non-parametric alternatives available for RMANOVA are Generalized Linear Mixed Models (GLMM) (Tan F et al., 2007); Generalized Additive Mixed-Effect Models (GAMM) (Wood SN, 2006); the non-parametric method of Kaptein et al. (2010); the Aligned Rank Transform (ART) of Wobbrock JO et al. (2011); the ANOVA-Type Statistic (ATS) and Wald-Type Statistic (WTS) available in the ‘nparLD’ R package (Noguchi K et al., 2012); and Generalized Estimating Equations (GEE).17–21 Even though non-parametric equivalents of two-way repeated measures analysis exist, their use and familiarity are limited. Among these methods, GEE is comparatively easy to perform and is available in many well-known software packages such as SPSS, STATA and R. GEE is a convenient and general approach to the analysis of several kinds of correlated data in medical and nursing research.
Compared with an ordinary generalized linear model (GLM), which treats repeated observations as independent, GEE accounts for within-subject correlation and so provides more reliable standard errors and inferences. It is inherently less ambitious than GLMM, as it does not estimate random effects. The GEE model is often based on fewer assumptions than GLMM and hence can be less prone to misspecification errors.
GEE is also very commonly used to analyse cluster randomised controlled trials to account for the cluster effect. A robust (sandwich) variance estimator is typically used, although when the number of clusters is small a small-sample correction should be considered. The robust variance estimates remain valid when the mean model is correctly specified, even if the form of the variance-covariance structure (the working correlation, such as independent, exchangeable or autoregressive) is misspecified. That is, GEE is generally robust to misspecification of the variance-covariance structure of the model.13,22
CONCLUSION:
Two group pre-post designs are very common in medical research. As there is no straightforward non-parametric analogue available for two-way RMANOVA, clinical researchers usually perform a two-way RMANOVA even though data violates the normality assumption. The present study explains the advantage of GEE as a non-parametric analogue of two-way repeated measures mixed ANOVA in longitudinal studies and also provides the steps to perform GEE in SPSS software.
CONFLICT OF INTEREST:
The authors have no conflicts of interest regarding this investigation.
REFERENCES:
1. Bhardwaj R. A study of the theoretical framework of parametric and non-parametric tests used social sciences. Research Journal of Humanities and Social Sciences. 2017;8(2):225-8. doi: 10.5958/2321-5828.2017.00034.1
2. Sharma A. Role of Statistics in Different Fields. Research Journal of Science and Technology. 2017;9(1):118-22. doi: 10.5958/2349-2988.2017.00018.3
3. Sinha G. The Pharmaceutical Industry accompanied by the patient through manifold Therapies. Research Journal of Pharmacy and Technology. 2020 Jul 1;13(7):3399-401. doi: 10.5958/0974-360X.2020.00604.6
4. Sudhakar S, Paul J, Selvam PS, Mahendranath P. Serum creatine kinase response on exercise induced delayed onset muscle soreness: a pilot single blind randomized clinical trial. Research Journal of Pharmacy and Technology. 2020;13(8):3638-42. doi: 10.5958/0974-360X.2020.00643.5
5. Sarumathy S, Johnson LA. Knowledge, Attitude and Practice of Diabetic Patients regarding Diabetic Retinopathy in a Tertiary care Hospital. Research Journal of Pharmacy and Technology. 2017;10(7):2153-6. doi: 10.5958/0974-360X.2017.00379.1
6. Uhm TH, Kim JH. Effectiveness of 5, 10, 15-min Video Self-Instruction in Cardiopulmonary Resuscitation Training. Research Journal of Pharmacy and Technology. 2018 Feb 1;11(2):649-652. doi: 10.5958/0974-360X.2018.00121.X
7. Zainab. The effect of the application of topical shallots on infant pain post-immunization. Research Journal of Pharmacy and Technology. 2022 Apr 23;15(4):1775–8. doi: 10.52711/0974-360X.2022.00297
8. Vasan M. Impact of Job Stress on Job Satisfaction among the Pharmaceutical Sales Representatives. Research Journal of Pharmacy and Technology. 2018;11(9):3759-64. doi: 10.5958/0974-360X.2018.00688.1
9. Gupta R, Rai N. An In-vitro analysis of the staining effect of different chemical mouthwashes. Research Journal of Pharmacy and Technology. 2020; 13(12): 6007–8. doi: 10.5958/0974-360X.2020.01047.1
10. Ryu JH, Kang YS. Effect of Psychomotor Program on the Athletic Abilities of Children with Developmental Delays. Research Journal of Pharmacy and Technology. 2018;11(10):4597. doi: 10.5958/0974-360X.2018.00841.7
11. Keselman HJ, Algina J, Kowalchuk RK. The analysis of repeated measures designs: A review. British Journal of Mathematical and Statistical Psychology. 2001 May;54(1):1–20. doi: 10.1348/000711001159357.
12. Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73(1):13-22. doi: 10.1093/biomet/73.1.13
13. Zeger SL, Liang KY. Longitudinal Data Analysis for Discrete and Continuous Outcomes. Biometrics. 1986 Mar;42(1):121. doi: 10.2307/2531248
14. Pekár S, Brabec M. Generalized estimating equations: A pragmatic and flexible approach to the marginal GLM modelling of correlated data in the behavioural sciences. Bshary R, editor. Ethology. 2018 Feb;124(2):86–93. doi: 10.1111/eth.12713
15. Hanley JA. Statistical Analysis of Correlated Data Using Generalized Estimating Equations: An Orientation. American Journal of Epidemiology. 2003 Feb 15;157(4):364–75. doi: 10.1093/aje/kwf215
16. Owusu-Darko I, Adu IK, Frempong NK. Application of generalized estimating equation (GEE) model on students’ academic performance. Applied Mathematical Sciences. 2014;8(68):3359-74. doi: 10.12988/ams.2014.44277
17. Tan F, Jiang Z, Bae SJ. Generalized linear mixed models for reliability analysis of multi-copy repairable systems. IEEE Transactions on Reliability. 2007 Mar 5;56(1):106-14. doi: 10.1109/TR.2006.884596
18. Wood SN. Low-Rank Scale-Invariant Tensor Product Smooths for Generalized Additive Mixed Models. Biometrics. 2006 Dec;62(4):1025–36. doi: 10.1111/j.1541-0420.2006.00574.x
19. Kaptein MC, Nass C, Markopoulos P. Powerful and consistent analysis of likert-type rating scales. In: Proceedings of the 28th international conference on Human factors in computing systems - CHI ’10. Atlanta, Georgia, USA: ACM Press; 2010:2391-94. doi: 10.1145/1753326.1753686
20. Wobbrock JO, Findlater L, Gergle D, Higgins JJ. The aligned rank transform for nonparametric factorial analyses using only anova procedures. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems [Internet]. Vancouver BC Canada: ACM; 2011 [cited 2022 Sep 1]: 143–6. doi: 10.1145/1978942.1978963
21. Noguchi K, Gel YR, Brunner E, Konietschke F. nparLD: an R software package for the nonparametric analysis of longitudinal data in factorial experiments. Journal of Statistical Software. 2012 Sep 18;50:1-23. doi: 10.18637/jss.v050.i12
22. Leyrat C, Morgan KE, Leurent B, Kahan BC. Cluster randomized trials with a small number of clusters: which analyses should be used? International Journal of Epidemiology. 2018 Feb 1;47(1):321–31. doi: 10.1093/ije/dyx169.
Received on 25.02.2022 Modified on 04.07.2022
Accepted on 16.11.2022 © RJPT All right reserved
Research J. Pharm. and Tech 2023; 16(5):2381-2384.
DOI: 10.52711/0974-360X.2023.00392