Survival Analysis of Colon Cancer Data using Quantile Regression
Vidya Bhargavi M^{1}, Sireesha Veeramachaneni^{1}*, Venkateswara Rao Mudunuru^{2}
^{1}GITAM Institute of Science, GITAM (deemed to be) University, Visakhapatnam, Andhra Pradesh, India.
^{2}Department of Mathematics and Statistics, University of South Florida, Tampa, FL, USA.
*Corresponding Author Email: vsirisha80@gmail.com
ABSTRACT:
Quantile regression has emerged as a robust alternative to the commonly used regression models. In survival analysis, quantile regression offers more flexible modelling of survival data because it imposes fewer constraints. Unlike traditional Cox proportional hazards models or accelerated failure time models, quantile regression does not restrict how the coefficients vary across quantiles. In this research we modelled colon cancer data and compared traditional survival regression methods with quantile regression.
KEYWORDS: Colon Cancer, Survival Analysis, Quantile Survival Regression, Kaplan-Meier Analysis, Cox Proportional Hazards Function, Parametric Survival Analysis.
INTRODUCTION:
Today, survival analysis is used in almost every scientific field. The term "survival analysis" refers to a set of methods for assessing the likelihood of events such as death or failure following treatment of subjects. Survival analysis deals with time-to-event data that are subject to censoring. Censoring is a way of marking observations whose event time is not fully observed^{1}. In a follow-up study, the time to recurrence is observed directly for some patients. For other patients we know only the time of their last checkup or their disease-free survival (DFS), because these patients can change physicians, move away, or leave the study for other reasons. These patients are referred to as censored cases. In this work, we are interested in survival analysis of colon cancer patients. When abnormal cells develop in either the colon or the rectum, the disease is called colorectal cancer (CRC or colon cancer). The American Cancer Society (ACS) estimated 104,610 new colon cancer cases for the year 2020, the majority of them in adults aged 50 years or older, and 53,200 deaths due to colorectal cancers in 2020.
In CRC, cancer tumors begin as a noncancerous polyp in the inner lining of the colon or the rectum. These polyps further develop and grow into cancerous tumors and block the lymph vessels carrying cellular waste. The tumor cells break away and spread to parts of the body distant from where they started. This process is known as metastasis. The extent of metastasis at the time of diagnosis is described as the stage of the cancer^{2}. Staging systems used in the literature include TNM (tumor, node, and metastasis) and SEER summary staging. The former is mostly used in clinical settings, while the latter is used for statistical analysis.
Survival analysis involves modeling time-to-event data (duration models) with the objective of investigating the effects of covariates on survival time. In many cases these effects are heterogeneous. Covariates may play a major role in affecting the probability of survival at the beginning of the study period and then mostly wane, or even show no effect, at later times^{3}.
Many statistical software programs and tools are available to analyze time-to-event data. Using these tools, we can model the data with nonparametric Kaplan-Meier (KM) estimation, parametric models such as Weibull, exponential and log-logistic, the semiparametric Cox proportional hazards (Cox PH) model, and quantile regression^{4}. Most of the data analysis in this study is conducted using SAS®. Any test result with a p-value below 5% was considered statistically significant.
If we can assume a strong homogeneous treatment effect, parametric survival models or accelerated failure time (AFT) models are the best approach, since they provide a direct interpretation of covariate effects on the event time. However, the assumption of homogeneity rarely holds in time-to-event data analysis. The semiparametric Cox PH model has many advantages over the parametric approach. The Cox approach models the effect of covariates on the hazard function, assuming this effect to be constant over time. Even if the PH assumption holds, the major difficulty is interpreting the hazard ratio (HR) estimate. If the PH assumption fails, the HR estimate will be misinterpreted. This leads us to the quantile survival regression (QSR) approach, which provides a dynamic, quantile-based relationship between the covariates and the survival time. The interpretation of these models is also straightforward^{5, 6}. In the current study, QSR models are developed at various quantiles of survival duration to model overall survival using the contributing covariates. The performance of these QSR models is compared with parametric, semiparametric and nonparametric models.
Quantile survival regression (QSR) helps us to measure the importance of covariates in modeling survival time at different quantiles of survival time. Since the distribution of survival times is usually right-skewed, QSR models provide more robust covariate estimates and are particularly useful for exploring heterogeneous covariate effects compared with the other modeling approaches.
Research on QSR is growing rapidly. A PubMed search using the keyword “quantile survival regression” returned 200 publications from 2015 to 2021. Among parametric, nonparametric and semiparametric survival models, the semiparametric Cox proportional hazards model is the one most often used for survival analysis. However, QSR does not require the proportional hazards assumption of the Cox model. QSR relates the outcome variable to the covariates while making full use of the data.
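For orientation, the three regression frameworks discussed above can be written side by side. This is a minimal sketch in standard textbook notation rather than notation taken from this study: T denotes survival time, x the covariate vector, h_0 an unspecified baseline hazard, ε an error term with scale σ, and τ the quantile level. All of these symbols are assumptions introduced here for illustration.

```latex
\begin{align*}
\text{AFT:} \quad & \log T = x^{\top}\beta + \sigma\varepsilon \\
\text{Cox PH:} \quad & h(t \mid x) = h_0(t)\,\exp\!\left(x^{\top}\beta\right) \\
\text{QSR:} \quad & Q_{\log T}(\tau \mid x) = x^{\top}\beta(\tau), \qquad 0 < \tau < 1
\end{align*}
```

In the first two models a single coefficient vector β describes the covariate effect, whereas in the QSR formulation β(τ) is allowed to change with the quantile level, which is how heterogeneous effects can be captured.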
MATERIALS AND METHODS:
The colon cancer data for this retrospective cohort study were obtained from the National Cancer Institute's Surveillance, Epidemiology and End Results (SEER) registry for the years 2004-2015. The SEER database consists of 13 cancer population registers, covering around 26% of the United States' population^{7}. Patients who survived colon cancer until the end of the study period were considered right censored. The accessible data include demographic information of patients (age, gender, race, and marital status), tumor details (grade, size, and histology), data on nodal stages (number of inspected nodes, number of positive nodes), vital status and survival.
In this work, we preprocessed the SEER information for colon tumors to remove redundancies and missing data. The resulting data set had 30,251 records covering four race groups: Caucasians (88.9%), African Americans (10.4%), American Indians (0.3%) and other races (0.4%). Among the 30,251 patients included, 49.7% were male and 50.3% female. The average survival for both male and female patients is 52 months. The mean age at diagnosis was 67.7 years with a standard deviation of 14 years. The duration of the study (survival duration) is 143 months. The mean survival duration is 52.16 months with a standard deviation of 39 months. Of the subjects, 56.85% are censored and 43.15% are dead.
In univariate analysis, the qualitative variables race, gender and marital status were not statistically significant and were therefore dropped from survival modeling. The only qualitative variable retained in the modeling is histology (in situ, 38.34%; localized, 45.79%; distant, 15.88%). The chi-square statistic for histology is significant at the 0.05 level (p value < 0.0001). This indicates a significant departure from the hypothesized percentages. The stage-wise details, along with the statistics of the variables used in this work, are given below in Table 1. Table 2 lists published research using the quantile survival regression approach and comparisons of QSR with traditional survival analysis approaches.
The histogram of the survival duration is given in Figure 1 (Left). Because survival times are typically positively skewed, the median is a better indicator of typical survival time than the mean. The histogram shows that shorter survival times are more probable, indicating a high risk that a colon cancer patient experiences the event early in follow-up. This is more evident from the cumulative distribution graph in Figure 1 (Right). The probability of surviving 30 months or fewer is approximately 25%, and the probability of surviving 90 months or fewer is approximately 75%. By 90 months, a colon cancer patient has a substantial chance of having encountered the event, death due to cancer. The failure function becomes flat at 0.6185, indicating that not all the subjects have died by the end of the study.
Figure 1. Distribution of the Survival Duration (Left) and Failure Curve (Right)
Table 1. Stagewise Colon Cancer Statistics of the Variables used in this Study
| Stage | N (%) | Death (%) | Survival_Duration, Mean (SD) | Age, Mean (SD) | Tumor_Size, Mean (SD) | Nodes_Examined, Mean (SD) | Positive_Nodes, Mean (SD) |
|---|---|---|---|---|---|---|---|
| Stage 0 | 373 (1.23) | 77 (20.64) | 69.55 (40.00) | 66.45 (11.51) | 25.96 (27.63) | 13.10 (9.37) | 0.00 (0.05) |
| Stage 1 | 5854 (19.35) | 1596 (27.26) | 64.06 (38.49) | 68.72 (12.87) | 32.17 (27.35) | 15.45 (8.80) | 0.00 (0.00) |
| Stage 2A | 8499 (28.1) | 2968 (34.92) | 58.67 (38.29) | 69.89 (13.84) | 51.63 (34.09) | 17.48 (9.31) | 0.00 (0.00) |
| Stage 2B | 1294 (4.28) | 588 (45.44) | 45.58 (36.99) | 68.69 (14.38) | 65.46 (48.30) | 17.49 (8.89) | 0.00 (0.00) |
| Stage 3A | 1095 (3.62) | 328 (29.95) | 62.19 (39.00) | 65.25 (13.61) | 32.58 (23.02) | 15.87 (9.04) | 1.45 (0.86) |
| Stage 3B | 5365 (17.73) | 2256 (42.05) | 52.96 (38.53) | 67.11 (14.31) | 51.04 (32.65) | 17.50 (9.50) | 1.66 (1.15) |
| Stage 3C | 3524 (11.65) | 1860 (52.78) | 45.85 (36.81) | 65.86 (14.68) | 52.38 (34.16) | 19.61 (9.65) | 7.66 (5.07) |
| Stage 4 | 4247 (14.04) | 3379 (79.56) | 24.83 (25.93) | 64.92 (13.94) | 56.66 (40.08) | 16.83 (9.24) | 5.41 (6.32) |
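As a quick consistency check, the stage-wise counts in Table 1 can be summed to recover the cohort size and the overall death and censoring percentages quoted in the text. The sketch below is illustrative only: the dictionary name `stage_counts` is a hypothetical label, and the numbers are copied from Table 1.

```python
# Stage-wise (N, deaths) copied from Table 1; the variable name is illustrative.
stage_counts = {
    "Stage 0": (373, 77),
    "Stage 1": (5854, 1596),
    "Stage 2A": (8499, 2968),
    "Stage 2B": (1294, 588),
    "Stage 3A": (1095, 328),
    "Stage 3B": (5365, 2256),
    "Stage 3C": (3524, 1860),
    "Stage 4": (4247, 3379),
}

total_n = sum(n for n, _ in stage_counts.values())
total_deaths = sum(d for _, d in stage_counts.values())

# Overall percentage of deaths (events) and of censored subjects.
death_pct = 100 * total_deaths / total_n
censored_pct = 100 - death_pct

print(total_n, total_deaths, round(death_pct, 2), round(censored_pct, 2))
```

The totals agree with the 30,251 records and the 43.15% deaths and 56.85% censoring reported for the full data set.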
RESULTS AND DISCUSSION:
Parametric Models – Accelerated Failure Time Model:
Parametric survival modeling is performed to examine differences in survival among patients according to their stage of colon cancer at diagnosis, after adjusting for the patient's age, tumor size, number of lymph nodes examined, number of positive lymph nodes, and histology.
To compare the fitted parametric survival models, we identified the best parametric model based on the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC) and log-likelihood values^{23}. The best-fitting model is the one with the smallest AIC and BIC values and the largest log-likelihood. Results of the fitted parametric models are shown in Table 3. The gamma model has the lowest AIC and BIC values and the highest log-likelihood, and hence performs better than the other models. The parameter estimates of the gamma model are given in Table 4 below. Based on the estimates of the gamma model, the Wald tests produced by default indicate that in situ (histology) is not significant (p value = 0.59), and the rest of the variables are strongly significant (p<.0001). The negative estimated β-coefficient for the distant (histology) variable shows that these patients have a shorter survival time than localized patients. Age, tumor size and the number of positive nodes have a strong effect on colon cancer survival; their negative estimates likewise indicate shorter survival as these covariates increase. Stage 2B patients have a smaller survival estimate than stage 3A patients.
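The model comparison above can be sketched numerically. The example below is an illustration, not the authors' SAS procedure: it applies the standard definition AIC = 2k - 2 log L to the log-likelihoods of Table 3, taken here as negative values. The parameter counts `k` are an assumption, chosen because they reproduce the AIC column of Table 3. BIC is omitted because it also depends on the number of observations used in the fit, which the text does not state.

```python
# Log-likelihoods from Table 3 (taken as negative) paired with assumed
# parameter counts k; the counts are not reported in the paper.
fits = {
    "Weibull": (-29290.46971, 15),
    "Gamma": (-29216.39756, 16),
    "LogLogistic": (-29289.46342, 15),
    "LogNormal": (-29604.90039, 15),
    "Exponential": (-29290.57883, 14),
}


def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2 log L (smaller is better)."""
    return 2 * k - 2 * log_likelihood


aic_values = {name: aic(ll, k) for name, (ll, k) in fits.items()}
best_model = min(aic_values, key=aic_values.get)

# List the models from best (lowest AIC) to worst.
for name, value in sorted(aic_values.items(), key=lambda item: item[1]):
    print(f"{name:12s} AIC = {value:.2f}")
print("Lowest AIC:", best_model)
```

Under these assumptions the gamma model has the lowest AIC, in agreement with the selection described in the text.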
Table 2. Published Research works using Quantile Survival Regression
| Author(s) | Research Work |
|---|---|
| Ying et al.^{8} | A semiparametric procedure for median regression models with censored observations. |
| Yang^{9} | Median regression estimators based on weighted empirical survival and hazard functions. |
| Portnoy^{10} | Generalized the principle of the Kaplan-Meier estimate using a recursively reweighted estimator of a quantile regression process. |
| Carey et al.^{11} | Interpretation of growth (failure curves) in pediatric AIDS is studied by developing models using loess smoothing; penalized likelihood quantile regressions are fit to model age-specific growth velocity distributions for gender-stratified cohorts. |
| Yin et al.^{12} | Quantile regression model with estimating equation approach for parameter estimation of right-censored correlated survival data. |
| Peng and Huang^{13} | A new quantile regression approach for survival data subject to conditionally independent censoring. The estimators are computed using martingale-based estimating equations. |
| Cai Yuzhi^{14} | Established a quantile survival model for censored data, along with the survival density, survival, or hazard functions of the survival time. |
| Fan C et al.^{15} | Power-transformed linear regression on quantile residual life for censored competing risks data. |
| Hsieh JJ et al.^{16} | Quantile regression based on counting process approach under semi-competing risks data. |
| Xue et al.^{17} | Use of censored quantile regression model, which permits a more sensitive analysis of time-to-event data together with the Cox proportional hazards model. |
| Faradmal et al.^{18} | Used censored quantile regression (CQR) to provide in-depth insight into the multivariable association between prognosis factors and survival rates using breast cancer data. |
| Flemming et al.^{19} | The association between time-to-surgery (TTS) and cancer-specific (CSS) and overall survival (OS) was examined using multivariate Cox regression and using quantile regression at 42 days and 90^{th} percentile. |
| Zarean et al.^{20} | Censored quantile regression was fitted to find the overall survival of the patients using adjusted effects of variables and was compared with the Cox regression model. |
| Hong et al.^{21} | Quantile regression approach for right-censored Boston Lung Cancer survivor cohort dataset with covariates of low or high dimensionality. |
| Qiu et al.^{22} | Estimators are calculated using augmented inverse probability weighting technique using a quantile regression model for survival data with missing censoring indicators. |
Table 3. Goodness of fit results of Parametric Models
| Model | Log-Likelihood | AIC | BIC |
|---|---|---|---|
| Weibull | -29290.46971 | 58610.94 | 58735.16 |
| Gamma | -29216.39756 | 58464.8 | 58597.29 |
| Log-Logistic | -29289.46342 | 58608.93 | 58733.14 |
| Log-Normal | -29604.90039 | 59239.8 | 59364.02 |
| Exponential | -29290.57883 | 58609.16 | 58725.09 |
Table 4. Analysis of Maximum Likelihood Estimates for Gamma Distribution
| Parameter | Level | Estimate | Standard Error | 95% Confidence Limit (Lower) | 95% Confidence Limit (Upper) |
|---|---|---|---|---|---|
| Intercept | | 6.8704 | 0.0968 | 6.6805 | 7.0602 |
| Age | | -0.0459 | 0.0008 | -0.0475 | -0.0443 |
| Tumor_Size | | -0.0023 | 0.0002 | -0.0027 | -0.0019 |
| Histology | Distant | -0.4438 | 0.0735 | -0.5878 | -0.2998 |
| Histology | In Situ | 0.0195 | 0.036 | -0.051 | 0.09 |
| Histology | Localized | 0 | . | . | . |
| Nodes_Examined | | 0.0237 | 0.0012 | 0.0213 | 0.0261 |
| Stage | Stage 0 | 2.1123 | 0.1266 | 1.864 | 2.3605 |
| Stage | Stage 1 | 1.5226 | 0.0849 | 1.3562 | 1.6891 |
| Stage | Stage 2A | 1.2474 | 0.0798 | 1.0909 | 1.4039 |
| Stage | Stage 2B | 0.7384 | 0.0799 | 0.5818 | 0.895 |
| Stage | Stage 3A | 1.3403 | 0.0953 | 1.1535 | 1.5271 |
| Stage | Stage 3B | 0.9164 | 0.0785 | 0.7626 | 1.0702 |
| Stage | Stage 3C | 0.7925 | 0.078 | 0.6396 | 0.9454 |
| Stage | Stage 4 | 0 | . | . | . |
| Positive_Nodes | | -0.0708 | 0.0025 | -0.0757 | -0.0658 |
Nonparametric Analysis:
From the Kaplan-Meier (KM) estimates of the survival function, 868 patients (observed events = 868) are reported dead due to colon cancer in the interval [0, 1) months. In the same interval, there are 205 censored observations. Censored observations do not change the survival estimate at the time they leave the study; they only reduce the number of patients at risk in later intervals.
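The product-limit calculation behind these KM estimates can be sketched in a few lines. This is a minimal illustration in plain Python, not the SAS procedure used in the study; the function name `kaplan_meier` and the toy data are hypothetical. The last lines apply the same step to the first interval reported above, assuming all 30,251 patients are at risk at time zero.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival)] at each distinct event time.

    durations: follow-up times; events: 1 if the event was observed, 0 if censored.
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # everyone leaving the risk set at time t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Deaths and censored subjects both leave the risk set after time t.
        at_risk -= exits[t]
    return curve


# Toy data: five subjects, two of them censored (event flag 0).
toy_curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print(toy_curve)

# First KM step for the study data: 868 deaths among 30,251 patients at risk.
first_step = 1 - 868 / 30251
print(round(first_step, 4))
```

Note how the censored subject at time 2 does not lower the curve but is removed from the risk set before time 3, which is the behavior described in the text.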
From the KM curve given in Figure 2 (Left), the probability of surviving beyond 50 months is approximately 65%. This probability is in line with the cumulative distribution curve. From the hazard function graph in Figure 2 (Right), the hazard is high for the initial 40 to 50 months and then declines exponentially to smaller values. At the beginning of the study, we expect around 0.02 failures per month, while 25 months later, for those who survived, we expect 0.008 failures per month, a rate about two and a half times lower than at the beginning of the study. Table 5 provides the first, second and third quartiles of the survival duration. The interval during which the first 25% of the population is expected to fail, [0, 29) months, is much shorter than the interval during which the second 25% of the population is expected to fail, [29, 91). There is not enough failure data to generate a point estimate for the third 25% of the population. This is indicated by “.” in Table 5. This clearly supports our understanding that the hazard of failure is greater at the beginning of the study.
Figure 2. Kaplan-Meier Survival Estimation (Left) and Hazard Function (Right)
Table 5. Summary Statistics for Time Variable Survival Duration
Quantile Estimates of Survival Times
| Percent | Point Estimate | 95% CI [Lower | 95% CI Upper) |
|---|---|---|---|
| 75 | . | . | . |
| 50 | 91.000 | 89.000 | 93.000 |
| 25 | 29.000 | 28.000 | 30.000 |
Figure 3. Kaplan-Meier Stage Stratification Survival Analysis Estimates and Negative Log of Estimated Survivor Functions
Figure 3 (Left) gives a stage-wise KM estimate graph. The survival probabilities for the patients in Stage 0, Stage 1, Stage 3A and Stage 2A are higher than the survival probabilities for the patients in Stage 3B, Stage 2B, Stage 3C and Stage 4. The log-rank, Wilcoxon and likelihood ratio tests for homogeneity provide strong evidence of differences among the survival curves of the stages (p<0.0001). This behavior is evident in the negative log survival estimate curves given in Figure 3 (Right). None of the curves of the negative log survival estimates versus survival duration approximates a straight line through the origin, indicating that the exponential parametric model is not appropriate for these survival data. In the log of negative log of the estimated survivor functions given in Figure 4, more than one curve crosses another, which violates the proportional hazards assumption.
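The two graphical diagnostics used here follow from standard identities, sketched below under textbook assumptions. The symbols λ, β_k and S_0 are introduced for illustration and do not appear elsewhere in this paper. If survival times were exponential with a constant rate λ, the negative log survivor function would be a straight line through the origin. If the hazards of stage k were proportional to a baseline hazard, the log of negative log survivor curves would differ only by a vertical shift.

```latex
\begin{align*}
\text{Exponential:} \quad & S(t) = e^{-\lambda t}
  \;\Longrightarrow\; -\log S(t) = \lambda t \\
\text{Proportional hazards:} \quad & S_k(t) = S_0(t)^{\exp(\beta_k)}
  \;\Longrightarrow\; \log\!\left[-\log S_k(t)\right] = \beta_k + \log\!\left[-\log S_0(t)\right]
\end{align*}
```

Curved lines in the first plot and crossing curves in the second are therefore read as evidence against the exponential model and the proportional hazards assumption, respectively.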
Figure 4. Stagewise Log of Negative Log of Estimated Survival Function
Semiparametric Model – Proportional Hazards Model:
The Cox proportional hazards model^{24} is used to assess the effects of the patient's age, tumor size, number of lymph nodes examined, number of positive lymph nodes, stage and histology on survival duration. The Cox model converged, and the global tests (likelihood ratio, score and Wald) are all significant. The parameter estimates of the Cox regression model, along with the hazard ratios, are given in Table 6 below.
For each one-year increase in age, the hazard increases by about 4.6%. Patients with distant histology have a 54% greater hazard, and in situ patients a 4% lower hazard, than patients with localized histology. Compared to stage 4 patients, stage 0 patients have an 86% lower hazard, stage 2B patients a 46% lower hazard and stage 3A patients a 71.5% lower hazard. Each additional positive lymph node is associated with a 7% greater hazard. These conclusions are in line with our previous KM and parametric analyses.
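The percentages above follow from exponentiating the Cox coefficients. The sketch below illustrates the arithmetic for a few rows of Table 6. It is not the authors' code: the dictionary name `cox_coefficients` is hypothetical, and the signs of the coefficients are inferred from the reported hazard ratios (a hazard ratio below 1 implies a negative coefficient).

```python
import math

# Selected Cox coefficients from Table 6; signs inferred from the hazard ratios.
cox_coefficients = {
    "Age": 0.04512,
    "Histology: Distant": 0.4298,
    "Stage 0": -1.93504,
    "Stage 2B": -0.62215,
    "Stage 3A": -1.25569,
    "Positive_Nodes": 0.06475,
}


def hazard_ratio(beta):
    """Hazard ratio for a one-unit increase (or versus the reference level)."""
    return math.exp(beta)


def percent_change(beta):
    """Percentage change in the hazard implied by the coefficient."""
    return 100 * (math.exp(beta) - 1)


for name, beta in cox_coefficients.items():
    print(f"{name:20s} HR = {hazard_ratio(beta):.3f}  change = {percent_change(beta):+.1f}%")
```

A positive percentage is read as a greater hazard and a negative one as a lower hazard relative to the reference level or to a one-unit smaller covariate value.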
Table 6. Cox PH Model Estimates
| Parameter | Level | Parameter Estimate | Standard Error | Hazard Ratio |
|---|---|---|---|---|
| Age | | 0.04512 | 0.000744 | 1.046 |
| Tumor_Size | | 0.00192 | 0.000184 | 1.002 |
| Histology | Distant | 0.4298 | 0.06652 | 1.537 |
| Histology | In Situ | -0.0382 | 0.03379 | 0.963 |
| Nodes_Examined | | -0.02424 | 0.00117 | 0.976 |
| Stage | Stage 0 | -1.93504 | 0.12079 | 0.144 |
| Stage | Stage 1 | -1.38193 | 0.07701 | 0.251 |
| Stage | Stage 2A | -1.10618 | 0.07186 | 0.331 |
| Stage | Stage 2B | -0.62215 | 0.07089 | 0.537 |
| Stage | Stage 3A | -1.25569 | 0.08829 | 0.285 |
| Stage | Stage 3B | -0.8197 | 0.07087 | 0.441 |
| Stage | Stage 3C | -0.70982 | 0.07039 | 0.492 |
| Positive_Nodes | | 0.06475 | 0.00207 | 1.067 |
Quantile Regression Model – Examining Potential Heterogeneous Effects:
As discussed above, quantile regression is well suited to skewed data. In our case, the survival durations are skewed, and we model the data using quantile regression to learn how the extreme survival times are related to the covariates of the model. Using QSR we fit a linear model for the log of survival duration of the colon cancer patients with the covariates age, tumor size, number of lymph nodes examined, number of positive lymph nodes, stage and histology. Of the data, 56.85% are censored. Table 7 provides the parameter estimates for quantiles 0.1 to 0.4. Each of the requested quantiles has a set of parameter estimates and confidence limits. The confidence limits are computed by resampling methods. The QSR results and the plots given in Figure 5 do not report any estimates or 95% CI bands beyond the 0.4 quantile, as there are no events at or after that time point. For quantiles 0.5 through 0.7, since the survival function does not fall below 0.38, standard errors and CI bounds cannot be obtained.
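To make the idea of a quantile-specific fit concrete, the sketch below shows the check (pinball) loss that underlies ordinary quantile regression: the constant that minimizes the summed loss at level τ is a sample τ-quantile. This is a simplified illustration that ignores censoring, so it is not the censored estimator used for Table 7; the function names and the toy durations are hypothetical.

```python
def check_loss(residual, tau):
    """Check (pinball) loss: tau * r for r >= 0 and (tau - 1) * r for r < 0."""
    return residual * (tau - (1 if residual < 0 else 0))


def sample_quantile_by_loss(values, tau):
    """Return the observed value that minimizes the summed check loss."""
    return min(values, key=lambda q: sum(check_loss(y - q, tau) for y in values))


# Hypothetical right-skewed survival durations in months (no censoring).
durations = [2, 5, 9, 14, 40, 90, 120]

median = sample_quantile_by_loss(durations, 0.5)
lower_tail = sample_quantile_by_loss(durations, 0.1)
print(median, lower_tail)
```

Replacing the constant with a linear function of the covariates, and choosing its coefficients to minimize the same loss, gives a separate coefficient vector for every τ, which is why the estimates in a quantile regression can differ from one quantile to the next.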
The behavior of the covariate coefficients is shown in Figure 5, which plots the estimated regression parameters against the quantiles. Notice that the effects of tumor size and age are negative and small over the lower quantiles. The estimate for tumor size gradually increases as we move from lower to higher quantiles, whereas the age estimate levels off at a constant value around the 0.4 quantile. The tumor size estimate is -0.006 at the 0.1 quantile, increases to -0.003 at the 0.4 quantile and continues to increase until the 0.7 quantile. Similarly, the age estimate is -0.051 at the 0.1 quantile and levels off around -0.042 at the 0.4 and higher quantiles. Compared to the other covariates, we notice a positive trend for stage 2B and stage 3C and a negative trend for the positive nodes parameter. The positive nodes estimate follows a negative trend until the 0.4 quantile and then begins to slope upward. A nonconstant curve is an indication of heterogeneity in the data. A QSR equation is interpreted in the same way as an ordinary regression equation.
Table 7. Quantile Survival Regression Estimates
| Parameter | τ = 0.1 Estimate | τ = 0.1 Standard Error | τ = 0.2 Estimate | τ = 0.2 Standard Error | τ = 0.3 Estimate | τ = 0.3 Standard Error | τ = 0.4 Estimate | τ = 0.4 Standard Error |
|---|---|---|---|---|---|---|---|---|
| Intercept | 5.208 | 0.201 | 5.512 | 0.150 | 5.967 | 0.152 | 6.101 | 0.139 |
| Age | -0.051 | 0.001 | -0.045 | 0.001 | -0.043 | 0.001 | -0.042 | 0.001 |
| Tumor_Size | -0.006 | 0.001 | -0.006 | 0.001 | -0.004 | 0.001 | -0.004 | 0.001 |
| Distant | 0.395 | 0.154 | 0.352 | 0.116 | 0.484 | 0.118 | 0.371 | 0.100 |
| In Situ | 0.009 | 0.063 | 0.041 | 0.042 | 0.060 | 0.041 | 0.035 | 0.039 |
| Localized | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| Nodes_Examined | 0.024 | 0.002 | 0.023 | 0.002 | 0.023 | 0.002 | 0.023 | 0.001 |
| Stage 0 | 2.188 | 0.239 | 2.056 | 0.156 | 1.914 | 0.136 | 1.943 | 0.117 |
| Stage 1 | 1.750 | 0.188 | 1.659 | 0.134 | 1.374 | 0.129 | 1.416 | 0.115 |
| Stage 2A | 1.395 | 0.179 | 1.406 | 0.129 | 1.172 | 0.124 | 1.235 | 0.108 |
| Stage 2B | 0.571 | 0.194 | 0.653 | 0.136 | 0.563 | 0.136 | 0.715 | 0.112 |
| Stage 3A | 1.247 | 0.218 | 1.368 | 0.145 | 1.271 | 0.134 | 1.444 | 0.120 |
| Stage 3B | 0.894 | 0.168 | 0.929 | 0.129 | 0.790 | 0.118 | 0.911 | 0.107 |
| Stage 3C | 0.692 | 0.161 | 0.742 | 0.123 | 0.611 | 0.124 | 0.779 | 0.103 |
| Stage 4 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| Positive_Nodes | -0.062 | 0.006 | -0.070 | 0.006 | -0.076 | 0.005 | -0.077 | 0.004 |
τ = Quantile
CONCLUSION:
In this paper, the main aim was to study the factors affecting the survival of colon cancer patients. For this, we employed parametric, semiparametric, nonparametric, and quantile survival regression approaches. The parametric models considered were Weibull, gamma, lognormal, log-logistic and exponential. The gamma model performed best among the parametric models. Compared to patients with localized histology, patients with distant histology have a lower survival. Similarly, relative to stage 4 patients, stage 2B patients show the smallest survival advantage, while stage 3A and 3B patients show a larger one. These data were then analyzed using the nonparametric Kaplan-Meier approach and the semiparametric Cox regression approach. In the KM approach, stage 2B has a higher median survival of 117 months, and stage 4 a shorter median of 20 months. No median is reported for stage 0, stage 1 and stage 3A patients because the KM estimates for these groups never fell below survival probabilities of 63.78%, 52.45% and 54.08%, respectively.
Figure 5. Quantile Survival Parameter Estimates
From the results of Cox regression, the expected log of the relative hazard increases by 0.045 for each one-year increase in age. This corresponds to a 4.6% increase in the expected hazard per one-year increase in age; that is, the expected hazard is about 1.05 times higher in a person who is one year older than another. It appears that there is a decrease in the hazard rate of patients in stages in situ, localized, and regional compared to distant spread. The decrease in the parameter estimates of these parameters is significant.
Finally, we modeled the survival data using quantile survival regression (QSR), which is a distribution-free approach. QSR is very useful when we are interested in modeling the survival time itself and when the effects of covariates on the survival distribution are heterogeneous. With the QSR approach, inference about the regression parameters for a particular quantile depends only on the conditional distribution near that quantile. A further advantage of the QSR model is the direct interpretation of the estimated coefficients in terms of the change in a quantile of the survival time distribution.
Consider the 0.1 quantile. Because the model is fitted to the log of survival duration, the estimate of 2.19 for stage 0 means that the 0.1 quantile of log survival duration is 2.19 higher for stage 0 patients than for stage 4 patients, which is also significant. The estimates for all stages are significant. Likewise, a one-unit increase in tumor size lowers the 0.1 quantile of log survival duration by 0.006. All the estimates except for in situ are significant. Estimates above the 0.7 quantile are not produced. Such details could not be detected using a Cox or a parametric model.
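Since the QSR model described earlier is linear in the log of survival duration, one way to read these coefficients on the original time scale is as ratios of quantiles, with the other covariates held fixed. The block below is a sketch under that assumption, using the τ = 0.1 estimates for stage 0 (2.188) and tumor size (taken as -0.006, consistent with the reported loss in survival). The notation Q_T and β(τ) is introduced here for illustration.

```latex
\begin{align*}
Q_{\log T}(\tau \mid x) = x^{\top}\beta(\tau)
  \;&\Longrightarrow\; Q_{T}(\tau \mid x) = \exp\!\left(x^{\top}\beta(\tau)\right) \\
\frac{Q_{T}(0.1 \mid \text{Stage 0})}{Q_{T}(0.1 \mid \text{Stage 4})}
  &= \exp(2.188) \approx 8.9 \\
\frac{Q_{T}(0.1 \mid \text{Tumor\_Size} + 1)}{Q_{T}(0.1 \mid \text{Tumor\_Size})}
  &= \exp(-0.006) \approx 0.994
\end{align*}
```

Read this way, the 0.1 quantile of survival duration is roughly nine times longer for stage 0 than for stage 4 patients, and each additional unit of tumor size shortens it by about 0.6%.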
The parametric regression coefficients are interpreted as effects on the mean or median of the survival time, whereas the QSR coefficients apply to specified quantiles of the survival time. Unlike these two methods, Cox proportional hazards modeling models the hazard function. The Cox model, however, requires no parametric assumption about the baseline hazard and can also incorporate time-dependent covariates. We conclude that QSR models may be adopted if one wishes to achieve good quantile prediction for the lower quantiles of the colon cancer data, and the Cox model may be preferred in terms of overall prediction performance.
CONFLICT OF INTEREST:
The authors have no conflicts of interest regarding this research work.
REFERENCES:
1. Mudunuru V. Comparison of activation functions in multilayer neural networks for stage classification in breast cancer. Neural, Parallel, and Scientific Computations. 2016; 24:83-96.
2. Ahmed FE, Vos PW, Holbert D. Modeling survival in colon cancer: a methodological review. Molecular Cancer. 2007 Dec; 6(1):12.
3. Singh R, Mukhopadhyay K. Survival analysis in clinical trials: Basics and must know areas. Perspectives in clinical research. 2011 Oct; 2(4):145.
4. Allison PD. Survival analysis using SAS: a practical guide. Sas Institute; 2010 Mar 29.
5. Koenker R, Bassett Jr G. Regression quantiles. Econometrica: journal of the Econometric Society. 1978 Jan 1:33-50.
6. Koenker R, Geling O. Reappraising medfly longevity: a quantile regression survival analysis. Journal of the American Statistical Association. 2001 Jun 1; 96(454):458-68.
7. Howlader N, Noone AM, Krapcho M, Miller D, Bishop K, Altekruse SF, Kosary CL, Yu M, Ruhl J, Tatalovich Z, Mariotto A. SEER Cancer Statistics Review, 1975–2013. Bethesda, MD: National Cancer Institute; 2016.
8. Ying Z, Jung SH, Wei LJ. Survival analysis with median regression models. Journal of the American Statistical Association. 1995 Mar 1; 90(429):178-84.
9. Yang S. Censored median regression using weighted empirical survival and hazard functions. Journal of the American Statistical Association. 1999 Mar 1; 94(445):137-45.
10. Portnoy S. Censored regression quantiles. Journal of the American Statistical Association. 2003 Dec 1; 98(464):1001-12.
11. Carey VJ, Yong FH, Frenkel LM, McKinney RM. Growth velocity assessment in paediatric AIDS: smoothing, penalized quantile regression and the definition of growth failure. Statistics in Medicine. 2004 Feb 15; 23(3):509-26.
12. Yin G, Cai J. Quantile regression models with multivariate failure time data. Biometrics. 2005 Mar; 61(1):151-61.
13. Peng L, Huang Y. Survival analysis with quantile regression models. Journal of the American Statistical Association. 2008 Jun 1; 103(482):637-49.
14. Cai Y. A quantile survival model for censored data. Australian & New Zealand Journal of Statistics. 2013 Jun; 55(2):155-72.
15. Fan C, Zhang F, Zhou Y. Power-transformed linear regression on quantile residual life for censored competing risks data. Communications in Statistics - Theory and Methods. 2016 Oct 17; 45(20):5884-905.
16. Hsieh JJ, Wang HR. Quantile regression based on counting process approach under semi-competing risks data. Annals of the Institute of Statistical Mathematics. 2018 Apr; 70(2):395-419.
17. Xue X, Xie X, Strickler HD. A censored quantile regression approach for the analysis of time to event data. Statistical methods in medical research. 2018 Mar; 27(3):955-65.
18. Faradmal J, Roshanaei G, Mafi M, Sadighi-Pashaki A, Karami M. Application of censored quantile regression to determine overall survival related factors in breast cancer. Journal of research in health sciences. 2016; 16(1):36.
19. Flemming JA, Nanji S, Wei X, Webber C, Groome P, Booth CM. Association between the time to surgery and survival among patients with colon cancer: a population-based study. European Journal of Surgical Oncology (EJSO). 2017 Aug 1; 43(8):1447-55.
20. Zarean E, Mahmoudi M, Azimi T, Amini P. Determining Overall Survival and Risk Factors in Esophageal Cancer Using Censored Quantile Regression. Asian Pacific journal of cancer prevention: APJCP. 2018; 19(11):3081.
21. Hong HG, Christiani DC, Li Y. Quantile regression for survival data in modern cancer research: expanding statistical tools for precision medicine. Precision clinical medicine. 2019 Jun 1; 2(2):90-9.
22. Qiu Z, Ma H, Chen J, Dinse GE. Quantile regression models for survival data with missing censoring indicators. Statistical methods in medical research. 2021 May; 30(5):1320-31.
23. Mudunuru VR. Modeling and Survival Analysis of Breast Cancer: A Statistical, Artificial Neural Network, and Decision Tree Approach. University of South Florida; 2016.
24. Klein JP, Zhang MJ. Survival analysis, software. Encyclopaedia of biostatistics. 2005 Jul 15; 8.
Received on 11.11.2021 Modified on 17.03.2022
Accepted on 20.06.2022 © RJPT All rights reserved
Research J. Pharm. and Tech 2023; 16(3):1401-1408.
DOI: 10.52711/0974-360X.2023.00231