Integration of Data Analytics and Mathematical Modelling
Sujatha V.¹, Catherine Rexy D.²
¹Assistant Professor, VIT University, Vellore
²Research Scholar, VIT University, Vellore
*Corresponding Author E-mail:
ABSTRACT:
Decision support data management systems require the representation and manipulation of both data and mathematical models. When data modelling features are few, the models represent the mathematical relationships among the elements of the domain, and mathematical techniques enhance the capabilities of the decision support system. When the data are large, however, mathematical modelling techniques lack many of the needed facilities, in particular for capturing qualitative relationships among the covariates. Data analytics provides a way to represent these qualitative relationships. In this paper, data analytics techniques are applied to the Messidor data set, and it is shown that they support the qualitative representation in a decision support data management system.
KEYWORDS: Data Analysis, Data Analytics, Messidor data set.
INTRODUCTION:
Diabetic retinopathy is the leading cause of new blindness in persons aged 25-74 years in the United States. The exact mechanism by which diabetes causes retinopathy remains unclear, but several theories have been postulated to explain the typical course and history of the disease. The disease can be diagnosed by fluorescein angiography, looking for the following features [5]:
· Microaneurysms:
The earliest clinical sign of diabetic retinopathy; these occur secondary to capillary wall outpouching due to pericyte loss; they appear as small, red dots in the superficial retinal layers.
· Dot and blot hemorrhages:
Appear similar to microaneurysms if they are small; they occur as microaneurysms rupture in the deeper layers of the retina, such as the inner nuclear and outer plexiform layers.
· Flame-shaped hemorrhages:
Splinter hemorrhages that occur in the more superficial nerve fiber layer.
· Retinal edema and hard exudates:
Caused by the breakdown of the blood-retina barrier, allowing leakage of serum proteins, lipids, and protein from the vessels.
· Cotton-wool spots:
Nerve fiber layer infarctions from occlusion of precapillary arterioles; they are frequently bordered by microaneurysms and vascular hyperpermeability.
· Venous loops and venous beading:
Frequently occur adjacent to areas of nonperfusion; they reflect increasing retinal ischemia, and their occurrence is the most significant predictor of progression to proliferative diabetic retinopathy (PDR).
· Intraretinal microvascular abnormalities:
Remodeled capillary beds without proliferative changes; usually found on the borders of the nonperfused retina.
· Macular exudates:
Leading cause of visual impairment in patients with diabetes.
Data analysis is a science rather than an art. Today's statistical applications involve enormous data sets: the process starts by combining large data sets from multiple sources and then applies statistical or mathematical techniques to that data in order to extract valuable insight. Survival techniques may produce more accurate estimates from smaller sample sizes and allow the analyst to engage in multivariate analyses. The problem of automated diabetic retinopathy evaluation is far from being solved, owing to the lack of a large and adequate database accessible to the scientific community.
Databases:
The two main databases will contain colour images of the retina, acquired using a retinograph with or without pupil dilation during routine clinical examinations. These examinations will be performed in the four ophthalmology departments involved in the program. To make their diagnosis, ophthalmologists generally use a central picture and two peripheral pictures of the retina. We will proceed in the same way and record the three images in the databases. However, during the MESSIDOR project [3], only the central image will be annotated. The images will be saved in uncompressed TIFF format at a resolution of 1440 × 960 pixels, i.e. about 4 MB per image.
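The quoted size of about 4 MB per image follows from the resolution if one assumes 24-bit RGB colour (three channels of 8 bits each). The bit depth is an assumption here, not something the text states, and a TIFF file adds a small header on top of the raw pixel data. A minimal arithmetic sketch:

```python
# Assumptions (not stated in the text): 3 colour channels (RGB), 8 bits per
# channel, no compression, and no TIFF header or metadata overhead counted.
WIDTH, HEIGHT = 1440, 960
CHANNELS = 3
BYTES_PER_CHANNEL = 1


def raw_image_bytes(width: int, height: int, channels: int, bytes_per_channel: int) -> int:
    """Size in bytes of an uncompressed image's pixel data."""
    return width * height * channels * bytes_per_channel


size = raw_image_bytes(WIDTH, HEIGHT, CHANNELS, BYTES_PER_CHANNEL)
# Decimal megabytes (10**6 bytes) and binary mebibytes (2**20 bytes).
print(f"{size} bytes = {size / 1e6:.2f} MB = {size / 2**20:.2f} MiB")
```

Under these assumptions, 1440 × 960 × 3 = 4,147,200 bytes, which matches the "about 4 MB" figure.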
For each image, it will be indicated at least:
· The stage of Diabetic Retinopathy.
· The number and/or surface area of microaneurysms.
· The degree of exudation: the degree is a function of the surface area occupied by exudates and of their location with respect to the center of vision (macula).
· The level of hemorrhage, which is defined with respect to the number and/or the surface area occupied by hemorrhages.
This database will contain about 300 images. Microaneurysms, exudates and hemorrhages will be marked individually on fifty of these images.
Data Set Information:
This data set contains features extracted from the Messidor image set to predict whether an image contains signs of diabetic retinopathy or not. All features represent either a detected lesion, a descriptive feature of an anatomical part, or an image-level descriptor.
Attribute Information:
0) The binary result of quality assessment: 0 = bad quality, 1 = sufficient quality.
1) The binary result of pre-screening: 1 indicates severe retinal abnormality and 0 its absence.
2-7) The results of MA (microaneurysm) detection. Each feature value stands for the number of MAs found at the confidence levels alpha = 0.5, . . . , 1, respectively.
8-15) The same information as 2-7) for exudates. However, as exudates are represented by a set of points rather than the number of pixels constructing the lesions, these features are normalized by dividing the number of lesions by the diameter of the ROI (region of interest) to compensate for different image sizes.
16) The Euclidean distance between the center of the macula and the center of the optic disc, which provides important information regarding the patient's condition. This feature is also normalized by the diameter of the ROI.
17) The diameter of the optic disc.
18) The binary result of the AM/FM-based classification.
19) Class label: 1 = contains signs of DR (accumulative label for the Messidor classes 1, 2, 3), 0 = no signs of DR.
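To make the attribute numbering concrete, the sketch below splits one record of 20 comma-separated values into the groups listed above (0, 1, 2-7, 8-15, 16, 17, 18, 19). It assumes the record arrives as a single comma-separated line in exactly that order; the class and function names are hypothetical. The example values are copied from the first sample record shown below, with a leading quality-assessment value of 1 assumed because that column is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MessidorRecord:
    quality: int               # attribute 0: 1 = sufficient quality, 0 = bad
    prescreen: int             # attribute 1: 1 = severe retinal abnormality
    ma_counts: List[float]     # attributes 2-7: MA counts at six confidence levels
    exudates: List[float]      # attributes 8-15: ROI-normalized exudate values
    macula_od_distance: float  # attribute 16: normalized macula-optic disc distance
    od_diameter: float         # attribute 17: optic disc diameter
    am_fm: int                 # attribute 18: AM/FM-based classification result
    label: int                 # attribute 19: 1 = signs of DR, 0 = no signs


def parse_record(line: str) -> MessidorRecord:
    """Split one comma-separated line of 20 values into the attribute groups."""
    values = [float(v) for v in line.strip().split(",")]
    if len(values) != 20:
        raise ValueError(f"expected 20 values, got {len(values)}")
    return MessidorRecord(
        quality=int(values[0]),
        prescreen=int(values[1]),
        ma_counts=values[2:8],
        exudates=values[8:16],
        macula_od_distance=values[16],
        od_diameter=values[17],
        am_fm=int(values[18]),
        label=int(values[19]),
    )


# Hypothetical input line: leading quality value 1 is assumed.
record = parse_record(
    "1,1,22,22,22,19,18,14,49.89576,17.77599,5.27092,0.771761,"
    "0.018632,0.006864,0.003923,0.00393,0.48693,0.10005,1,0"
)
print(record.ma_counts, record.od_diameter, record.label)
```

Slicing by position keeps the mapping explicit: six MA values, eight exudate values, then four single attributes.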
Sample records from the data set:

| BINARY_FITNESS_RESULT | MA_0.5 | MA_0.75 | MA_0.90 | MA_0.95 | MA_0.98 |
|---|---|---|---|---|---|
| 1 | 22 | 22 | 22 | 19 | 18 |
| 1 | 24 | 24 | 22 | 18 | 16 |
| 1 | 62 | 60 | 59 | 54 | 47 |
| 1 | 55 | 53 | 53 | 50 | 43 |
| 1 | 44 | 44 | 44 | 41 | 39 |
| 1 | 44 | 43 | 41 | 41 | 37 |
| 0 | 29 | 29 | 29 | 27 | 25 |
| 1 | 6 | 6 | 6 | 6 | 2 |
| 1 | 22 | 21 | 18 | 15 | 13 |

| MA_0.99 | LSN_PT_0.5 | LSN_PT_0.75 | LSN_PT_0.90 | LSN_PT_0.95 | LSN_PT_0.98 | LSN_PT_0.99 |
|---|---|---|---|---|---|---|
| 14 | 49.89576 | 17.77599 | 5.27092 | 0.771761 | 0.018632 | 0.006864 |
| 13 | 57.70994 | 23.79999 | 3.325423 | 0.234185 | 0.003903 | 0.003903 |
| 33 | 55.83144 | 27.99393 | 12.68749 | 4.852282 | 1.393889 | 0.373252 |
| 31 | 40.46723 | 18.44595 | 9.118901 | 3.079428 | 0.840261 | 0.272434 |
| 27 | 18.02625 | 8.570709 | 0.410381 | 0 | 0 | 0 |
| 29 | 28.3564 | 6.935636 | 2.305771 | 0.323724 | 0 | 0 |
| 16 | 15.4484 | 9.113819 | 1.633493 | 0 | 0 | 0 |
| 1 | 20.67965 | 9.497786 | 1.22366 | 0.150382 | 0 | 0 |
| 10 | 66.69193 | 23.54554 | 6.151117 | 0.496372 | 0 | 0 |

|  | LSN_PT_1 | EUC_DT | OP_DIS_DIA | DR_MESSIR | DR |
|---|---|---|---|---|---|
| 0.003923 | 0.00393 | 0.48693 | 0.10005 | 1 | 0 |
| 0.003903 | 0.00393 | 0.52098 | 0.14444 | 0 | 0 |
| 0.041817 | 0.00774 | 0.53094 | 0.12858 | 0 | 1 |
| 0.007653 | 0.00151 | 0.48324 | 0.11479 | 0 | 0 |
| 0 | 0 | 0.47595 | 0.12352 | 0 | 1 |
| 0 | 0 | 0.50281 | 0.12671 | 0 | 1 |
| 0 | 0 | 0.54173 | 0.13955 | 0 | 1 |
| 0 | 0 | 0.57638 | 0.07101 | 1 | 0 |
| 0 | 0 | 0.50003 | 0.11673 | 0 | 1 |
MATERIALS AND METHODS:
To evaluate the qualitative relationships between the covariates, common metrics are used in this paper to present the results. The results are based on performance measurements and more detailed statistics. Diabetes data sets from 1999 to 2008 were downloaded from https://archive.ics.uci.edu/ml/datasets/Diabetic+Retinopathy+Debrecen+Data+Set# [6]. Based on data analysis [2] (using R programming), we observe the performance of the metrics. The analysis provides good knowledge of the qualities of the covariate measurements, and it can convince ophthalmologists to use automatic methods for diabetic retinopathy evaluation by providing them with quantitative assessments of their efficacy. Performed on large databases, the analysis also provides indispensable results to researchers.
RESULTS AND DISCUSSION:
Analysis on the Messidor Data Set (Diabetic Retinopathy):
In the data set, the optic disc diameter [1] is presented with confidence levels 0.5 to 1.00 [4]. We observe a resulting p-value of 0.77; hence, we assume that the existence of DR lies more significantly with confidence levels ≥ 0.75.
|  | DF | Sum of Squares | Mean Square | F value | Pr(>F) |
|---|---|---|---|---|---|
| OP_Disc_DIA | 1 | 0.02 | 0.02055 | 0.082 | 0.774 |
| Residuals | 1149 | 286.63 | 0.24946 |  |  |
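The F value and p-value in the ANOVA table follow from its other columns: MS_residual = SS_residual / DF_residual, F = MS_factor / MS_residual, and the p-value is the upper tail of the F(1, 1149) distribution at F. The sketch below reproduces these numbers from the table. To stay within the Python standard library it uses an approximation: with one numerator degree of freedom F equals t², and with 1149 residual degrees of freedom t is very close to a standard normal. An exact value would use the F distribution itself, for example `scipy.stats.f.sf(f, 1, 1149)`.

```python
import math

# Values copied from the ANOVA table (OP_Disc_DIA row and Residuals row).
MS_FACTOR = 0.02055   # mean square of OP_Disc_DIA (DF = 1)
SS_RESID = 286.63     # residual sum of squares
DF_RESID = 1149       # residual degrees of freedom


def f_statistic(ms_factor: float, ss_resid: float, df_resid: int) -> float:
    """F = MS_factor / MS_residual, with MS_residual = SS_residual / DF_residual."""
    ms_resid = ss_resid / df_resid
    return ms_factor / ms_resid


def p_value_df1_normal_approx(f: float) -> float:
    """Approximate upper-tail p-value of F(1, df2) for large df2.

    For one numerator degree of freedom, F = t**2; for large df2 the t
    distribution is close to N(0, 1), so P(F > f) ~ P(|Z| > sqrt(f)),
    which equals erfc(sqrt(f) / sqrt(2)).
    """
    return math.erfc(math.sqrt(f) / math.sqrt(2.0))


f = f_statistic(MS_FACTOR, SS_RESID, DF_RESID)
p = p_value_df1_normal_approx(f)
print(f"MS_resid = {SS_RESID / DF_RESID:.5f}, F = {f:.3f}, p ~ {p:.3f}")
```

This recovers MS_residual ≈ 0.24946, F ≈ 0.082 and p ≈ 0.774, matching the table. Under the conventional 0.05 threshold, a p-value of this size does not indicate a statistically significant effect of OP_Disc_DIA.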
Analyzing the Diabetes Data Set from 1999 to 2008 (101676 distinct records):
Step 1: Analyzing Hospital “Readmission Rates”
In the diabetes data set from 1999 to 2008, with 101676 distinct records from 50 different countries [5], we observe that:
1. The age > 30 group has the highest readmission rate, 34.5%.
2. The age < 30 group has a lower readmission rate, 11.16%.
3. The remaining 54.34% never visited the hospital again.
Step 2: Analyzing “Gender” factor
1. 53.80% of the diabetes diagnoses were for females, and
2. 46.19% were for males.
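The percentages in Steps 1 to 3 are shares of the records that fall into each category. The R code behind them is not given; a minimal Python sketch of the same computation, using a hypothetical helper name and hypothetical toy data, looks like this:

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def percentage_distribution(values: Iterable[Hashable], decimals: int = 2) -> Dict[Hashable, float]:
    """Return each category's share of all observations, in percent.

    Categories are ordered from most to least frequent.
    """
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {k: round(100.0 * n / total, decimals) for k, n in counts.most_common()}


# Hypothetical toy records; the real data set has far more rows.
genders = ["Female", "Male", "Female", "Female", "Male", "Female", "Male", "Female"]
print(percentage_distribution(genders))
```

The same function applies unchanged to the readmission categories of Step 1 or the age groups of Step 3; only the input column changes.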
Step 3: Analyzing “Age” factor:
Over the 9-year data set from different countries, the diabetes diagnoses by age group are:
1. 0-10 => 0.15%
2. 10-20 => 0.67%
3. 20-30 => 1.62%
4. 30-40 => 3.71%
5. 40-50 => 9.52%
6. 50-60 => 16.97%
7. 60-70 => 22.11%
8. 70-80 => 25.63%
9. 80-90 => 16.91%
10. 90-100 => 2.74%
Step 4: Analyzing “Patient Weights”:
(The weight field is unknown for some records, and we cannot draw conclusions from blank data.) The largest number of patients who reported diabetes weighed 75 to 100 kg. The next highest impact of diabetes is among persons weighing 50 to 75 kg, followed by those weighing 100 to 125 kg.
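Step 4 groups patients into 25-kg weight bands and sets aside records whose weight is blank. A minimal sketch, assuming weights are available as numbers in kilograms with `None` marking a blank field (the real data may encode weight differently, for example as pre-binned text); the function names and toy values are hypothetical:

```python
from collections import Counter
from typing import Dict, Iterable, Optional

BAND_WIDTH_KG = 25  # matches the 50-75 / 75-100 / 100-125 kg groups in Step 4


def weight_band(weight_kg: float, width: int = BAND_WIDTH_KG) -> str:
    """Map a weight to a half-open band, e.g. 82.0 -> '75-100' (75 <= w < 100)."""
    lower = int(weight_kg // width) * width
    return f"{lower}-{lower + width}"


def count_weight_bands(weights: Iterable[Optional[float]]) -> Dict[str, int]:
    """Count patients per weight band, skipping records with no recorded weight."""
    bands = Counter(weight_band(w) for w in weights if w is not None)
    return dict(bands.most_common())


# Hypothetical toy data; None marks a blank weight field.
weights = [82.0, None, 91.5, 64.0, None, 110.0, 77.3, 58.9]
print(count_weight_bands(weights))
```

Dropping blank weights before counting, rather than treating them as zero, keeps the missing records from creating a spurious "0-25" band.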
Step 5: Computing “Time in Hospital”
We observe a minimum length of stay (LOS) of 1 day, an average of 4.5 to 6 days, and a maximum of 14 days (never exceeding that).
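The minimum, average and maximum LOS in Step 5 are ordinary summary statistics. A minimal sketch with Python's standard `statistics` module, on hypothetical toy values; the median is included as an addition, since a few long stays can pull the mean upward:

```python
import statistics
from typing import Dict, Sequence


def los_summary(days: Sequence[int]) -> Dict[str, float]:
    """Minimum, mean, median and maximum length of stay (LOS) in days."""
    if not days:
        raise ValueError("no LOS values given")
    return {
        "min": min(days),
        "mean": statistics.mean(days),
        "median": statistics.median(days),
        "max": max(days),
    }


# Hypothetical toy LOS values in days.
stays = [1, 3, 4, 5, 6, 2, 14, 7, 3, 5]
print(los_summary(stays))
```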
Step 6: Finding the determining factor with respect to weight and gender for a given age:
We observe that patient weight has more impact on diabetes than patient gender does.
Step 7: Finding the determining factor for length of stay with respect to age, gender and weight:
Patient age is the primary criterion (positive result) for determining the length of stay for diabetes. Gender is significant, but it does not increase the LOS (negative value).
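The paper does not state which model produced the positive and negative results in Steps 6 and 7. One plausible reading is the sign of each coefficient in a multiple linear regression of length of stay on age, gender and weight (in R, typically fitted with `lm()`). The sketch below fits such a model by ordinary least squares in plain Python; the model choice, the 0/1 gender coding, the function name and the toy data are all assumptions for illustration.

```python
from typing import List, Sequence


def ols_coefficients(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares with an intercept, solved via the normal equations.

    Returns [intercept, b1, b2, ...]. A positive slope means the predictor
    raises the fitted outcome; a negative slope means it lowers it.
    """
    rows = [[1.0, *map(float, r)] for r in X]  # prepend the intercept column
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    a = [row[:] + [b] for row, b in zip(xtx, xty)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(a[i][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; X^T X is singular")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    beta = [0.0] * k
    for i in reversed(range(k)):
        beta[i] = (a[i][k] - sum(a[i][j] * beta[j] for j in range(i + 1, k))) / a[i][i]
    return beta


# Hypothetical toy data: (age in years, gender 1 = male / 0 = female, weight in kg).
X = [[45, 0, 70], [60, 1, 85], [72, 0, 90], [38, 1, 65], [80, 1, 78], [55, 0, 102]]
# LOS generated from known coefficients so the fit can be checked.
y = [1.0 + 0.05 * age - 0.5 * g + 0.01 * w for age, g, w in X]
intercept, b_age, b_gender, b_weight = ols_coefficients(X, y)
print(f"age {b_age:+.3f}, gender {b_gender:+.3f}, weight {b_weight:+.3f}")
```

On real data, R's `summary(lm(...))` also reports a standard error and p-value for each coefficient, which is what supports a claim that a factor is significant; the sign alone only gives the direction.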
Step 8: Determining the factors that lead to multiple diagnoses:
We observe that historical medications and lab procedures, together with the patient's gender and age, have the most impact on the need for repeat diagnosis. The "Hospital Readmission" factor depends only on procedures and medications; it is independent of age, weight and gender.
SUMMARY:
We applied mathematical techniques to the Messidor data set (diabetic retinopathy). Analyzing the available features, we observed that age and gender are vital factors for most of the occurrences, and their p-values show that these features are highly significant.
REFERENCES:
1. T. Damms and F. Dannheim, "Sensitivity and specificity of optic disc parameters in chronic glaucoma," Investigative Ophthalmology and Visual Science, vol. 34, no. 7, pp. 2246–2250, 1993.
2. A. Gandomi and M. Haider, "Beyond the hype: Big data concepts, methods, and analytics," International Journal of Information Management, vol. 35, pp. 137–144, 2015.
3. MESSIDOR: Methods for Evaluating Segmentation and Indexing techniques Dedicated to Retinal Ophthalmology, 2004, http://messidor.crihan.fr/index-en.php.
4. X. Zhu and R. M. Rangayyan, "Detection of the optic disc in images of the retina using the Hough transform," in Proceedings of the 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS '08), pp. 3546–3549, IEEE, Vancouver, Canada, August 2008.
Received on 28.07.2016 Modified on 02.08.2016
Accepted on 08.08.2016 © RJPT All rights reserved
Research J. Pharm. and Tech 2016; 9(11): 1978-1984.
DOI: 10.5958/0974-360X.2016.00404.2