
Data Scientist Program

 


Omar Mohamed

Parkinson’s disease data analysis and modeling

Introduction


Let's start with a brief introduction to Parkinson’s disease. It is a brain disorder that causes unintended or uncontrollable movements, such as shaking, stiffness, and difficulty with balance and coordination. Symptoms usually begin gradually and worsen over time. As the disease progresses, people may have difficulty walking and talking. They may also have mental and behavioral changes, sleep problems, depression, memory difficulties, and fatigue.

While virtually anyone could be at risk for developing Parkinson’s, some research studies suggest this disease affects more men than women. It’s unclear why, but studies are underway to understand factors that may increase a person’s risk. One clear risk is age: although most people with Parkinson’s first develop the disease after age 60, about 5% to 10% experience onset before the age of 50. Early-onset forms of Parkinson’s are often, but not always, inherited, and some forms have been linked to specific gene mutations. Clearly this is an important subject, so let's dive into our tabular dataset and try to extract some insights from the data.

For more info, check out the GitHub repo at the following link: Link



Data importing


Let's start by importing the required libraries:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os

# Data source: https://www.kaggle.com/code/nvssrkameswar/parkinsons-disease-detection/data

The data can be downloaded from the source above; once downloaded, the archive can be unzipped:

!unzip /content/archive.zip
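The `!unzip` line is notebook shell syntax and only works inside Jupyter or Colab. Outside a notebook, Python's standard-library `zipfile` module does the same job. A minimal sketch, assuming the archive should be extracted into `/content` (Colab's default working directory, which is where `!unzip` puts the files); the helper name `extract_archive` is hypothetical:

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract every member of a .zip archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target folder if it is missing
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest)
        return zf.namelist()


# Equivalent of the notebook cell above (assumed paths):
# extract_archive("/content/archive.zip", "/content")
```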

The extracted files are in the 'New' directory, which we can list with the 'os' library:

for dirname, _, filenames in os.walk('/content/New'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

Reading the dataset:

df = pd.read_csv('/content/New/parkinsons.data')

Let's get an overview of the data:

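# Overview of the dataset: 195 voice recordings (rows); apart from 'name' and the
# target 'status', the remaining 22 columns are the numeric features used to classify status
print("\nOverview of the dataset")
df.info()  # info() prints the summary itself and returns None, so it is not wrapped in print()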
>>>
Overview of the dataset
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 195 entries, 0 to 194
Data columns (total 24 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   name              195 non-null    object 
 1   MDVP:Fo(Hz)       195 non-null    float64
 2   MDVP:Fhi(Hz)      195 non-null    float64
 3   MDVP:Flo(Hz)      195 non-null    float64
 4   MDVP:Jitter(%)    195 non-null    float64
 5   MDVP:Jitter(Abs)  195 non-null    float64
 6   MDVP:RAP          195 non-null    float64
 7   MDVP:PPQ          195 non-null    float64
 8   Jitter:DDP        195 non-null    float64
 9   MDVP:Shimmer      195 non-null    float64
 10  MDVP:Shimmer(dB)  195 non-null    float64
 11  Shimmer:APQ3      195 non-null    float64
 12  Shimmer:APQ5      195 non-null    float64
 13  MDVP:APQ          195 non-null    float64
 14  Shimmer:DDA       195 non-null    float64
 15  NHR               195 non-null    float64
 16  HNR               195 non-null    float64
 17  status            195 non-null    int64  
 18  RPDE              195 non-null    float64
 19  DFA               195 non-null    float64
 20  spread1           195 non-null    float64
 21  spread2           195 non-null    float64
 22  D2                195 non-null    float64
 23  PPE               195 non-null    float64

Checking the distribution of the target values:

# About 75% of the recordings are Parkinson's cases (status=1); the remaining 25% are healthy (status=0)
df.value_counts('status')
>>>
status
1    147
0     48
dtype: int64
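The 75%/25% split in the comment comes from dividing each count by the 195 rows. pandas can compute this directly with `value_counts(normalize=True)` (in the notebook, `df['status'].value_counts(normalize=True)`). A sketch on a stand-in Series that only reproduces the counts shown above, since the real `df` is not loaded here:

```python
import pandas as pd

# Stand-in for df['status'] with the same counts as the output above.
status = pd.Series([1] * 147 + [0] * 48, name="status")

counts = status.value_counts()                # rows per class
shares = status.value_counts(normalize=True)  # fraction of rows per class

for label, share in shares.items():
    print(f"status={label}: {counts[label]} rows, {share:.1%}")
# status=1: 147 rows, 75.4%
# status=0: 48 rows, 24.6%
```

This also gives a baseline: a model that always predicts status=1 would already be about 75% accurate, which is one reason to report precision, recall, and F1 alongside accuracy.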

Splitting the data into training and test sets:

# Split the features into X and the target variable into y
X = df.drop(columns=['name','status']).values
y = df.status.values

# splitting the data into train and test datasets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, stratify=y, random_state=41)
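`stratify=y` makes both parts keep the overall class ratio, and `test_size=0.15` of 195 rows rounds up to 30 test rows. A sketch that checks this on stand-in labels with the same 147/48 counts; the feature matrix is a placeholder column of zeros, and the variable names are chosen so they do not overwrite the notebook's real `X_train`/`y_test`:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in labels with the real class counts; the features are placeholders.
labels = np.array([1] * 147 + [0] * 48)
dummy_features = np.zeros((labels.size, 1))

_, _, y_tr, y_te = train_test_split(
    dummy_features, labels, test_size=0.15, stratify=labels, random_state=41
)

print(len(y_tr), len(y_te))   # 165 30
print(np.bincount(y_te))      # [ 7 23]  -> 7 healthy, 23 Parkinson's
print(f"{y_te.mean():.3f} vs {labels.mean():.3f}")  # 0.767 vs 0.754 (Parkinson's share: test vs overall)
```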

Data Understanding


Let's look at the first two training samples (rows of X_train), each a vector of the 22 feature values:

X_train[1,:]
X_train[0,:]
>>>
array([ 1.835200e+02,  2.168140e+02,  1.613400e+02,  1.466000e-02,
        8.000000e-05,  8.490000e-03,  8.190000e-03,  2.546000e-02,
        6.050000e-02,  6.180000e-01,  2.865000e-02,  4.101000e-02,
        6.359000e-02,  8.595000e-02,  6.057000e-02,  1.436700e+01,
        4.780240e-01,  7.689740e-01, -4.276605e+00,  3.557360e-01,
        3.142364e+00,  3.360850e-01])

        
array([ 1.797110e+02,  2.259300e+02,  1.448780e+02,  7.090000e-03,
        4.000000e-05,  3.910000e-03,  4.190000e-03,  1.172000e-02,
        4.313000e-02,  4.420000e-01,  2.297000e-02,  2.768000e-02,
        3.455000e-02,  6.892000e-02,  7.223000e-02,  1.186600e+01,
        5.909510e-01,  7.455260e-01, -4.379411e+00,  3.755310e-01,
        3.671155e+00,  3.320860e-01])
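The plots below pick features by column position in `X_train`, so it helps to know which position holds which column. Because `name` and `status` are dropped before `.values`, every column after `status` shifts left by one relative to the `df.info()` listing. A sketch that rebuilds the mapping from the column list shown above (in the notebook, `df.drop(columns=['name', 'status']).columns` gives the same order):

```python
# Column order copied from the df.info() output above.
columns = [
    "name", "MDVP:Fo(Hz)", "MDVP:Fhi(Hz)", "MDVP:Flo(Hz)", "MDVP:Jitter(%)",
    "MDVP:Jitter(Abs)", "MDVP:RAP", "MDVP:PPQ", "Jitter:DDP", "MDVP:Shimmer",
    "MDVP:Shimmer(dB)", "Shimmer:APQ3", "Shimmer:APQ5", "MDVP:APQ", "Shimmer:DDA",
    "NHR", "HNR", "status", "RPDE", "DFA", "spread1", "spread2", "D2", "PPE",
]

# Same drop as X = df.drop(columns=['name', 'status']).values
feature_names = [c for c in columns if c not in ("name", "status")]
feature_index = {name: i for i, name in enumerate(feature_names)}

for name in ["MDVP:Shimmer", "Shimmer:DDA", "NHR", "HNR", "RPDE"]:
    print(f"{name:>14} -> X_train[:, {feature_index[name]}]")
```

With this mapping, MDVP:Shimmer is column 8, Shimmer:DDA is column 13, NHR is 14, HNR is 15, and RPDE is 16.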

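First question about Parkinson’s disease: can we determine the status of a case using RPDE and the vocal measure MDVP:Shimmer? (RPDE, recurrence period density entropy, is a nonlinear measure of the voice signal's complexity; shimmer measures cycle-to-cycle variation in amplitude.)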


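# First question: can RPDE and MDVP:Shimmer separate the two classes?
# In X_train, column 8 is MDVP:Shimmer and column 16 is RPDE (name and status were dropped)
plt.scatter(X_train[y_train == 0, 8], X_train[y_train == 0, 16], c='r', label='Healthy (status=0)')
plt.scatter(X_train[y_train == 1, 8], X_train[y_train == 1, 16], c='b', label="Parkinson's (status=1)")
plt.xlabel('MDVP:Shimmer')
plt.ylabel('RPDE')
plt.legend()
plt.show()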

Second question: can we determine the status of a case using RPDE and NHR (noise-to-harmonics ratio)?


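# Second question: can RPDE and NHR separate the two classes?
# In X_train, column 14 is NHR and column 16 is RPDE
plt.scatter(X_train[y_train == 0, 14], X_train[y_train == 0, 16], c='r', label='Healthy (status=0)')
plt.scatter(X_train[y_train == 1, 14], X_train[y_train == 1, 16], c='b', label="Parkinson's (status=1)")
plt.xlabel('NHR')
plt.ylabel('RPDE')
plt.legend()
plt.show()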

Third question: can we determine the status of a case using RPDE and Shimmer:DDA?


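# Third question: can RPDE and Shimmer:DDA separate the two classes?
# Shimmer:DDA is column 13 of X_train (column 15 is HNR, not DDA)
plt.scatter(X_train[y_train == 0, 13], X_train[y_train == 0, 16], c='r', label='Healthy (status=0)')
plt.scatter(X_train[y_train == 1, 13], X_train[y_train == 1, 16], c='b', label="Parkinson's (status=1)")
plt.xlabel('Shimmer:DDA')
plt.ylabel('RPDE')
plt.legend()
plt.show()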
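The scatter plots answer each question by eye. One way to put a number on how well a single feature separates the classes (not part of the original analysis) is the ROC AUC of that feature's raw values used as a score: 0.5 means no separation and 1.0 means perfect separation. A sketch with a hypothetical helper `single_feature_auc`, which folds AUCs below 0.5 (features that are lower for Parkinson's cases) back above 0.5:

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def single_feature_auc(X, y, feature_names):
    """Direction-agnostic ROC AUC of each column of X used on its own as a score for y == 1."""
    X = np.asarray(X)
    scores = {}
    for j, name in enumerate(feature_names):
        auc = roc_auc_score(y, X[:, j])
        # A feature that is consistently *lower* for class 1 is just as informative.
        scores[name] = max(auc, 1.0 - auc)
    # Most separating features first
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Usage in the notebook (assuming the 22 feature names in X's column order):
# names = df.drop(columns=['name', 'status']).columns.tolist()
# single_feature_auc(X_train, y_train, names)
```

Computing it on the training split only keeps the test set untouched for the final evaluation.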

We can also discuss some related medical background.


What causes Parkinson’s disease?


The most prominent signs and symptoms of Parkinson’s disease occur when nerve cells in the basal ganglia, an area of the brain that controls movement, become impaired and/or die. Normally, these nerve cells, or neurons, produce an important brain chemical known as dopamine. When the neurons die or become impaired, they produce less dopamine, which causes the movement problems associated with the disease. Scientists still do not know what causes the neurons to die.

People with Parkinson’s disease also lose the nerve endings that produce norepinephrine, the main chemical messenger of the sympathetic nervous system, which controls many functions of the body, such as heart rate and blood pressure. The loss of norepinephrine might help explain some of the non-movement features of Parkinson’s, such as fatigue, irregular blood pressure, decreased movement of food through the digestive tract, and sudden drop in blood pressure when a person stands up from a sitting or lying position.

Many brain cells of people with Parkinson’s disease contain Lewy bodies, unusual clumps of the protein alpha-synuclein. Scientists are trying to better understand the normal and abnormal functions of alpha-synuclein and its relationship to genetic mutations that impact Parkinson’s and Lewy body dementia.

Some cases of Parkinson’s disease appear to be hereditary, and a few cases can be traced to specific genetic mutations. While genetics is thought to play a role in Parkinson’s, in most cases the disease does not seem to run in families. Many researchers now believe that Parkinson’s results from a combination of genetic and environmental factors, such as exposure to toxins.


Symptoms of Parkinson’s disease

Parkinson’s has four main symptoms:

  • Tremor in hands, arms, legs, jaw, or head

  • Muscle stiffness, where muscle remains contracted for a long time

  • Slowness of movement

  • Impaired balance and coordination, sometimes leading to falls

Other symptoms may include:

  • Depression and other emotional changes

  • Difficulty swallowing, chewing, and speaking

  • Urinary problems or constipation

  • Skin problems

The symptoms of Parkinson’s and the rate of progression differ among individuals. Early symptoms of this disease are subtle and occur gradually. For example, people may feel mild tremors or have difficulty getting out of a chair. They may notice that they speak too softly, or that their handwriting is slow and looks cramped or small. Friends or family members may be the first to notice changes in someone with early Parkinson’s. They may see that the person’s face lacks expression and animation, or that the person does not move an arm or leg normally.

People with Parkinson's disease often develop a parkinsonian gait that includes a tendency to lean forward; take small, quick steps; and reduce swinging their arms. They also may have trouble initiating or continuing movement. Symptoms often begin on one side of the body or even in one limb on one side of the body. As the disease progresses, it eventually affects both sides. However, the symptoms may still be more severe on one side than on the other.

Many people with Parkinson’s disease note that prior to experiencing stiffness and tremor, they had sleep problems, constipation, loss of smell, and restless legs. While some of these symptoms may also occur with normal aging, talk with your doctor if these symptoms worsen or begin to interfere with daily living.


Diagnosis of Parkinson’s disease

There are currently no blood or laboratory tests to diagnose non-genetic cases of Parkinson’s. Doctors usually diagnose the disease by taking a person’s medical history and performing a neurological examination. If symptoms improve after starting to take medication, it’s another indicator that the person has Parkinson’s.

A number of disorders can cause symptoms similar to those of Parkinson’s disease. People with Parkinson’s-like symptoms that result from other causes, such as multiple system atrophy and dementia with Lewy bodies, are sometimes said to have parkinsonism. While these disorders initially may be misdiagnosed as Parkinson’s, certain medical tests, as well as response to drug treatment, may help to better evaluate the cause. Many other diseases have similar features but require different treatments, so it is important to get an accurate diagnosis as soon as possible.

Treatments for Parkinson’s disease

Although there is no cure for Parkinson’s disease, medicines, surgical treatment, and other therapies can often relieve some symptoms.

Medicines for Parkinson’s disease

Medicines can help treat the symptoms of Parkinson’s by:

  • Increasing the level of dopamine in the brain

  • Having an effect on other brain chemicals, such as neurotransmitters, which transfer information between brain cells

  • Helping control non-movement symptoms

The main therapy for Parkinson’s is levodopa. Nerve cells use levodopa to make dopamine to replenish the brain’s dwindling supply. Usually, people take levodopa along with another medication called carbidopa. Carbidopa prevents or reduces some of the side effects of levodopa therapy — such as nausea, vomiting, low blood pressure, and restlessness — and reduces the amount of levodopa needed to improve symptoms.

People living with Parkinson’s disease should never stop taking levodopa without telling their doctor. Suddenly stopping the drug may have serious side effects, like being unable to move or having difficulty breathing.

The doctor may prescribe other medicines to treat Parkinson’s symptoms, including:

  • Dopamine agonists, which mimic the effects of dopamine in the brain

  • Enzyme inhibitors (e.g., MAO-B inhibitors, COMT inhibitors) to increase the amount of dopamine by slowing down the enzymes that break down dopamine in the brain

  • Amantadine to help reduce involuntary movements

  • Anticholinergic drugs to reduce tremors and muscle rigidity


Modeling phase


Predictive analytics is driven by predictive modelling, which is more of an approach than a fixed process. Predictive analytics and machine learning go hand in hand: predictive models typically include a machine learning algorithm, and predictive modelling largely overlaps with the field of machine learning. These models can be retrained over time as new data arrives. There are two main types of predictive models: classification models, which predict class membership, and regression models, which predict a number. The models are built from algorithms that perform the data mining and statistical analysis, finding trends and patterns in the data, and predictive analytics software usually ships with built-in algorithms for this. In classification, these algorithms are called classifiers: they identify which category a data point belongs to. Predicting a case's status (0 or 1) in this dataset is a classification problem.

The most widely used predictive models are:

  • Decision trees: a simple but powerful form of multiple-variable analysis. They are produced by algorithms that identify various ways of splitting data into branch-like segments, partitioning the data into subsets based on the values of the input variables, which makes the path of decisions easy to follow.

  • Regression (linear and logistic): one of the most popular methods in statistics. Regression analysis estimates relationships among variables, finding key patterns in large and diverse data sets and how they relate to each other.

  • Neural networks: patterned after the operation of neurons in the human brain, neural networks (also called artificial neural networks) are the basis of deep learning. They are typically used to solve complex pattern recognition problems, are useful for analyzing large data sets, and are good at handling nonlinear relationships in data.

Other model families include:

  • Time series algorithms: model data ordered in time and are useful for forecasting continuous values.

  • Clustering algorithms: organize data into groups whose members are similar.

  • Outlier detection algorithms: focus on anomaly detection, identifying items, events, or observations that do not conform to an expected pattern within a data set.

  • Ensemble models: combine multiple machine learning algorithms to obtain better predictive performance than any one of them alone.

  • Factor analysis: describes the variability among observed variables in terms of a smaller number of unobserved latent variables (factors).

  • Naïve Bayes: predicts a class from a set of features using Bayes' theorem, under the simplifying assumption that the features are independent given the class.

  • Support vector machines: supervised learning methods that, for classification, find the boundary that separates the classes with the largest margin.
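To make the "branch-like segments" of a decision tree concrete, here is a sketch that fits a shallow tree to synthetic data (generated with scikit-learn's `make_classification`, not the Parkinson's data; the feature names `f0` to `f3` are placeholders) and prints the learned splits with `export_text`:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in: 195 rows, 4 features, roughly the same 25/75 class balance.
X_demo, y_demo = make_classification(
    n_samples=195, n_features=4, n_informative=2, n_redundant=0,
    weights=[0.25, 0.75], random_state=0,
)

# max_depth=2 keeps the tree small enough to read
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X_demo, y_demo)

# Each indented line is one branch: a threshold test on a single feature,
# ending in a leaf that assigns a class.
rules = export_text(tree, feature_names=["f0", "f1", "f2", "f3"])
print(rules)
```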

Each classifier approaches data in a different way, so it is important to choose the right classifiers and models for the problem. We experimented with several models here: AdaBoost Classifier, Decision Tree Classifier, Random Forest Classifier, Gradient Boosting, Hist Gradient Boosting, and XGBoost, of which XGBoost achieved the best score. The code below trains the scikit-learn Gradient Boosting classifier and reports its test-set scores.


from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import precision_recall_fscore_support

# Gradient boosting with 100 depth-1 trees (decision stumps)
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1, random_state=0)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)

# Macro averaging weights both classes equally, which matters with a 75/25 class imbalance.
# The fourth return value (support) is None when an average is requested.
pre, rec, f1, _ = precision_recall_fscore_support(y_test, y_pred, average='macro')

acc = clf.score(X_test, y_test)  # mean accuracy on the test set

print('The accuracy is {0}, precision is {1}, recall is {2}, and f1-score is {3}'.format(acc, pre, rec, f1))

>>>
The accuracy is 0.9666666666666667, precision is 0.9791666666666667, recall is 0.9285714285714286, and f1-score is 0.9509001636661211 
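Accuracy (0.967) is higher than macro recall (0.929) because macro averaging weights the small healthy class as much as the large Parkinson's class. Working backwards from the four reported numbers, and assuming the stratified 30-row test set holds 7 healthy and 23 Parkinson's cases, they are all consistent with exactly one healthy recording being predicted as Parkinson's. A sketch that rebuilds such labels (illustrative vectors, not the notebook's actual `y_test`/`y_pred`) and prints the confusion matrix; in the notebook, `confusion_matrix(y_test, y_pred)` gives the real one:

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)

# Illustrative labels: 7 healthy (0) and 23 Parkinson's (1) cases,
# with a single healthy case misclassified as Parkinson's.
y_true = np.array([0] * 7 + [1] * 23)
y_hat = y_true.copy()
y_hat[0] = 1

# Rows = true class (0, 1), columns = predicted class (0, 1)
print(confusion_matrix(y_true, y_hat))

pre, rec, f1, _ = precision_recall_fscore_support(y_true, y_hat, average="macro")
acc = accuracy_score(y_true, y_hat)
print(f"accuracy={acc:.4f} precision={pre:.4f} recall={rec:.4f} f1={f1:.4f}")
# accuracy=0.9667 precision=0.9792 recall=0.9286 f1=0.9509
```

Healthy-class recall is 6/7 (about 0.857) while Parkinson's recall is 23/23, and their mean is the 0.929 macro recall reported above.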

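The model scores well on all four standard metrics. Keep in mind that the test set has only 30 rows (15% of 195, rounded up), so a single misclassification moves accuracy by about 3.3 percentage points.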
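One caveat when reading these scores: in the UCI version of this dataset, the 195 rows are several voice recordings from each of 31 people (23 with Parkinson's), and the `name` column encodes the subject, for example `phon_R01_S01_1`. A random row-level split can put recordings from the same person in both train and test, which can make scores optimistic. A minimal sketch of a subject-level split with scikit-learn's `GroupShuffleSplit`, assuming that `name` pattern holds for every row; the regex and the helper `subject_ids` are assumptions, not part of the original notebook:

```python
import re

import numpy as np
from sklearn.model_selection import GroupShuffleSplit


def subject_ids(names):
    """Extract the subject part (e.g. 'S01') from names like 'phon_R01_S01_1' (assumed format)."""
    return np.array([re.search(r"_(S\d+)_", n).group(1) for n in names])


# Toy names in the assumed format: 3 subjects with 2 recordings each.
names = ["phon_R01_S01_1", "phon_R01_S01_2", "phon_R01_S02_1",
         "phon_R01_S02_2", "phon_R01_S03_1", "phon_R01_S03_2"]
groups = subject_ids(names)

# test_size=1 means one whole subject (group) goes to the test side
gss = GroupShuffleSplit(n_splits=1, test_size=1, random_state=0)
train_idx, test_idx = next(gss.split(np.zeros((len(names), 1)), groups=groups))

# No subject appears on both sides of the split.
print(sorted(set(groups[train_idx])), sorted(set(groups[test_idx])))

# In the notebook (assuming the pattern holds for every row):
# groups = subject_ids(df['name'])
# gss = GroupShuffleSplit(n_splits=1, test_size=0.15, random_state=41)
# train_idx, test_idx = next(gss.split(X, y, groups))
```

With only 8 healthy subjects, a plain group split can leave very few healthy cases in the test set; scikit-learn's `StratifiedGroupKFold` balances classes while still keeping subjects separate.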


Conclusion


While the progression of Parkinson’s is usually slow, eventually a person’s daily routines may be affected. Activities such as working, taking care of a home, and participating in social activities with friends may become challenging. Experiencing these changes can be difficult, but support groups can help people cope. These groups can provide information, advice, and connections to resources for those living with Parkinson’s disease, their families, and caregivers, and patient organizations can help people find local support groups and other resources in their communities.

This article has discussed the disease and its symptoms, explored the data through a few questions, and built a model to predict a case's status. Hopefully it was insightful and useful.


Data source: Link
