Diagnosing Schizophrenia Using EEG Biomarkers
Kira Bukharina Sergey Bukharin
Schizophrenia is a serious mental disorder that affects 1.1 percent of the population, or approximately 2.6 million adults aged 18 or older in the United States and about 300,000 Canadians. An estimated 40 percent of individuals with the condition are untreated in any given year.
Schizophrenia can be diagnosed in several ways based on its symptoms. First, to check that the person has schizophrenia and not a physical illness (for example, a traumatic brain injury), the doctor might use diagnostic tests such as MRI, CT scans or blood tests. If the doctor finds no physical reason for the symptoms, he or she might refer the patient to a psychiatrist, psychologist or another healthcare professional specially trained to diagnose and treat mental illnesses. Psychiatrists and psychologists use specially designed interviews and assessment tools to evaluate a person for schizophrenia. The main problem we see in diagnosing schizophrenia is that it takes a long time to reach an accurate diagnosis.
The Diagnostic and Statistical Manual of Mental Disorders (DSM) is the reference book that psychiatrists and psychologists use to diagnose mental illnesses. It is very helpful for learning detailed information about mental illnesses; we purchased it last year. According to the DSM-5, the main symptoms include:
Disorganized or catatonic behavior
Negative symptoms  
Based on the DSM-5, we identified the following problems with diagnosing schizophrenia:
The DSM-5 requires signs of the disturbance to persist for at least 6 months, so a correct diagnosis takes at least that long
The symptoms of schizophrenia are similar to those of other mental disorders, for example delusional disorder, bipolar disorder and psychotic depression
In our research we want to address these issues and develop a diagnostic procedure that can identify schizophrenia immediately and with high statistical confidence.
An electroencephalogram (EEG) is used to evaluate the electrical activity in the brain and can help detect brain problems. We already used EEG in our last science fair project, which estimated how attention levels are affected by different electronic devices. In that project we measured EEG with the Fp1 and A1 electrodes of NeuroSky's MindWave Mobile device (total cost $132).
In general, EEG is the least expensive neuroimaging method. A typical EEG test costs $200-$700, and it is possible to buy a 14-channel system for $800. EEG could also potentially diagnose schizophrenia much faster than conventional methods.
We compare the EEG of healthy people with the EEG of people affected by schizophrenia, trying to find differences and to understand where in the EEG to look for them, based on schizophrenia symptoms, brain anatomy and the locations of the EEG electrodes.
We process EEG data and test statistical hypotheses to compare different signals.
We started this project in 2018 and plan to continue it in the future. At this stage we are using standard signal analysis techniques; at the next stage we plan to use machine learning algorithms.
The brain has cells called neurons and the neurons pass electrical signals (information) from one neuron to another neuron across a small gap called a synapse. At the synapse, electrical signals are translated into chemical signals in order to cross the gap. Once on the other side, the signal becomes electrical again. 
Figure 1: Neurons communicating via the synapse 
An electroencephalogram (EEG) is a test used to evaluate the electrical activity of the brain. As we mentioned above, neurons communicate with each other using electrical impulses, and an EEG is used to detect potential problems associated with this electrical activity. During the procedure, electrodes (small flat metal discs) connected by wires are attached to the scalp; the electrodes pick up the electrical activity of the brain and send the signals to a computer that records the results. The recorded EEG data must then be post-processed and analyzed.
EEG electrodes are placed in a specific arrangement. The most common electrode placement system is called 10-20. It is named after the way the distances between electrode locations are measured: the distances between key anatomical landmarks (for example, from the nasion at the bridge of the nose to the inion at the back of the head) are divided into steps of 10% and 20% of their total length, and the electrodes are placed at those points.
Figure 2: 10-20 Electrodes Placement. From 
The letters represent the different areas of the brain.
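Fp = prefrontal (frontal pole)
F = frontal
C = central
T = temporal
P = parietal
O = occipital
A = earlobe (reference)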
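The 10%/20% rule can be illustrated with a short calculation along the midline of the head, from the nasion (bridge of the nose) to the inion (bump at the back of the head). This minimal Python sketch is only an illustration, not part of our analysis: the 36 cm nasion-to-inion distance is an example value we assumed, and the function name `midline_positions` is hypothetical.

```python
# Midline electrode positions in the 10-20 system, measured from the nasion
# towards the inion. The first step is 10% of the nasion-inion distance,
# the next steps are 20% each, and the last 10% lies between Oz and the inion.
MIDLINE_STEPS = [("Fpz", 0.10), ("Fz", 0.20), ("Cz", 0.20), ("Pz", 0.20), ("Oz", 0.20)]


def midline_positions(nasion_inion_cm: float) -> dict:
    """Return the distance (in cm) from the nasion to each midline electrode."""
    positions = {}
    fraction = 0.0
    for name, step in MIDLINE_STEPS:
        fraction += step  # running share of the nasion-inion distance
        positions[name] = round(fraction * nasion_inion_cm, 2)
    return positions


if __name__ == "__main__":
    # 36 cm is only an example head size, not a measured value.
    print(midline_positions(36.0))
    # {'Fpz': 3.6, 'Fz': 10.8, 'Cz': 18.0, 'Pz': 25.2, 'Oz': 32.4}
```

For a different head the same fractions apply, which is why electrode positions stay comparable between people with different head sizes.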
Figure 3 - 10-10 Electrode placement 
The brain consists of many different areas, and these areas have different functions. The brain has two sides, the left hemisphere and the right hemisphere. The right hemisphere is often described as the creative side and the left hemisphere as the analytical and methodical side, although this division is a simplification. The brain also has four lobes: the parietal lobe, the temporal lobe, the occipital lobe and the frontal lobe. Figure 4 shows these areas and their functions. We found a study in which MRI scans showed abnormalities such as structural changes in the left hemisphere and in the temporal lobe; that MRI study was performed on 16 male patients with chronic schizophrenia and 15 healthy males. The occipital lobe can also be affected by schizophrenia: people with schizophrenia can experience hallucinations, the occipital lobe is responsible for vision, and a study has shown that people with schizophrenia have less gray matter in the occipital lobe than healthy people.
Figure 4 Brain areas and functions 
According to the DSM-5, the main schizophrenia symptoms are:
Our first experience with data processing was with data we collected ourselves using the MindWave Mobile device, which has two electrodes: Fp1 and A1. We did an experiment to find out how alcohol affects brain activity. The recordings were made in the evening, one week apart, at the same time (8 pm), with the subject in a relaxed state with eyes closed (to avoid blinking noise). We made 5 measurements of 60 seconds each. Figure 7 shows the raw EEG signal plotted in Excel, with voltage in volts on the y-axis and time in seconds on the x-axis. Blue is the signal after alcohol consumption and black is the signal without alcohol. There is a visible difference between the signals: the top graph shows a low-frequency wave running through the whole signal, which we do not observe in the bottom graph. To see this better, we isolated the delta waves using LabVIEW code (Figure 8).
Figure 7 Test with alcohol (blue) and without alcohol (black) (closed eyes to avoid muscle noise)
This simple code allows us to separate all types of brain waves and study them separately (we used the same code in our last project on attention levels):
Figure 8: LabVIEW code to filter brain waves from EEG
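We did this step in LabVIEW. For readers who prefer Python, a band-pass filter that isolates the delta waves can be sketched with SciPy. This is a minimal sketch, not a translation of our LabVIEW code: the 4th-order Butterworth filter, the zero-phase forward-backward filtering, the function name `bandpass` and the synthetic test signal are all our assumptions for illustration.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass(signal, fs, low_hz, high_hz, order=4):
    """Keep only the frequencies between low_hz and high_hz.

    Uses a Butterworth band-pass filter in second-order sections (numerically
    stable at low frequencies) and runs it forwards and backwards, so the
    filtered wave is not shifted in time.
    """
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)


if __name__ == "__main__":
    fs = 250.0                                  # sample rate in Hz (example value)
    t = np.arange(0, 10, 1 / fs)                # 10 seconds of samples
    delta_wave = np.sin(2 * np.pi * 2 * t)      # 2 Hz: inside the delta range
    beta_wave = np.sin(2 * np.pi * 20 * t)      # 20 Hz: outside the delta range
    delta_only = bandpass(delta_wave + beta_wave, fs, 1.0, 4.0)
    middle = slice(int(2 * fs), int(8 * fs))    # skip filter edge effects
    # Largest difference between the filtered signal and the pure 2 Hz wave
    print(np.max(np.abs(delta_only[middle] - delta_wave[middle])))
```

Changing `low_hz` and `high_hz` gives the other bands (for example 8-12 Hz for alpha), which is the same idea as separating all wave types in the LabVIEW code.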
Figure 9 shows that the delta waves in the signal recorded after alcohol consumption have a higher amplitude than those recorded when the subject had no alcohol at all. We cut out the data in LabVIEW and exported it to Excel.
Figure 9: Delta waves filtered from EEG using the code (blue - after alcohol consumption, black - without alcohol)
We found a study that explains this mechanism. The increase in delta waves is hypothesized to be related to a major inhibitory neurotransmitter called GABA (gamma-aminobutyric acid), which regulates sleep. Alcohol might mimic or stimulate GABA activity, leading to a sleepy state and therefore to a higher delta wave amplitude. Interestingly, once the effect of alcohol wears off, the brain starts producing alpha waves, which are associated with wakefulness. Therefore it is easier to fall asleep after alcohol consumption, but the quality of sleep is much worse than without alcohol.
Comparing these two signals was easy for two reasons: first, the increase in delta wave amplitude was so obvious that we could see it even in the raw signal, and second, we compared EEG recordings from the same person. When we tried to compare EEG signals from different people, we faced the problem of making a fair comparison. Different brains work differently, and differences in the amplitude of the various waves can be explained by differences in brain anatomy. Also, in schizophrenia diagnostics the difference in the EEG signal is not that obvious, and it is impossible to see it just by filtering the signal.
We were going to use LabVIEW for signal analysis (we have access to an education license), since we already had some experience with it from our Lego robotics projects, but we ran into the problem of comparing signals: we could not find a way to compare EEG signals within our knowledge of math and LabVIEW. After some research, we found Python code created for exactly this purpose of EEG signal comparison. We contacted the author of the code (Dr. Raphael Vallat) and had a brief discussion on EEG signal analysis; he gave us advice on how to remove noise from the signal and how to compare EEG signals using his code. The main idea is to take the power in a range of frequencies and divide it by the total power in all brain waves. Dr. Borisov gave us the same advice. He also advised us not to use the power from all frequencies, since the signal can contain electrical noise (50 or 60 Hz), but to use the range from 1 Hz to 30 Hz, which eliminates many problems (low sample rate, electrical noise, low-frequency drift noise, etc.).
Doctors and scientists contacted:
Dr. Donald Addington (University of Calgary) ----> got answer, agreed to listen to our presentation, sent valuable information.
Dr. Donald Addington is a professor in the Department of Psychiatry, a member of the Mathison Centre for Mental Health Research and Education and the Hotchkiss Brain Institute. He is active in research, education, clinical practice and administration. His research activities focus on access and quality of mental health services with a particular focus on schizophrenia and early psychosis intervention.
Dr. Phil Tibbo (Dalhousie University) ----> got answer, sent valuable information
Dr. Phil Tibbo is a Professor in the Department of Psychiatry with a cross-appointment in Psychology at Dalhousie University and an Adjunct Professor in Department of Psychiatry at the University of Alberta. He is also Director of the Nova Scotia Early Psychosis Program (NSEPP) and co-director of the Nova Scotia Psychosis Research Unit (NSPRU).
Dr. Paolo Federico (University of Calgary) ----> got answer, sent valuable information (EEG).
Dr. Federico has an active research program focusing on functional and structural imaging of epilepsy directed at understanding how focal seizures are generated and how they affect the brain. His imaging studies include the study of cortical and subcortical circuits underlying the generation of interictal discharges, functional MRI analysis of the pre-ictal state, language and motor reorganization in focal epilepsy, and seizure-related structural brain changes using T2 relaxometry. He also has an interest in advanced EEG analytical techniques, including the study of high frequency oscillations in humans and animal models of epilepsy.
Talked to: Dr. Sergey Borisov (Moscow State University) ---> received a lot of valuable information on EEG and an explanation of the data structure at http://brain.bio.msu.ru/eeg_schizophrenia.htm
Research Interests: Theoretical and experimental aspects of electroencephalography and, in particular, EEG alpha-activity structure and functions.
Contacted Dr Raphael Vallat ---> got extremely important information about EEG signal comparison and Python code
Contacted without result:
Dr Quentin J. Pittman (UofC) ---> referred to Dr. Federico.
Dr. Olejarczyk (Poland) ----> tried to get more data and asked questions on EEG (no answer)
Dr. Goghari (UofT) -----> tried to get more data and asked questions on EEG (no answer)
Dr. Ahmed (UofA) -----> tried to get more data and asked questions on EEG (no answer)
Dr. Johannesen (Yale University) -----> tried to get more data and asked questions on EEG (no answer)
Data sets from open sources:
14 affected by paranoid schizophrenia, 14 healthy. Sample rate: 250 Hz (around 15 minutes each set)
More detailed information about the data: Data set includes 14 patients (7 males: 27.9 ± 3.3 years, 7 females: 28.3 ± 4.1 years) with paranoid schizophrenia, who were hospitalized at the Institute of Psychiatry and Neurology in Warsaw, Poland, and 14 healthy controls (7 males: 26.8 ± 2.9, 7 females: 28.7 ± 3.4 years). The patients met International Classification of Diseases ICD–10 criteria for paranoid schizophrenia (category F20.0).
45 affected by schizophrenia, 39 healthy. Sample rate: 128 Hz (1 minute each set)
More detailed information about the data: the data set includes EEGs of 45 male adolescents diagnosed with schizophrenia (10 years 8 months to 14 years) and 39 EEGs of healthy male adolescents (11 years to 13 years 9 months). All measurements were done in a state of calm wakefulness with eyes closed.
We are also using two EEG data sets that we recorded ourselves with the MindWave Mobile device, for signal analysis demonstration purposes.
In total, the two open data sets contain 112 recordings (28 + 84), which is not bad for this type of study.
Below you can see the distribution of sample sizes across the 184 studies in this EEG review: https://www.frontiersin.org/articles/10.3389/fnhum.2018.00521/full
This review includes studies of the following conditions: depression, bipolar disorder, addiction, autism, ADHD, anxiety, panic disorder, obsessive-compulsive disorder (OCD), post-traumatic stress disorder (PTSD) and schizophrenia.
European Data Format (EDF) is a standard file format designed for the exchange and storage of medical time series. We use MNE-Python to convert .edf files to .csv (comma-separated values) files. MNE is open-source Python software for exploring, visualizing and analyzing human neurophysiological data such as MEG, EEG, sEEG and ECoG.
Sources of noise:
Muscle movement (jaw movement when chewing, blinking) and electrical noise.
The first data source contains a lot of noise. We noticed 50 Hz noise: the data was collected in Poland, and we checked that Poland's power supply operates at 230 V and 50 Hz. We also noticed a lot of blinking noise. The second data set was already post-processed. We managed to contact the researcher who collected this data (Dr. Sergey Borisov). He explained that the data is stored in txt format and arranged in one column, and that it is reported not in volts but in units used by the clinic where the data was collected.
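Because the second data set is stored as one long column, it has to be split back into channels before it can be analyzed. The Python sketch below shows one way to do this with NumPy. It rests on an assumption we have not confirmed here and that should be checked against the data description on the data set's web page: all samples of the first channel come first, then all samples of the second channel, and so on, with 7680 samples per channel (128 Hz × 60 s). The values stay in the clinic's units, not volts. The function name `load_one_column_eeg` is hypothetical.

```python
import io

import numpy as np


def load_one_column_eeg(source, samples_per_channel=7680):
    """Split a single-column EEG text file into an array of shape
    (n_channels, samples_per_channel).

    ASSUMPTION: channels are stored one after another in the column
    (all samples of channel 1, then all samples of channel 2, ...).
    7680 = 128 Hz * 60 s. Values remain in the clinic's original units.
    """
    values = np.loadtxt(source)  # 1-D array: one value per line
    if values.size % samples_per_channel != 0:
        raise ValueError("column length is not a multiple of samples_per_channel")
    n_channels = values.size // samples_per_channel
    return values.reshape(n_channels, samples_per_channel)


if __name__ == "__main__":
    # Tiny fake file: 3 "channels" of 4 samples each.
    fake_file = io.StringIO("\n".join(str(v) for v in range(12)))
    channels = load_one_column_eeg(fake_file, samples_per_channel=4)
    print(channels.shape)  # (3, 4)
    print(channels[1])     # [4. 5. 6. 7.]
```

The length check catches files whose size does not match the assumed layout instead of silently producing misaligned channels.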
We use three Python scripts to process our data. We started our analysis with the first data set (https://repod.icm.edu.pl/dataset.xhtml?persistentId=doi:10.18150/repod.0107441).
First, we open the raw data file in EDF format using the mne Python package. With just a couple of functions and a few lines of code we can plot the data from all channels as well as the power spectral density. This is what it looks like:
Figure 1. Example of raw data from EEG channels
Figure 2. Power Spectral Density
Figure 2 shows activity at about 10 Hz, which corresponds to alpha waves (typical for a healthy person who is awake and resting with eyes closed). We can observe blinking noise on the frontal electrodes. We can also see a spike at 50 Hz, which corresponds to electrical noise. Unfortunately, proper filtering of the signal requires much more knowledge of mathematics and programming than we have. That is why, to avoid the noise, we searched for electrodes and data segments without abnormal spikes. We also concentrated our attention on delta waves and chose a range of interest from 1 to 30 Hz to avoid the 50 Hz noise.
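Removing the 50 Hz electrical noise, rather than avoiding it, is a filtering step we did not attempt. As an illustration only, here is a minimal Python sketch of a standard technique for it, a notch filter, using SciPy's `iirnotch`. The quality factor of 30, the function names `remove_mains_noise` and `amplitude_at`, and the synthetic signal are our assumptions, not settings from our analysis; for data recorded in North America the mains frequency would be 60 Hz.

```python
import numpy as np
from scipy.signal import filtfilt, iirnotch


def remove_mains_noise(signal, fs, mains_hz=50.0, quality=30.0):
    """Suppress a narrow band around mains_hz with an IIR notch filter.

    filtfilt runs the filter forwards and backwards, so the EEG waves that
    remain are not shifted in time.
    """
    b, a = iirnotch(mains_hz, quality, fs=fs)
    return filtfilt(b, a, signal)


def amplitude_at(signal, fs, freq_hz):
    """Amplitude of one sine component, found by projecting onto sin and cos."""
    t = np.arange(signal.size) / fs
    s = np.mean(signal * np.sin(2 * np.pi * freq_hz * t))
    c = np.mean(signal * np.cos(2 * np.pi * freq_hz * t))
    return 2 * np.hypot(s, c)


if __name__ == "__main__":
    fs = 250.0                                   # sample rate of the first data set
    t = np.arange(0, 10, 1 / fs)
    alpha = np.sin(2 * np.pi * 10 * t)           # 10 Hz alpha-like wave
    mains = 0.5 * np.sin(2 * np.pi * 50 * t)     # 50 Hz electrical noise
    cleaned = remove_mains_noise(alpha + mains, fs)
    middle = slice(int(2 * fs), int(8 * fs))     # ignore edge effects
    print(amplitude_at(cleaned[middle], fs, 50.0), amplitude_at(cleaned[middle], fs, 10.0))
```

A narrow notch removes the mains frequency while leaving the 1-30 Hz brain waves almost untouched, but it does nothing against blinking noise, which overlaps the delta range.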
As we mentioned before, it is extremely challenging to compare two EEG signals. To compare them, we calculate the so-called relative spectral power density: we calculate the power contained in the delta range and divide it by the power contained in the range from 1 to 30 Hz. This number shows how much of a patient's energy within that fixed range of frequencies is stored in delta waves. We found Python code that performs this operation and only needed to adjust it slightly for our data files. Figure 3 shows the output of this code for one of our data files from the same series:
Figure 3. Power Spectral Density. Highlighted area shows power in the delta waves range.
This code calculates the area under the spectral density curve in any frequency range we need. We calculate the area in the delta range (1-4 Hz), then the area in the range of 1-30 Hz, and divide the first by the second.
Relative spectral power density = (delta waves power, range 1-4Hz)/(total power in the range of 1-30Hz)
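The formula above can be sketched in Python. This minimal sketch uses a common way to compute band power (Welch's method for the power spectral density, then Simpson's rule for the area under the curve); it is not necessarily the code we used, and the 4-second Welch window, the function names and the synthetic test channels are our assumptions. Because the same function works for any channel, a simple loop over electrodes can replace repeating the calculation by hand for each one.

```python
import numpy as np
from scipy.integrate import simpson
from scipy.signal import welch


def band_power(freqs, psd, low_hz, high_hz):
    """Area under the PSD curve between low_hz and high_hz (Simpson's rule)."""
    in_band = (freqs >= low_hz) & (freqs <= high_hz)
    freq_step = freqs[1] - freqs[0]
    return simpson(psd[in_band], dx=freq_step)


def relative_delta_power(signal, fs, window_sec=4.0):
    """Delta power (1-4 Hz) divided by the total power in 1-30 Hz."""
    freqs, psd = welch(signal, fs=fs, nperseg=int(window_sec * fs))
    return band_power(freqs, psd, 1.0, 4.0) / band_power(freqs, psd, 1.0, 30.0)


def relative_delta_per_channel(data, channel_names, fs):
    """Loop over the channels (rows of `data`) and compute the ratio for each."""
    return {name: relative_delta_power(row, fs) for name, row in zip(channel_names, data)}


if __name__ == "__main__":
    fs = 250.0
    t = np.arange(0, 60, 1 / fs)                # 60 s of synthetic data
    delta = np.sin(2 * np.pi * 2 * t)           # 2 Hz wave
    alpha = np.sin(2 * np.pi * 10 * t)          # 10 Hz wave
    fake = np.vstack([delta + alpha, delta, alpha])
    print(relative_delta_per_channel(fake, ["mixed", "delta", "alpha"], fs))
```

With equal-amplitude delta and alpha waves the "mixed" channel comes out at about 0.5, a pure delta channel near 1 and a pure alpha channel near 0. Because the ratio divides by the 1-30 Hz total, it does not depend on the overall signal scale, which also makes it usable with data stored in clinic-specific units.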
We could not automate the process with a loop, so we had to repeat the calculation manually for each temporal electrode. The results of these calculations are shown in Tables 1 and 2 (done in Excel).
Table 1. Relative spectral power density (patients affected by schizophrenia)
Table 2. Relative spectral power density (healthy people)
As we can see, the tables alone do not make it clear whether delta wave activity differs between patients affected by schizophrenia and healthy people. That is why we tested a statistical hypothesis comparing the sample means of the relative spectral power densities (we found the procedure for this hypothesis test in Triola's Elementary Statistics):
Claim: The average relative delta wave activity from temporal electrodes of people affected by schizophrenia is less than the average relative delta wave activity recorded from healthy people.
H0: μ1 ≥ μ2; H1: μ1 < μ2 (claim), where μ1 is the average relative delta wave activity from temporal electrodes of people affected by schizophrenia and μ2 is the average relative delta wave activity from temporal electrodes of healthy people.
Sample sizes: n1 = 56 (schizophrenia), n2 = 56 (healthy). As the sample size we count all data values from temporal electrodes (14 people × 4 temporal electrodes = 56).
Significance level: α = 0.05
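Following Triola's book, we used the t-distribution to test this hypothesis. We calculated the test statistic for two independent samples with this formula, where x̄1 and x̄2 are the sample means, s1 and s2 are the sample standard deviations, and μ1 - μ2 is taken as zero under H0:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
For our data this gives t = -1.83.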
We then calculated the p-value using the t-distribution function in Excel: p-value = 0.036. Since the p-value is less than the significance level, we reject the null hypothesis. This means we can conclude, at the 0.05 significance level (95% confidence), that the average relative delta wave activity from temporal electrodes of people affected by schizophrenia is less than the average relative delta wave activity recorded from healthy people.
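The same test can be computed in Python instead of Excel. This minimal sketch implements the two-sample t formula above with μ1 - μ2 = 0 and a left-tailed p-value. Two choices are assumptions on our part: the degrees of freedom use the conservative textbook rule df = min(n1, n2) - 1 (Excel or other software may use a different df), and every value is treated as independent, as in our analysis, even though four values come from each person. The function name `left_tailed_t_test` is hypothetical, and the toy samples are not our data.

```python
import math

import numpy as np
from scipy import stats


def left_tailed_t_test(sample1, sample2):
    """Test H0: mu1 >= mu2 against H1: mu1 < mu2 for two independent samples.

    Returns (t, df, p_value), with
    t = (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2) and mu1 - mu2 = 0 under H0.
    df = min(n1, n2) - 1 is the conservative textbook choice (an assumption).
    """
    x1 = np.asarray(sample1, dtype=float)
    x2 = np.asarray(sample2, dtype=float)
    n1, n2 = x1.size, x2.size
    standard_error = math.sqrt(x1.var(ddof=1) / n1 + x2.var(ddof=1) / n2)
    t = (x1.mean() - x2.mean()) / standard_error
    df = min(n1, n2) - 1
    p_value = stats.t.cdf(t, df)  # left tail: chance of a t this small or smaller
    return t, df, p_value


if __name__ == "__main__":
    # Checking the reported numbers: t = -1.83 with n1 = n2 = 56 gives df = 55.
    print(stats.t.cdf(-1.83, 55))  # roughly 0.036
    # Toy samples (not our data), just to show the call.
    print(left_tailed_t_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

With df = 55, t = -1.83 gives a left-tailed p-value of roughly 0.036, consistent with the Excel result.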
So far this is the only result we have managed to obtain, and it took us almost 2 years. We realize that we are still far from creating a solid schizophrenia diagnostic tool based on EEG data. Comparing means is clearly not enough; we need to perform much more complex analysis, but at this point we do not have the appropriate mathematical background. Noise is also a great concern: even though we tried to avoid it, we are still not sure we succeeded. Proper noise cleaning is a hard problem; even blinks are not easy to remove, and many papers have been written on blink removal alone. The problem turned out to be much more complicated than we expected, but we have taken the first step toward a solution.
1. Process the second data set, which we did not have time to analyze yet.
2. Investigate more MNE tools and try to apply them to our problem.
3. Continue building our math and programming background, since at this point our knowledge is not enough to move forward effectively.
4. Study more signal analysis and data science. Our goal is to apply machine learning algorithms to this problem.
 An MRI study of temporal lobe abnormalities and negative symptoms in chronic schizophrenia https://doi.org/10.1016/s0920-9964(01)00372-3
 Alterations of the occipital lobe in schizophrenia https://doi.org/10.17712/nsj.2015.3.20140757
Dr. Donald Addington for helping us with the DSM criteria and explaining more about schizophrenia diagnostics.
Dr. Raphael Vallat for the Python code for EEG data comparison and the discussion about brain waves.
Dr. Phil Tibbo for giving us information about schizophrenia diagnostics.
Dr. Paolo Federico for giving us a better understanding of EEG measurements.