CYSF


Precision Allocation of GLP1-RA and SGLT2i in Cardiac Rehabilitation

Constructed three medication variables from medication records in a cardiac rehabilitation registry of over 550 patients with ASCVD and T2D.

Khalid Kassam

Westmount Mid/High School

Grade 11


Problem

Atherosclerotic cardiovascular disease (ASCVD) and Type 2 diabetes (T2D) form a high-risk combination that carries an elevated burden of cardiovascular events. Two drug classes with demonstrated benefits currently receive particular emphasis: GLP-1 receptor agonists (GLP-1RAs) and SGLT2 inhibitors (SGLT2is). GLP-1RAs have demonstrated reductions in major cardiovascular events, weight loss, and improved glycemic control. SGLT2 inhibitors have reduced heart failure hospitalization, cardiovascular death, and renal decline. The combination of both agents is increasingly recognized as a strong therapy in patients with concurrent ASCVD and T2D. Despite this evidence, real-world uptake of these medications remains suboptimal.

The cardiac rehabilitation registry used in this project contains a longitudinal medication record for each patient, captured across up to 14 discrete medication documentation events (Med1 through Med14). Each medication record contains a date and binary flags indicating whether the patient was taking a GLP-1RA, whether the patient was taking an SGLT2i, and whether the visit was linked to a cardiovascular event. The registry also stores pre-computed summary flags, such as an intake flag and a "current" medication flag. However, these summary flags present a significant data integrity challenge. Because the registry is maintained through manual data entry by clinical staff, discrepancies can arise between the summary flags and the underlying Med1–Med14 records. A summary flag may be set based on a recent appointment that does not correspond to the intended clinical timepoint. For example, the "current" medication flag may reflect a medication list from a date well after the 12-week exercise stress test (EST), or a patient's intake flag may have been updated retroactively. Furthermore, patients who undergo multiple exercise stress tests introduce ambiguity into which assessment constitutes the "first" 12-week EST.

These issues create a fundamental tension: the summary flags are convenient for quick analysis, but they may not accurately reflect medication status at the specific clinical timepoints required for rigorous research. What was needed was a method to independently derive medication status at each timepoint directly from the longitudinal Med1–Med14 records, with transparent quality controls and a full audit trail. Prior to this work, there was no standardized, automated method to determine a patient's GLP-1RA or SGLT2i status at a specific clinical date using the underlying medication records. Analysts relied on the pre-computed summary flags, which, as described above, are subject to human error and may not correspond to the intended timepoints. This created three sub-problems:

There was no reliable variable capturing medication status at the time of the initial exercise stress test, the true baseline for cardiac rehabilitation. The existing intake flag (isGlpq1Intake) is based on a medication list that may not align precisely with the EST date.
There was no mechanism to ensure that the 12-week medication outcome variable reflected the patient's first 12-week EST. Patients who underwent repeat assessments, with Date12WeekEst falling more than 150 days after the initial EST, could have their outcome contaminated by later clinical encounters.
For patients who did not complete the 12-week program, there was no variable capturing their most recent medication status at the time they left. The existing current medication flag reflects the most recent data query date, not the separation date, and therefore does not represent the patient's medication status when they actually exited the program.

Method


First, data were extracted from the cardiac rehab dataset, which is maintained as an Excel workbook on an institutional server. The dataset included all enrolled patients with confirmed ASCVD and Type 2 diabetes. Patient records included demographics (age, sex), clinical characteristics (BMI, lipid panel, blood pressure, peak MET, PHQ-8, GAD-7), program participation data (referral date, initial EST date, 12-week EST date, program status, separation status and date), and longitudinal medication records (Med1 through Med14), each containing a documentation date and binary flags for GLP-1RA use, SGLT2i use, and cardiovascular linkage.
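As a rough illustration of how the Med1 through Med14 slots can be gathered into one chronological list per patient, here is a minimal Python sketch. The column names (`Med1_Date`, `Med1_GLP1`, `Med1_SGLT2`) and the helper `med_records` are hypothetical; the registry's actual column names are not shown in this write-up, and the example patient is made up.

```python
from datetime import date

def med_records(row, n_slots=14):
    """Collect a patient's Med1..Med14 slots into a date-sorted list of records.

    Assumes hypothetical column names such as 'Med3_Date', 'Med3_GLP1', 'Med3_SGLT2'.
    """
    records = []
    for i in range(1, n_slots + 1):
        med_date = row.get(f"Med{i}_Date")
        if med_date is None:  # empty slot: no documentation event recorded
            continue
        records.append({
            "source": f"Med{i}",
            "date": med_date,
            "glp1": bool(row.get(f"Med{i}_GLP1", False)),
            "sglt2": bool(row.get(f"Med{i}_SGLT2", False)),
        })
    return sorted(records, key=lambda r: r["date"])

# Made-up example patient, not registry data
patient = {
    "DateInitialEST": date(2024, 3, 1),
    "Med1_Date": date(2024, 2, 27), "Med1_GLP1": 0, "Med1_SGLT2": 1,
    "Med2_Date": date(2024, 1, 10), "Med2_GLP1": 0, "Med2_SGLT2": 0,
}
records = med_records(patient)
print([(r["source"], r["date"].isoformat()) for r in records])
```

Sorting by date rather than by slot number matters because a later slot can hold an earlier documentation date.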

All work was done in Python, using Jupyter notebooks in VS Code.

Variables

Variable 1: Medication Status at Initial Exercise Stress Test (Baseline)
This variable captures whether a patient was taking a GLP-1RA or SGLT2i at the time of their initial exercise stress test. This represents the true baseline medication status upon cardiac rehabilitation entry. The target date for matching was the value in DateInitialEST.

Variable 2: Medication Status at First 12-Week Exercise Stress Test (Outcome)
This variable captures medication status at the patient's first 12-week exercise stress test and is intended as the primary outcome variable for program completers. To ensure that only the first 12-week EST was used, a filter was applied: Date12WeekEst was only accepted if it fell between 60 and 150 days after DateInitialEST. Patients whose Date12WeekEst was more than 150 days after the initial EST were flagged as likely having undergone a repeat assessment and were excluded from this variable. This filter identified and excluded 39 patients (13.4% of those with a recorded 12-week EST).

Variable 3: Medication Status at Program Separation (Dropouts)
This variable uses the most recent medication status for patients who left the program before completing the 12-week assessment. It applies to patients who have a recorded SeparationDate without a valid 12-week EST. It identifies the most recent medication record on or before the separation date, capturing the last known medication status before the patient left the program, regardless of how far back that record falls.

Matching algorithm
For Variables 1 and 2, records were matched to the target date in priority order.

Priority 1: Record on or before target date (preferred). The algorithm scans all Med1–Med14 records and identifies those dated on or before the target date. Among these, it selects the one closest to the target date. If this record falls within 7 days of the target, the match is assigned high confidence. If it falls between 8 and 30 days prior, the match is assigned medium confidence.

Priority 2: Record within 7 days after target date (grace window). If no medication record exists on or before the target date, the algorithm checks for records falling within 7 days after the target. If one is found, the match is used but flagged with low_post_date confidence. This accounts for real-world documentation delays while maintaining transparency about the temporal relationship.

No match. If no medication record exists within the acceptable window, the variable is set to missing (None) and the confidence flag indicates the reason: either no medication records exist for the patient at all (no_med_records), or records exist but none fall within a suitable timeframe (no_suitable_record).

For Variable 3 specifically, a different method was used. Instead of finding the closest record to the target date, the algorithm identifies the most recent (latest-dated) record on or before the separation date. This ensures the variable reflects the patient's last documented medication status before exiting the program.
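The matching priorities can be sketched in Python as follows. This is a minimal illustration, not the project's actual code: the function names and the tuple layout are hypothetical, and two behaviours are assumptions because the text does not specify them. For Variables 1 and 2, a closest prior record older than 30 days is treated as no_suitable_record, and Variable 3 uses no grace window after the separation date.

```python
from datetime import date

GRACE_DAYS = 7  # grace window after the target date, as described in the text

def match_closest(records, target):
    """Variables 1 and 2. records: list of (source, record_date, status) tuples.

    Returns (status, source, days_diff, confidence), where days_diff is the
    record date minus the target date in days (negative = before the target).
    """
    if not records:
        return None, None, None, "no_med_records"
    prior = [r for r in records if r[1] <= target]
    if prior:
        source, rec_date, status = max(prior, key=lambda r: r[1])
        days_before = (target - rec_date).days
        if days_before <= 7:
            return status, source, -days_before, "high"
        if days_before <= 30:
            return status, source, -days_before, "medium"
        # Assumption: a prior record older than 30 days is not usable here
        return None, None, None, "no_suitable_record"
    after = [r for r in records if 0 < (r[1] - target).days <= GRACE_DAYS]
    if after:
        source, rec_date, status = min(after, key=lambda r: r[1])
        return status, source, (rec_date - target).days, "low_post_date"
    return None, None, None, "no_suitable_record"

def match_latest_before(records, separation):
    """Variable 3: latest record on or before separation, however old it is."""
    if not records:
        return None, None, None, "no_med_records"
    prior = [r for r in records if r[1] <= separation]
    if not prior:  # assumption: no grace window for Variable 3
        return None, None, None, "no_suitable_record"
    source, rec_date, status = max(prior, key=lambda r: r[1])
    days_before = (separation - rec_date).days
    if days_before <= 7:
        confidence = "high"
    elif days_before <= 30:
        confidence = "medium"
    else:
        confidence = "medium_old_record"
    return status, source, -days_before, confidence

# Made-up records for one patient: (source, date, on GLP-1RA?)
records = [("Med1", date(2024, 1, 5), False),
           ("Med2", date(2024, 2, 27), True),
           ("Med3", date(2024, 3, 20), True)]
print(match_closest(records, date(2024, 3, 1)))
print(match_latest_before(records, date(2024, 6, 1)))
```

Returning the source record and day offset alongside the status is what makes each match auditable later.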

Confidence and Quality Flags

Each medication variable was accompanied by four metadata columns: the medication status (True/False/None), the source Med record used (e.g., Med3), the number of days between the source record and the target date, and a confidence flag. The confidence flag system provides a transparent audit trail, allowing downstream analysts to filter by data quality or conduct sensitivity analyses excluding low-confidence matches.

Confidence Level	Definition
high	Medication record on the target date or within 7 days before it
medium	Medication record 8–30 days before target date
medium_old_record	Medication record more than 30 days before target date (Variable 3 only)
low_post_date	Medication record used from after the target date (within 7-day grace window)
rejected_repeat_est	Date12WeekEst exceeded 150 days from Initial EST (Variable 2 only)
no_med_records	No Med1–Med14 records exist for this patient
no_suitable_record	Med records exist but none fall within the acceptable matching window
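The rejected_repeat_est flag comes from the 60 to 150 day acceptance window applied to Date12WeekEst. A minimal sketch of that filter is below. The function name is hypothetical, and the labels "missing_date" and "rejected_too_early" are placeholders invented for this example, since the write-up only names the flag for gaps above 150 days.

```python
from datetime import date

MIN_DAYS, MAX_DAYS = 60, 150  # acceptance window stated in the Method section

def first_12week_est(initial_est, est_12week):
    """Return (accepted_date, flag) for the Variable 2 target date.

    flag is None when the 12-week EST date is accepted.
    """
    if initial_est is None or est_12week is None:
        return None, "missing_date"         # hypothetical label
    gap = (est_12week - initial_est).days
    if gap > MAX_DAYS:
        return None, "rejected_repeat_est"  # likely a repeat assessment
    if gap < MIN_DAYS:
        return None, "rejected_too_early"   # hypothetical label
    return est_12week, None

# Made-up dates: an 84-day gap is accepted, a 182-day gap is rejected
print(first_12week_est(date(2024, 1, 1), date(2024, 3, 25)))
print(first_12week_est(date(2024, 1, 1), date(2024, 7, 1)))
```

Running the filter before the temporal match keeps a late repeat assessment from ever becoming the target date.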

A patient lookup function was developed to enable individual-level auditing of each variable. Given a patient name or unique identifier, the function displays all three variables, the source Med record used for each, the temporal distance from the target date, the confidence flag, and a complete chronological listing of all Med1–Med14 records. This tool was used to manually verify variable accuracy for a sample of patients across different program outcomes (completers, dropouts, and patients with low-confidence matches).

Second, I am designing and implementing a personalized machine learning system built on wearable data. The system learns an individual's normal physiological baseline from continuous heart-rate and motion signals tracked in real time, then detects deviations indicative of high cardiac exertion and fatigue. This fits the project's underlying theme of cardiac rehab: it is intended as added support for cardiac rehab patients, though it is not limited to them. The core of the system is a 1D autoencoder trained on time-series heart rate and motion data. It continually learns a given patient's "normal" patterns and computes an anomaly score (the reconstruction error) reflecting how much the current physiology deviates from that baseline. If the custom ESP32-based wearable device works as intended, the model will be trained on data collected from the device; if the hardware is temporarily not fully operational, I will use existing ethics-approved, high-quality clinical research datasets to simulate the same personalized modelling.
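To make the idea of a reconstruction-error anomaly score concrete, here is a deliberately tiny sketch in pure Python. It is not the project's model: it uses a small linear autoencoder over fixed-length windows as a stand-in for the 1D autoencoder, and the "heart rate" windows are synthetic normalized sine waves. The window length, hidden size, learning rate, and the +3.0 offset used to simulate exertion are all assumptions chosen for illustration.

```python
import math
import random

WINDOW, HIDDEN = 8, 2  # samples per window, size of the compressed code

def forward(w_enc, w_dec, x):
    """Encode a window to HIDDEN values, then decode back to WINDOW values."""
    h = [sum(w * xi for w, xi in zip(row, x)) for row in w_enc]
    y = [sum(w * hj for w, hj in zip(row, h)) for row in w_dec]
    return h, y

def anomaly_score(w_enc, w_dec, x):
    """Mean squared reconstruction error for one window."""
    _, y = forward(w_enc, w_dec, x)
    return sum((yi - xi) ** 2 for yi, xi in zip(y, x)) / len(x)

def train(windows, epochs=200, lr=0.01, seed=0):
    """Fit the autoencoder to 'normal' windows with plain stochastic gradient descent."""
    rng = random.Random(seed)
    w_enc = [[rng.uniform(-0.1, 0.1) for _ in range(WINDOW)] for _ in range(HIDDEN)]
    w_dec = [[rng.uniform(-0.1, 0.1) for _ in range(HIDDEN)] for _ in range(WINDOW)]
    for _ in range(epochs):
        for x in windows:
            h, y = forward(w_enc, w_dec, x)
            err = [yi - xi for yi, xi in zip(y, x)]
            # Gradient w.r.t. the hidden code, computed before w_dec is updated
            grad_h = [sum(err[i] * w_dec[i][j] for i in range(WINDOW))
                      for j in range(HIDDEN)]
            for i in range(WINDOW):
                for j in range(HIDDEN):
                    w_dec[i][j] -= lr * err[i] * h[j]
            for j in range(HIDDEN):
                for k in range(WINDOW):
                    w_enc[j][k] -= lr * grad_h[j] * x[k]
    return w_enc, w_dec

# Synthetic "normal" windows: one normalized cycle with a random phase
rng = random.Random(1)
normal = [[math.sin(2 * math.pi * k / WINDOW + phase) for k in range(WINDOW)]
          for phase in (rng.uniform(0, 2 * math.pi) for _ in range(40))]

w_enc, w_dec = train(normal)
baseline = max(anomaly_score(w_enc, w_dec, x) for x in normal)
exertion = [v + 3.0 for v in normal[0]]  # simulated sustained elevation
score = anomaly_score(w_enc, w_dec, exertion)
print(f"max normal score: {baseline:.4f}, exertion score: {score:.4f}")
```

The model only learns to rebuild patterns it has seen, so a window unlike the training data reconstructs poorly and scores high.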

Analysis

The dataset included over 550 patients, all with confirmed ASCVD and Type 2 diabetes. Of these, 538 had medication data available at the time of the initial exercise stress test (baseline). 290 patients had a recorded 12-week EST date; of these, 251 were retained as valid first 12-week assessments (falling 60–150 days after the initial EST) and 39 were excluded as likely repeat assessments. The remaining patients who did not complete the 12-week program and had a recorded separation date contributed to the dropout cohort, with 360 patients having a most recent medication list available at the time of discharge.

Among 538 patients with medication data at program entry, 28 (5.2%) were taking a GLP-1RA, 113 (21.0%) were taking an SGLT2i, and 24 (4.5%) were taking both agents. This indicates that the majority of patients entering cardiac rehabilitation with ASCVD and T2D had not yet been initiated on GLP-1RA therapy, despite guideline recommendations, though SGLT2i use was notably more common at baseline.

Among 295 patients who attended a 12-week discharge appointment, 54 (18.3%) were taking a GLP-1RA, 215 (72.9%) were taking an SGLT2i, and 46 (15.6%) were taking both agents. To determine whether these changes were statistically significant, chi-squared tests were performed comparing the observed medication proportions at intake versus at 12 weeks against the expected proportions under the null hypothesis of no change.

Among 360 patients with a most recent medication list at the time of discharge, 64 (17.8%) were taking a GLP-1RA, 263 (73.1%) were taking an SGLT2i, and 55 (15.3%) were taking both agents. These figures are broadly consistent with the 12-week completer results, suggesting that medication optimization occurred across the program regardless of completion status.

cardiac_rehab_executive_summary.txt

Example summary:

CARDIAC REHAB MEDICATION TRACKING ANALYSIS

STUDY OVERVIEW
This analysis examines medication usage patterns (specifically GLP-1 receptor agonists and SGLT2 inhibitors) among cardiac rehabilitation patients at three critical time points: baseline (Initial EST), 12-week follow-up, and separation for early dropouts.

DATASET SUMMARY
Total Patients: 551
Study Period: 2024-01-03 to 2024-12-28
Patient Distribution:
• Completed 12-week program: 290 (52.6%)
• Separated early: 259 (47.0%)
• Still active/other: 2

KEY FINDINGS

MEDICATION PREVALENCE AT BASELINE (Initial EST)
GLP-1 Agonists: 77 patients (15.6%)
SGLT2 Inhibitors: 322 patients (65.2%)
Both medications: 60 patients
Neither medication: 155 patients

MEDICATION CHANGES (Initial to 12-Week)
Patients with both timepoints: 251
Medication Initiation (among completers):
• Started GLP-1: 9 patients
• Started SGLT2: 26 patients
Medication Discontinuation:
• Stopped GLP-1: 4 patients
• Stopped SGLT2: 8 patients

DATA QUALITY ASSESSMENT
Overall Data Quality Score: 94.0%
Temporal Consistency: 83.7% pass rate
Medication Flag Consistency: 13.8% average pass rate
Confidence Distribution for New Variables:
• High confidence: 789 records
• Medium confidence: 80 records
• Low confidence: 2 records

VARIABLES CREATED
This analysis created 24 new variables (8 for each timepoint):
Variable Set 1: Medications at Initial EST (Baseline)
• Var1_GLP1_AtInitialEST, Var1_SGLT2_AtInitialEST
• Plus source, days_diff, and confidence metrics
Variable Set 2: Medications at 12-Week EST (Outcome)
• Var2_GLP1_At12WeekEST, Var2_SGLT2_At12WeekEST
• Plus source, days_diff, and confidence metrics
Variable Set 3: Medications at Separation (Dropouts)
• Var3_GLP1_AtSeparation, Var3_SGLT2_AtSeparation
• Plus source, days_diff, and confidence metrics

METHODOLOGY
Medication status was determined by finding the closest Med1–14 record to each target date, with the following priority:

Records on or before the target date (preferred)
Records up to 7 days after target date (if no prior record exists)
Confidence flags indicate data quality for each match

CLINICAL IMPLICATIONS

Mean BMI: 28.2 • Mean Peak MET: 6.5 • Program completion rate: 52.6%

RECOMMENDATIONS

High quality medication tracking achieved for 88.6% of baseline records
Consider prospective medication tracking at standardized timepoints
Data quality checks identify 410 records needing review
Medication changes observed in 35 patients warrant further analysis

Report Generated: 2026-01-22 13:37:11

Results Summary

Timepoint	N	GLP-1RA	SGLT2i	Both
CR Entry (Baseline)	538	28 (5.2%)	113 (21.0%)	24 (4.5%)
12-Week EST	295	54 (18.3%)	215 (72.9%)	46 (15.6%)
Discharge	360	64 (17.8%)	263 (73.1%)	55 (15.3%)

GLP-1RA increase (intake → 12-week): χ² = 54.2, p < 0.001
SGLT2i increase (intake → 12-week): χ² = 27.8, p < 0.001
Variable construction completed: January 22, 2026
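For readers unfamiliar with the chi-squared test, the sketch below shows one common form: a Pearson test of independence on a 2x2 table, written in pure Python. It is an illustration only. It treats the baseline and 12-week groups as independent samples, which is an assumption (many patients appear at both timepoints), and it may not be the exact calculation behind the statistics reported above, so its output should not be expected to match them. The function name is hypothetical.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test of independence for the table [[a, b], [c, d]].

    No continuity correction. Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # Survival function of the chi-squared distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# GLP-1RA users vs non-users at baseline (28 of 538) and at 12 weeks (54 of 295)
stat, p = chi2_2x2(28, 538 - 28, 54, 295 - 54)
print(f"chi-squared = {stat:.1f}, p = {p:.2g}")
```

A paired method such as McNemar's test would be the usual alternative when the same patients are measured at both timepoints.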

In the second part of this project, I will use the wearable data to evaluate how well the anomaly score tracks high cardiac exertion and fatigue by comparing it to labeled activity periods (rest, exercise, sleep) and self-reported fatigue ratings. Using statistical analysis and machine learning, I will analyze whether days with high anomaly scores correspond to days of intense exercise, prolonged stress, or self-reported high fatigue. A strong correlation would suggest that the wearable and my model are tracking not just heart rate, but how the body responds to cardiac stress.
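One simple way to check whether anomaly scores move together with fatigue ratings is a Pearson correlation. The sketch below is a minimal pure-Python version; the daily values are made up for illustration and are not study data, and the choice of Pearson correlation is an assumption rather than the project's confirmed analysis plan.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up example: one anomaly score and one fatigue rating (1-10) per day
daily_anomaly = [0.10, 0.15, 0.80, 0.20, 0.95, 0.12, 0.60]
fatigue_rating = [2, 2, 7, 3, 8, 1, 6]

r = pearson_r(daily_anomaly, fatigue_rating)
print(f"r = {r:.2f}")
```

Because fatigue ratings are ordinal, a rank-based measure such as Spearman correlation could be a reasonable alternative.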

Conclusion


Cardiac rehab is an effective setting for GLP-1RA and SGLT2i optimization in patients with ASCVD and T2D
GLP-1RA use increased 3.5-fold during the program (5.2% → 18.3%)
SGLT2i use increased 3.5-fold during the program (21.0% → 72.9%)
The new variables provide a reliable, independently verified method for tracking medication status at specific clinical timepoints
Confidence flags allow quality control and sensitivity analysis
All 3 variables (baseline, 12-week outcome, dropout separation) completed and validated
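The fold changes quoted above follow directly from the reported percentages. A short check, using a hypothetical helper name:

```python
def fold_change(before_pct, after_pct):
    """Ratio of the later prevalence to the earlier prevalence."""
    return after_pct / before_pct

# Percentages taken from the Results Summary table
glp1_fold = fold_change(5.2, 18.3)
sglt2_fold = fold_change(21.0, 72.9)
glp1_points = 18.3 - 5.2    # absolute change in percentage points
sglt2_points = 72.9 - 21.0

print(f"GLP-1RA: {glp1_fold:.1f}-fold (+{glp1_points:.1f} points)")
print(f"SGLT2i: {sglt2_fold:.1f}-fold (+{sglt2_points:.1f} points)")
```

The two drug classes share a similar fold change but differ widely in absolute change, so both figures are worth reporting together.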

This project addressed a major challenge in cardiac rehabilitation research: the need to reliably determine medication status at specific clinical timepoints from longitudinal, human-entered medication records. The result is three time-specific medication variables: at the initial exercise stress test, at the first 12-week exercise stress test, and at program separation. The work shows that time-specific medication variables can be derived from longitudinal records in registry-based studies. The temporal matching algorithm, confidence flagging system, and individual patient verification tool provide a model that can be adapted to other medications, clinical timepoints, or patient populations. The built-in data quality validation also serves an ongoing operational function: by independently checking medication status against the underlying records, the project can identify subtle discrepancies in the existing summary flags, supporting continuous improvement in data entry accuracy. This approach has significant potential within cardiac rehab, and coupling it with my wearable system could extend it from population-level medication tracking to individual-level monitoring.

The connection between the variable work and the wearables?

The registry data and variables help us understand population‑level patterns: How often are patients on GLP-1RAs or SGLT2is at key timepoints in rehab?
The wearable helps us understand individual‑level patterns: When your heart rate, motion, and recovery patterns look abnormal, how do you feel, and might that relate to your risk?

As next steps, the wearable and model could be used in three ways. First, they could guide safe exercise intensity for patients with ASCVD and T2D. Second, they could flag when someone is over‑exerting themselves, which could be important for early detection of decompensation. Finally, they could provide a personal baseline that doctors can compare against future episodes.

Citations

Al-Nafjan, A., Aljuhani, A., Alshebel, A., Alharbi, A., & Alshehri, A. (2025, September 24). Artificial Intelligence in predictive healthcare: A systematic review. Journal of clinical medicine. https://pmc.ncbi.nlm.nih.gov/articles/PMC12525484/

Built In. (2022). Train-test split explained. https://builtin.com/data-science/train-test-split

Canadian Cardiovascular Society 2023 guidelines on the fitness to drive. (n.d.). Canadian Journal of Cardiology. https://onlinecjc.ca/article/S0828-282X(23)01755-5/fulltext

Cleveland Clinic. (n.d.). Cardiology heart risk calculators. https://my.clevelandclinic.org/health/articles/17085-heart-risk-factor-calculators

Heart disease. UCI Machine Learning Repository. (n.d.). https://archive.ics.uci.edu/dataset/45/heart+disease

OARC Stats. (n.d.). What is the difference between categorical, ordinal and interval variables? https://stats.oarc.ucla.edu/other/mult-pkg/whatstat/what-is-the-difference-between-categorical-ordinal-and-interval-variables/

Hongn, A., Bosch, F., Prado, L., & Bonomini, P. (2025, June 24). Wearable device dataset from induced stress and structured exercise sessions. PhysioNet. https://physionet.org/content/wearable-device-dataset/1.0.1/#files-panel

Sutton, R. S., & Barto, A. G. (n.d.). Reinforcement learning: An introduction (2nd ed., in progress). https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf

Ijones. (2021, April 12). 8. The chi squared tests. The BMJ. https://bmj-chicken.bmj.com/thebmj/about-bmj/resources-readers/publications/statistics-square-one/8-chi-squared-tests

Landajuela. (n.d.). Cardiac machine learning resources. GitHub. https://github.com/landajuela/cardiac_ml

SCITEPRESS. (2022). ML-based cardiac diagnostics paper. https://www.scitepress.org/Papers/2022/110883/110883.pdf

Sokas, D., Butkuvienė, M., Tamulevičiūtė-Prascienė, E., Beigienė, A., Kubilius, R., Petrėnas, A., & Paliakaitė, B. (2022, March 31). Wearable-based signals during physical exercises from patients with frailty after open-heart surgery. PhysioNet. https://physionet.org/content/wearable-exercise-frailty/1.0.0/

Total Cardiology TRUST Project Dataset - Private

Wagner, P., Strodthoff, N., Bousseljot, R.-D., Samek, W., & Schaeffter, T. (2020, April 24). PTB-XL, a large publicly available electrocardiography dataset. PhysioNet. https://physionet.org/content/ptb-xl/1.0.1/

Acknowledgement

I would like to thank TotalCardiology and Dr. Rhys Beaudry for their guidance, time, and support throughout this project.

Finally, I want to acknowledge my mom for her significant support throughout my project, and for acting as a second voice on every decision of mine along the way. I am grateful for the love and support that you gave me to make my project better. Thank you, Mom!

Technical tools: VS Code, Python, Jupyter notebooks.

Attachments

View Log Book