Let-7b: A Natural Answer to Cancer?
Ivan Raizada
Grade 8
Presentation
Problem
This study aims to identify the let-7 family member best suited to diagnosing ccRCC. Each candidate is evaluated by the difference in its expression, measured in Reads per Million miRNA Mapped (RPM), between normal and cancerous kidney tissue. A biomarker identified from direct kidney samples could support earlier diagnosis of ccRCC and thereby improve patients' prognosis.
Method
Acquiring the Data
The data were downloaded from the Genomic Data Commons (GDC). miRNA levels were obtained for 557 patients: 20 Normal tissue samples and 537 Cancerous samples, spread across Stage 1 to Stage 4. The following filters were used to select only ccRCC samples:
- Experimental Strategy: miRNA-Seq
- Data Category: Transcriptome Profiling
- Access: Open
- Primary Site: Kidney
- Primary Diagnosis: clear cell adenocarcinoma (ccRCC)
- Data Type: miRNA Expression Quantification (isoform-level data were not needed, and Expression Quantification files are also smaller than Isoform files: 50 kB vs. 286 kB)
- Tissue Type: tumour
From there, all the files available for ccRCC were downloaded by manually going to the Case, selecting miRNA Expression Quantification, and downloading the .zip file.
When opening the .zip file, the .txt file with the miRNA expression data is nested in two folders.
Once all Cancer files were downloaded, the primary diagnosis filter was removed and the tissue type filter was set to Normal. Only 20 Normal cases were downloaded, because there wasn't much variability between the cases and SMOTE was later going to be used to oversample the data.
The organization of the folders was as follows:
FULL_DATA
↳ Stages (Ex. Normal, Stage 1, Stage 2, etc.)
↳ Unzipped folder of the sample, renamed to CaseID
↳ Folder containing .txt, named after the FileUUID column
↳ .txt file with the expression of the miRNAs, name is FileName
Extracting the Data
This organization of files is necessary because it lets the spreadsheet be structured directly from the folder names, which carry file information such as the FileUUID and CaseID. When rgrep or grep -r finds a match in a file, it also prints the path to that file. If the terminal's working directory is FULL_DATA, the path has the form ./Stage/CaseID/FileUUID/FileName.
For example, one instance that grep got is:
./Normal/TARGET-50-PAJNDU/3ecf64b6-69a3-45da-8019-06c6f7e28644/6ee8d860-91ba-49bb-9d2a-2bc7a1b88a7c.mirbase21.mirnas.quantification.txt:hsa-let-7a-1 34934 3420.235351 N
Here, the first path element is the stage, the second is the CaseID, the third is the FileUUID, and the last is the FileName. By naming the folders in this way, all the information on the file can be obtained with one command, making it easier to compile the data into spreadsheets and other data formats.
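The path-plus-match structure described above can also be parsed automatically. The following is a minimal Python sketch, not the procedure used in this project (which relied on copying and pasting); the names GrepHit and parse_grep_line are hypothetical, and the four whitespace-separated fields after the colon are assumed to be the miRNA name, the read count, the RPM, and a cross-mapped flag, as in the example line.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class GrepHit(NamedTuple):
    """One let-7 match, with the metadata carried by the folder names."""
    stage: str
    case_id: str
    file_uuid: str
    file_name: str
    mirna: str
    read_count: int
    rpm: float


def parse_grep_line(line: str) -> GrepHit:
    """Split one line of `grep -r` output into path fields and data fields."""
    # grep prints "path:matching line"; the path itself contains no colon.
    path, match = line.split(":", 1)
    # PurePosixPath drops the leading "./", leaving the four folder levels.
    stage, case_id, file_uuid, file_name = PurePosixPath(path).parts
    # Assumed column order: miRNA name, read count, RPM, cross-mapped flag.
    mirna, read_count, rpm, _cross_mapped = match.split()
    return GrepHit(stage, case_id, file_uuid, file_name,
                   mirna, int(read_count), float(rpm))


example = (
    "./Normal/TARGET-50-PAJNDU/3ecf64b6-69a3-45da-8019-06c6f7e28644/"
    "6ee8d860-91ba-49bb-9d2a-2bc7a1b88a7c.mirbase21.mirnas.quantification.txt"
    ":hsa-let-7a-1 34934 3420.235351 N"
)
hit = parse_grep_line(example)
print(hit.stage, hit.case_id, hit.mirna, hit.read_count, hit.rpm)
```

Each parsed record corresponds to one spreadsheet row, so a list of such records could be written out as a CSV file instead of being reformatted by hand.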
For each let-7 member, the text output by the command was copied into the Sublime Text editor and reformatted so that it could be pasted into the spreadsheet. The FileName, FileUUID, and CaseID columns were then filled in from the path that grep prints with each match, which has the format Folder1/Folder2/…/file1
We named the outer folder after the CaseID; the inner folder is named after the FileUUID by default, and the text file keeps its FileName.
Once the data for every miRNA had been transferred into the spreadsheet (one sheet per miRNA), the analysis of the data began.
The spreadsheet shows the stage of each sample (Normal or the stage of cancer). It also shows the CaseID, the ID of the person from whom the sample was taken; entering it into the GDC portal returns all the files related to that person, including the miRNA expression from the kidneys. The FileUUID, when entered into the GDC portal, gives access to the file from which the data was taken. The name of the file and its URL are also listed so that the file can be accessed easily.
Analysis of the Data
For every miRNA sheet, there were two graphs: a dot plot of RPM vs. Stage for every sample, and a bar graph of the mean RPM vs. Stage. An independent two-sample t-test was used to obtain a p-value for each miRNA, and let-7b had the smallest p-value. The conclusions drawn from this, along with the other results, are in the Results section.
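To make the t-test step concrete, here is a minimal Python sketch of the test statistic for two independent samples. It assumes the Welch (unequal-variance) form, which the text does not specify; the function name welch_t and the RPM values are made up for illustration and are not project data. Turning the statistic into a p-value requires the t-distribution, which a statistics library (for example scipy.stats.ttest_ind) would normally supply.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for two independent samples."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Sample variances (n - 1 in the denominator).
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of the difference between the two means.
    se2 = var_a / n_a + var_b / n_b
    t = (mean_a - mean_b) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / ((var_a / n_a) ** 2 / (n_a - 1)
                     + (var_b / n_b) ** 2 / (n_b - 1))
    return t, df


# Illustrative RPM values only (hypothetical, not taken from the spreadsheet).
normal_rpm = [5000.0, 5200.0, 5400.0, 5600.0]
cancer_rpm = [9000.0, 14000.0, 20000.0, 12000.0]
t_stat, dof = welch_t(normal_rpm, cancer_rpm)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

A t statistic far from zero corresponds to a small p-value, which is the quantity used here to rank the let-7 members.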
References
- GDC. https://portal.gdc.cancer.gov/analysis_page?app=CohortBuilder&tab=general. Accessed January 8, 2025.
- Google Sheets: Sign-in. https://docs.google.com/spreadsheets/u/0/?ec=wgc-sheets-[module]-goto&tgif=d.
- Google Sheets: KIRC Project Data: ALL. https://docs.google.com/spreadsheets/d/1OxaipNosENnGiObiOAnu9GqviznvuBl8RHOFlvrl7Ro/edit?gid=0#gid=0
Research
miRNAs are one of the many ways cells regulate protein levels. The functional unit of a miRNA is a complex called the RNA-Induced Silencing Complex (RISC). This complex, consisting of Ago2, the mature miRNA (the guide strand), and other proteins, binds to a protein-coding mRNA and silences it through a variety of mechanisms.
The biogenesis of miRNAs begins when the gene is transcribed by RNA polymerase II into a pri-miRNA. In the nucleus, the pri-miRNA is processed by a protein called Drosha (an RNase III enzyme), along with helper proteins such as DGCR8, into a stem-loop pre-miRNA, which is then exported into the cytoplasm by Exportin 5. Next, the protein Dicer (also an RNase III enzyme, again with some helper proteins) cleaves the loop off the pre-miRNA, leaving a ~22 nt double-stranded miRNA duplex (a dsRNA). Then, an Argonaute protein, Ago2, unwinds the duplex into two single strands (ssRNAs). One strand (the passenger strand) is discarded and the other (the guide strand) stays bound. The guide strand and Ago2, with some other proteins, form the RISC. This complex is now ready to bind to target mRNAs.
First, the miRNA binds to the 3’ Untranslated Region (3’ UTR) of the mRNA that encodes the target protein. By binding to the mRNA, it interferes with the ribosome's access to the mRNA, repressing translation. Another mechanism is deadenylation, where the complex recruits the CCR4-NOT deadenylase complex, which removes the poly(A) tail (a tail of adenine nucleotides) from the mRNA. This makes the mRNA less stable, leading to its degradation. Lastly, the miRNA recruits decapping enzymes (like DCP1/DCP2), which remove the 5’ cap of the mRNA, making it even less stable. The removal of the 5’ cap exposes the mRNA to 5’-3’ exonucleases (like XRN1), which degrade it.1-2
In summary, all these mechanisms work in tandem to fine-tune the translation of the target protein. Some proteins control the cell cycle, and thus some miRNAs control the cell cycle. miRNAs that promote the cell cycle, usually by repressing tumour suppressor proteins, act as oncogenes. Conversely, miRNAs that restrain the cell cycle, often by repressing oncogene proteins, act as tumour suppressors. The balance between these two types of regulators keeps a cell healthy; an imbalance can cause cancer.
Cancer occurs when cells of the body replicate uncontrollably. This typically happens due to mutations in tumour suppressor genes, oncogenes, or apoptosis regulators (regulators of programmed cell death). When mutations accumulate across these types of genes, the organism can develop cancer. When tumour suppressor genes become underexpressed, the cell cycle is no longer held in check, leading to unchecked reproduction of cancer cells. This underexpression can occur due to histone modifications, which cause the DNA of the tumour suppressor gene to be tightly wrapped around histone proteins, making it inaccessible to RNA polymerase. Another common reason is promoter hypermethylation, which occurs when methyl groups (CH3) attach to the cytosines in the promoter region of the tumour suppressor gene, making it inaccessible.
The ‘seeds’ of tumours, the cells that start it all, are known as cancer stem cells (CSCs). CSCs are a subset of tumour cells that progress through the cell cycle rapidly and can differentiate into the different cell types within a tumour. This is known as ‘stemness’, a property similar to how stem cells behave, as they proliferate rapidly and can differentiate into different types of cells. Another property of CSCs is ‘self-renewal’: the ability to replicate and still maintain their stem-like state. This allows quick recovery from decreases in their population, resulting in increased tolerance to therapies. If a normal cell mutates into a CSC, it can kickstart cancer, and the spread of CSCs (through cell division) can lead to tumours invading places away from the starting point (metastasis).
Another cause of cancer is mutations in oncogenes. Mutated oncogenes can push the cell cycle forward even when the cell should have stopped dividing or undergone apoptosis. These mutations cause cells to multiply excessively, which is why oncogenes are commonly described as “cancer/tumour supporters.”
Oncogenes can also become overactive if their specific regulators are disrupted by mutations in the regulators' genetic code. Lastly, mutations in apoptosis regulators allow cells to evade programmed cell death. In healthy cells, the corruption of cellular machinery triggers apoptosis, where the cell self-destructs and its components are recycled. However, when apoptosis regulators are damaged, the cell survives and continues causing harm.
A common example of cancer, which will be studied in this paper, is kidney clear cell adenocarcinoma. This cancer, known as KIRC or ccRCC (clear cell renal cell carcinoma), is one of the most common forms of kidney cancer. It is a subset of Renal Cell Carcinoma (RCC), the most common type of kidney cancer. KIRC accounts for 8 out of 10 RCC cases. It occurs when the cells of the kidney’s tubules, where blood is filtered, become cancerous. Interestingly, KIRC is more prevalent in men than in women. Smoking, obesity, and high blood pressure are all possible causes of KIRC. Additionally, KIRC is more common in families with a history of kidney cancer. I chose KIRC specifically because it has the most available data, ensuring that the data I extract will be the most accurate, and because it is one of the most prevalent cancers, affecting hundreds of thousands of people worldwide.3
The miRNA family suspected in this paper to be related to KIRC is let-7. The let-7 gene was originally discovered in C. elegans (a type of roundworm), where its expression determines the fate of adult cells in the worm. The let-7 gene is one of the most highly conserved miRNAs in the biosphere. It is present in a variety of organisms, including humans, fleas, mice, and worms. The fact that let-7 is so conserved reflects its importance in cellular and developmental processes.
Humans possess 10 mature let-7 miRNAs: let-7a, 7b, 7c, 7d, 7e, 7f, 7g, 7i, miR-98, and miR-202. These 10 miRNAs are derived from 13 precursor genes located on different chromosomes. miR-98 and miR-202 are named differently from the rest because they were discovered in a cluster different from the let-7 clusters. Despite this, they share the same seed sequence (nt 2-8), which means that they are functionally part of the let-7 family. Scientists initially didn’t know that these miRNAs belonged to the same family, so they received different names, and those names have been kept to avoid confusion. Since they are outside the traditional clusters, they might be regulated differently. miR-202, for example, is involved in reproductive processes, unlike the tumour suppressor nature of the rest of the let-7 family, despite having the same seed sequence.4
The expression of let-7 is peculiar and unique. In embryonic stem cells (ESCs), let-7 is silent, likely to allow the proliferation of the embryo. In later developmental stages, let-7 is reactivated to regulate differentiation. In adult cells, let-7 functions to maintain homeostasis and control proliferation. When let-7 expression is lost or dysregulated, the result is rapid proliferation, leading to fast tumour growth. While its expression pattern is unusual, the biogenesis of let-7 is fairly typical. However, many regulators (primarily negative regulators) act on let-7.
Many factors negatively affect the expression of let-7. One of them is the Lin28 protein. Lin28A, for example, binds to pre-miRNA and recruits TUT4/7, which adds a chain of uridines (uridine is the nucleoside made of the base uracil attached to a ribose sugar, so it is not the same thing as uracil alone). This blocks Dicer from binding, preventing the creation of the RISC for let-7. Meanwhile, Lin28B binds to pri-miRNA in the nucleus, again preventing the maturation of let-7. Other negative regulators of let-7 include lncRNAs (e.g. H19), which act like sponges, binding to let-7 and preventing it from interacting with target mRNAs. circRNAs (circular RNAs, e.g. circHMCU) are similar regulators that also ‘sponge’ let-7. This ‘frees’ oncogenes that are being repressed by let-7. Another negative regulator of let-7 is ‘promoter hypermethylation’. This is when the promoter of a miRNA gene receives too many methyl groups (CH3), blocking RNA Pol II from attaching to the gene and effectively blocking the production of let-7. This methylation is catalyzed by DNA Methyltransferases (DNMTs). DNMT1 maintains the methylation state, and DNMT3A and DNMT3B add new methyl groups. The methyl groups are supplied by S-Adenosylmethionine (SAM), a methyl donor. DNMTs transfer the methyl groups from SAM to the fifth carbon of the cytosine in CpG dinucleotides (a cytosine followed by a guanine), which are concentrated in promoter regions.4-5
Let-7 plays an important role in cancer: it regulates oncogenes, suppresses CSC characteristics, and negatively regulates many key biochemical pathways that make cancers more aggressive. Firstly, let-7 regulates oncogenes. These include c-Myc (which regulates cell proliferation), K-Ras (which drives cancer cell survival and growth), and HMGA2 (which enhances the self-renewal of a cancer). Increased let-7 correlates with lower expression of these oncogenes, resulting in a less severe tumour, whereas reduced let-7 results in more aggressive tumours. Next, let-7 represses CSC characteristics: it inhibits stemness and self-renewal, tumour initiation, and chemoresistance. Reduced let-7 expression is associated with an increased presence of CSCs in breast, pancreatic, and gastric cancers. Let-7 is also involved in the suppression of many tumour-supporting signalling pathways, where it acts as a tumour suppressor. This role is crucial in the regulation of tumour growth and genesis.4
One of these pathways is the Wnt/β-catenin pathway, which controls cell growth, division, and the maintenance of stem cells. It is often overactive in CSCs, which leads to uncontrolled growth. Let-7 inhibits this pathway by suppressing a key component, TCF-4, a transcription factor that turns on Wnt target genes by binding to their promoter regions. TCF-4 can only do this if β-catenin binds to it; let-7 indirectly inhibits β-catenin, which inhibits the transcription of these oncogenic genes. Because Lin28B negatively regulates let-7, high Lin28B keeps the Wnt signal active, leading to tumorigenesis. In ER+ breast cancer, let-7 has been shown to increase sensitivity to tamoxifen, a therapy drug. Since let-7 blocks Wnt signalling, it prevents CSC-driven resistance to this drug, increasing its effectiveness.
Another pathway that let-7 is involved in is the NOTCH pathway, which controls cell communication and differentiation. If the pathway becomes overactivated, it can lead to cancer cells exhibiting the properties of stemness and self-renewal. Let-7 targets and suppresses downstream effectors (molecules directly affected by NOTCH signalling), like HMGA2. HMGA2 is a protein that increases cancer cell plasticity (the ability of cancer to adapt to changes in its environment and still proliferate). The suppression of HMGA2 results in decreased stemness of CSCs.
Thirdly, the Hedgehog pathway is essential for the development of tissues and the maintenance of stem cells. When it is overactive, it increases the survival, growth, and therapy resistance of CSCs. The activation of Hedgehog results in reduced let-7 levels, allowing the previously described pathways to flourish and the CSC traits mentioned above to become prominent. Hedgehog inhibitors (drugs that target this harmful pathway) increase the expression of let-7, which suppresses CSC characteristics, making cancer cells weaker and more responsive to therapies.
Next, let-7 has a critical role in the STAT3/NFκB pathway. STAT3 and NFκB are transcription factors that are activated by inflammation. They drive proliferation and, if overactive in cancer cells, promote pro-inflammatory signals. This leads to tumour growth, resistance to therapy, and promotion of CSC traits. Pro-inflammatory signals (like IL-6 and IL-8) increase Lin28B expression. Since Lin28B represses let-7, increased Lin28B means stronger suppression of let-7. This creates a loop: an overactive STAT3/NFκB pathway increases pro-inflammatory signals, which increase Lin28B, which decreases let-7 expression. The decreased let-7 expression increases the severity of tumours, and with less let-7 available to regulate the STAT3/NFκB pathway, nothing restrains the lethal loop. If more let-7 is artificially added, it directly inhibits pro-inflammatory cytokines like IL-6 and IL-8, which breaks the inflammatory loop. This reduces tumour-caused inflammation and limits CSC-driven tumour progression.
Cancer relies on a different metabolic pathway compared to regular cells. Let-7 blocks these metabolic pathways, stunting the growth of the tumour.
For example, CSCs carry out aerobic glycolysis (also known as the Warburg effect), in which cells ferment glucose as they would without oxygen (converting the pyruvate into lactate using the enzyme LDH) even though enough oxygen is available. This results in a much lower yield of ATP: 2 ATP from aerobic glycolysis compared to ~36 from aerobic respiration. Cancer cells make this sacrifice because aerobic glycolysis is much faster than the entire process of cellular respiration, allowing for faster growth. Aerobic glycolysis also makes the cancer more resistant to a low-oxygen environment, as this method of making energy doesn’t require oxygen. Let-7 combats this by targeting PDK1 (pyruvate dehydrogenase kinase 1), an enzyme which promotes glycolysis by ‘turning off’ PDC (the pyruvate dehydrogenase complex). When PDC is turned off, the pyruvate produced in glycolysis is not converted into acetyl-CoA, which shuts down the citric acid cycle and aerobic respiration as a whole. By suppressing PDK1, let-7 blocks aerobic glycolysis and allows regular cellular respiration to continue. This slows the rapid growth of the tumour and reduces its plasticity (flexibility) with regard to oxygen levels.
Secondly, let-7 disrupts fatty acid synthesis (FAS). Cancer cells rely on FAS because they are rapidly proliferating and need fatty acids to build the cellular membranes of new cells. Without FAS, cancer wouldn’t be able to sustain high proliferation. Cancer cells also use fatty acids as an energy reservoir, as fatty acids can be broken down to form ATP (4 ATP indirectly synthesized from two carbons from the fatty acid). They use this method mainly when glucose is scarce, increasing their plasticity and strength. Fatty acids are also precursors to signalling molecules (like prostaglandins and leukotrienes), which promote tumour growth. These signalling pathways also contribute to the stemness and therapy resistance of CSCs. Let-7 targets SREBP-1, a key regulator of FAS. The inhibition of FAS in CSCs has been shown to make them more sensitive to treatments and reduce their growth.
Lastly, let-7 also inhibits EMT (Epithelial-to-Mesenchymal Transition). This is where cells move from an epithelial-like state (as in skin tissue), characterized by traits such as cell adhesion, to a mesenchymal-like state (as in connective tissue), characterized by traits such as mobility and invasiveness. If EMT occurs, the tumour becomes more invasive and harmful. Let-7 suppresses the expression of EMT transcription factors like Snail and Twist (which promote EMT) and of vimentin (a mesenchymal marker). Let-7 also increases epithelial markers (like E-cadherin). These regulations keep cancer in an epithelial, less invasive, state. In addition, let-7 targets the mRNAs of key stemness transcription factors, like SOX2, OCT4, and NANOG. These transcription factors are essential for maintaining properties of cancer stem cells (CSCs) like self-renewal, which results in resistance to therapies. In simpler terms, they are transcription factors that allow cancer cells to sustain traits similar to stem cells. By inhibiting these two key elements of tumours, EMT and stemness, let-7 decreases the ability of cancer cells to migrate, invade, and start new tumours.4-5
In summary, let-7 is a family of invaluable miRNAs which limit the severity of cancer in many ways. Let-7 inhibits the characteristics of the ‘seeds of cancer’, helping to prevent cancer from starting in the first place. It makes cancer less severe and invasive by inhibiting EMT, while also inhibiting major oncogenes like c-Myc, K-Ras, and HMGA2. It disrupts and inhibits many signalling pathways, reducing the magnitude of the tumour. It even disrupts the metabolic processes of cancer, like its methods of getting and storing energy. In conclusion, let-7 is a major tumour suppressor, responsible for blocking and regulating many tumours.
References
- Davis-Dusenbery BN, Hata A. MicroRNA in cancer: The involvement of aberrant microRNA biogenesis regulatory pathways. Genes Cancer. 2010;1(11):1100-1114. doi:10.1177/1947601910396213.
- Gowrishankar B, et al. MicroRNA expression signatures of stage, grade, and progression in clear cell RCC. Cancer Biol Ther. 2013;15(3):329-341. doi:10.4161/cbt.27314.
- National Cancer Institute. Clear cell renal cell carcinoma. Published March 17, 2020. Accessed February 1, 2025. https://www.cancer.gov/pediatric-adult-rare-tumor/rare-tumors/rare-kidney-tumors/clear-cell-renal-cell-carcinoma.
- Ma Y, et al. The roles of the let-7 family of microRNAs in the regulation of cancer stemness. Cells. 2021;10(9):2415. doi:10.3390/cells1009241
- Barh D, et al. Microrna let-7: An emerging next-generation cancer therapeutic. Curr Oncol. 2010;17(1):70-80. doi:10.3747/co.v17i1.356.
- Pandey J, Syed W. Renal cancer. StatPearls - NCBI Bookshelf. Published October 4, 2024. Accessed February 1, 2025. https://www.ncbi.nlm.nih.gov/books/NBK558975/#:~:text=Renal%20cell%20carcinoma%20(RCC)%20is,percent%20of%20all%20renal%20malignancies.
- He X, et al. Circulating exosomal mRNA signatures for the early diagnosis of clear cell renal cell carcinoma. BMC Med. 2022;20(1):232. doi:10.1186/s12916-022-02467-1.
- Smolarz B, et al. miRNAs in cancer (review of literature). Int J Mol Sci. 2022;23(5):2805. doi:10.3390/ijms23052805.
- Pokrovenko DA, et al. MicroRNA let-7: A promising non-invasive biomarker for diagnosing and treating external genital endometriosis. J Turk Soc Obstet Gynecol. 2021;18(4):291-297. doi:10.4274/tjod.galenos.2021.07277.
- Osada H, Takahashi T. Let‐7 and miR‐17‐92: Small‐sized major players in lung cancer development. Cancer Sci. 2010;102(1):9-17. doi:10.1111/j.1349-7006.2010.01707.x.
- Park S, et al. MiR-9, miR-21, and miR-155 as potential biomarkers for HPV positive and negative cervical cancer. BMC Cancer. 2017;17(1):701. doi:10.1186/s12885-017-3642-5.
- Torrisani NJ, et al. Enjoy the silence: The story of let-7 microRNA and cancer. Curr Genomics. 2007;8(4):229-233. doi:10.2174/138920207781386933.
- Gilles ME, Slack FJ. Let-7 microRNA as a potential therapeutic target with implications for immunotherapy. Expert Opin Ther Targets. 2018;22(11):929-939. doi:10.1080/14728222.2018.1535594.
- Shell SC, et al. Let-7 expression defines two differentiation stages of cancer. Proc Natl Acad Sci U S A. 2007;104(27):11400-11405. doi:10.1073/pnas.0704372104.
- Fedorko M, et al. Detection of let-7 miRNAs in urine supernatant as potential diagnostic approach in non-metastatic clear-cell renal cell carcinoma. Biochem Med (Zagreb). 2017;27(2):411-417. doi:10.11613/bm.2017.043.
- Cochetti G, et al. Detection of urinary miRNAs for diagnosis of clear cell renal cell carcinoma. Sci Rep. 2020;10(1):21407. doi:10.1038/s41598-020-77774-9.
Data
To determine if let-7 could be used as a biomarker for cancer, or more specifically, which member of let-7 would be most useful for classifying tissues, the difference in expression between Normal and Cancer tissues needs to be known. The bigger and more consistent the difference in expression (reflected in a smaller p-value, which measures how statistically significant the difference is), the clearer the separation between normal and cancer tissues. A clearer separation enables our SVM model to achieve a higher accuracy, making it a more promising alternative for diagnosis.
The samples' expression was measured in Reads per Million miRNAs mapped (RPM). This measure is the raw number of times a particular miRNA is detected, normalized by the total number of reads in the sample (in this case, 100 million or more).
RPM = (Raw Read Count / Total Number of Reads) × 10^6 (Eq. 1)
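As a worked illustration of Eq. 1, the short Python sketch below computes an RPM value. The function name rpm and the numbers passed to it are hypothetical and chosen only to make the arithmetic easy to follow; the RPM values used in this project were read directly from the GDC files rather than recomputed.

```python
def rpm(raw_read_count: int, total_reads: int) -> float:
    """Reads per million: raw count scaled to a library of one million reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return raw_read_count * 1_000_000 / total_reads


# Hypothetical sample: 500 reads of one miRNA out of 10,000,000 total reads.
example_rpm = rpm(500, 10_000_000)
print(example_rpm)  # 50.0
```

Because the count is divided by the sample's own total, samples sequenced to different depths can be compared on the same scale.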
The spreadsheet displays the raw read count and the patient's RPM in individual columns. For each stage, the mean is also displayed in MEAN_STAGE. See Figure 1 for the columns of the sheet.
Figure 1: Screenshot of one of the tables, displaying the columns used. FileURL is the link to the page on the GDC from which the file was downloaded. Stage displays whether the tissue sample is Normal or Cancerous. CaseID gives the ID of the person from whom the sample was taken, and it is searchable. FileUUID is the ID of the file, and it is also searchable. FileName gives the name of the file, for double-checking. Read_count and RPM are information from the file, and MEAN_STAGE is the mean of all RPM values for the stage.
From this data on individual people, the mean Normal and mean Cancer expression were graphed, and it was concluded that let-7b had the largest difference between Normal expression and Cancer expression. This was further reinforced by the fact that it had the lowest p-value (1.18 × 10^-7, two-sample independent t-test), and thus it would be the most useful as a biomarker.
Let-7 miRNA name | Mean Normal RPM | Mean Cancer RPM
hsa-let-7a-1 | 6250.636071 | 8432.59372075
hsa-let-7a-2 | 6242.89712 | 8422.4537895
hsa-let-7a-3 | 6291.272781 | 8475.590666
hsa-let-7a mean | 6261.601991 | 8443.5460055
hsa-let-7b | 5346.257558 | 14112.522715
hsa-let-7c | 4194.958516 | 2009.2517075
hsa-let-7d | 543.0560768 | 568.057907125
hsa-let-7e | 1961.439444 | 1218.57812075
hsa-let-7f-1 | 2916.348039 | 4737.340283
hsa-let-7f-2 | 2981.74886 | 4818.15606025
hsa-let-7g | 411.54179 | 641.166637425
hsa-let-7i | 252.5371283 | 507.634797675
Table 1: This table displays the mean values for Normal and Cancer RPM for each let-7 miRNA. It can be seen that let-7b had the biggest difference, while the rest showed a smaller increase, a decrease, or relatively no change. The highlighted row (let-7b) displays the highest change, resulting in its suspected value as a biomarker.
Other members of the let-7 family, like let-7a, showed a slight increase. Let-7c showed a drastic decrease, from ~4000 RPM in Normal to ~2000 RPM in Cancer; however, this was largely due to three huge outliers with ~20000 RPM. Let-7d saw relatively no change, and let-7e saw a statistically significant decrease. Let-7f, -7g, and -7i saw a statistically significant increase, but nothing as drastic as let-7b. See Table 1 for the means of each miRNA.
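One way to compare the rows of Table 1 on a common scale is the log2 fold change between the Cancer and Normal means. The Python sketch below is an illustrative addition, not part of the original analysis (which relied on the difference in means and the t-test); the means are copied from Table 1, and the names TABLE_1 and log2_fold_change are hypothetical.

```python
import math

# Mean RPM values copied from Table 1: (mean Normal, mean Cancer).
TABLE_1 = {
    "hsa-let-7a-1": (6250.636071, 8432.59372075),
    "hsa-let-7a-2": (6242.89712, 8422.4537895),
    "hsa-let-7a-3": (6291.272781, 8475.590666),
    "hsa-let-7b": (5346.257558, 14112.522715),
    "hsa-let-7c": (4194.958516, 2009.2517075),
    "hsa-let-7d": (543.0560768, 568.057907125),
    "hsa-let-7e": (1961.439444, 1218.57812075),
    "hsa-let-7f-1": (2916.348039, 4737.340283),
    "hsa-let-7f-2": (2981.74886, 4818.15606025),
    "hsa-let-7g": (411.54179, 641.166637425),
    "hsa-let-7i": (252.5371283, 507.634797675),
}


def log2_fold_change(normal_mean: float, cancer_mean: float) -> float:
    """Positive values mean higher expression in Cancer than in Normal."""
    return math.log2(cancer_mean / normal_mean)


changes = {name: log2_fold_change(n, c) for name, (n, c) in TABLE_1.items()}
# The member whose mean expression changes the most, in either direction.
largest = max(changes, key=lambda name: abs(changes[name]))

for name, lfc in sorted(changes.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: log2 fold change = {lfc:+.2f}")
print("Largest change:", largest)
```

A fold change only compares means, so it complements rather than replaces the t-test, which also accounts for the spread of the individual samples.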
Along with the detailed sheet on the individual samples, each sheet contains a dot plot of the RPM expression versus sample and a bar graph of the means of each stage. These make the data quick and easy to grasp.
Figure 2: Box plot of RPM against stage. As can be seen, Normal tissues have small variability, while cancerous tissues have a large variability in RPM, with some values skyrocketing to 40000+ RPM and others as low as 2000 RPM.
For let-7b, the member with the largest change and therefore the most useful candidate biomarker, the average expression of normal tissues was 5346.257558 RPM and the average expression of cancer tissues was 14,067.522715 RPM (see Figure 2 for the box plot).
The raw data are available in two linked sheets: Let-7b ALL (let-7b only) and KIRC Project Data: ALL (all let-7 members).
The p-value was far below the significance threshold, indicating a statistically significant difference.
This large increase in expression and low p-value suggest that the RPM of let-7b can be used to build a model that classifies tissue as cancerous. To demonstrate this, an SVM model was coded in Python to take a let-7b RPM as input and give the predicted tissue type, with a probability.
To do this, the SVC class was first utilized from the sklearn library to develop an SVM classification model. Then, using the pandas library, key data columns, including Stage and RPM, were extracted from the spreadsheet. To address the imbalance in sample sizes, normal samples were oversampled using SMOTE.
SMOTE (Synthetic Minority Oversampling Technique) is a popular way of oversampling datasets when limited data is available for one class. It was chosen here because it had no obvious disadvantages for this dataset. A large dataset (~500 samples) was not available for Normal tissue, and the data wasn’t very noisy, especially for Normal RPM. Furthermore, the problem is one-dimensional (RPM only), so SMOTE didn’t encounter any issues. The classes were also very well separated (the p-value was extremely low), so SMOTE was expected to do an excellent job in oversampling. Finally, SMOTE is designed for classification rather than regression, and the problem being solved was classification.
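The idea behind SMOTE can be illustrated in one dimension with a few lines of Python. This is a simplified sketch written for this explanation, not the library implementation used for the model; the function name smote_1d, the neighbour count k, and the Normal RPM values are all hypothetical. Each synthetic value is placed at a random point between a real minority sample and one of its nearest minority neighbours.

```python
import random


def smote_1d(minority, n_new, k=5, seed=0):
    """Create n_new synthetic values by interpolating between neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a real minority sample at random.
        i = rng.randrange(len(minority))
        x = minority[i]
        # Find its k nearest neighbours among the other minority samples.
        others = [v for j, v in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda v: abs(v - x))[:k]
        neighbour = rng.choice(neighbours)
        # Place the new point somewhere on the segment between the two.
        synthetic.append(x + rng.random() * (neighbour - x))
    return synthetic


# Hypothetical Normal RPM values, used only to demonstrate the function.
normal_rpm = [5100.0, 5250.0, 5300.0, 5420.0, 5600.0, 5380.0]
new_points = smote_1d(normal_rpm, n_new=10)
print([round(p, 1) for p in new_points])
```

Because every synthetic value lies between two real values, the oversampled class stays within the range of the observed Normal samples, which is why low noise in the Normal RPM matters for this choice.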
After applying SMOTE, the data was then split into training and testing sets, and the model was trained accordingly. To optimize efficiency, the trained model was saved as a .joblib file, enabling reuse without repeated computational expense.
Predictions were made using .predict and .predict_proba, providing both outcomes and confidence probabilities.
Here is an example of how it is used:
What is the given RPM for let-7b?
>> 5346.257558
Predicted: Normal, 85.3% chance
This is the link to the code for the model: [https://github.com/IvanR2625/ccrcc_svm_model]
Here is the performance of the model:
Accuracy: 0.9069767441860465
Precision: 0.8869565217391304
Recall: 0.9357798165137615
ROC-AUC: 0.9309330102129133
Confusion Matrix:
[[ 93  13]
 [  7 102]]
Classification Report:
class | precision | recall | f1-score | support
0 | 0.93 | 0.88 | 0.90 | 106
1 | 0.89 | 0.94 | 0.91 | 109
accuracy | | | 0.91 | 215
macro avg | 0.91 | 0.91 | 0.91 | 215
weighted avg | 0.91 | 0.91 | 0.91 | 215
The model has a high accuracy, f1 score, and ROC-AUC score. This suggests that using let-7b as a biomarker for ccRCC is viable, and that more advanced models such as neural networks could build on it in the future. Such models could make the classification of tissues using let-7b a reliable and mainstream form of identification, giving cancer patients the benefits of earlier identification and more reliable, less invasive results.
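The reported accuracy, precision, and recall can be re-derived from the confusion matrix. The Python sketch below does this under the assumption that the matrix follows the scikit-learn layout (rows are true classes, columns are predicted classes) and that class 1 is the positive class; the variable names are illustrative.

```python
# Confusion matrix as reported above. Assumed layout: rows are the true
# classes (0, 1) and columns are the predicted classes (0, 1).
matrix = [[93, 13],
          [7, 102]]
(tn, fp), (fn, tp) = matrix

total = tn + fp + fn + tp
accuracy = (tp + tn) / total        # fraction of all samples classified correctly
precision = tp / (tp + fp)          # of samples predicted as class 1, fraction correct
recall = tp / (tp + fn)             # of true class 1 samples, fraction found
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy:  {accuracy:.4f}")   # 0.9070
print(f"Precision: {precision:.4f}")  # 0.8870
print(f"Recall:    {recall:.4f}")     # 0.9358
print(f"F1 score:  {f1:.4f}")         # 0.9107
```

These values agree with the reported scores and with the class 1 row of the classification report, which supports reading the matrix in this layout.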
Conclusion
let-7 is known to have therapeutic potential, and it has been shown before that it can be used to classify tissues for cancers. The aim here, however, was to see which let-7 member would be best suited as a biomarker for ccRCC, and how let-7 varies as a whole. The analysis showed that, between Normal and Cancer tissues, the expression of let-7 members either decreased dramatically, increased, or showed relatively no change. The let-7 member with the largest difference between Normal and Cancer tissues was let-7b. Thus, if a model were used to diagnose a tissue as cancerous, it should look at the expression of let-7b, as the more separated classes would give it the highest accuracy. Since the RPM is higher (in the thousands), it would also be easier to measure.
As let-7 is a tumour suppressor, the expression of its members is expected to decrease overall in cancer. In lung cancer, for example, it is downregulated, as expected [8]. Therefore, in ccRCC, the expression of let-7b would be expected to decrease. However, that is not what was observed.
Instead, there was a 2.63-fold increase in expression, which suggests that in ccRCC let-7b might act as an oncogene, something that does not appear to have been shown yet. This raises a question for further research: does let-7b show such a drastic oncogenic role in other cancers? Answering it would help to better understand let-7b and its role in cancer, and could lead to new advancements in cancer research.
While this unusual behaviour of let-7b raises unanswered questions, it is also useful for a model that identifies a tissue as cancerous or not. Since the p-value was far below the significance threshold, a model based on let-7b expression was expected to perform very well. The model was created to test this expectation.
Python was used to create the SVM model. The model was built with the SVC class from the scikit-learn (sklearn) library, which implements an SVM classifier. SVC was the appropriate choice, as the goal was to classify tissues as Normal or Cancer.
Then, using the pandas library, the data was read from the spreadsheet, keeping only the relevant columns, Stage and RPM. The Normal samples were then oversampled using SMOTE, as ~500 Normal samples to match the Cancer samples were unavailable. Next, the data was split: 20% was allocated to testing and the remaining 80% to training the SVM model (after applying SMOTE for an even class distribution). To avoid retraining the model repeatedly, which would be computationally expensive, it was saved to a .joblib file. Finally, the model was used by loading it and calling .predict and .predict_proba to get the predicted outcome and its probability.
The model has a high accuracy, F1 score, and ROC-AUC score. This suggests that using let-7b as a biomarker for ccRCC is a viable idea, and that with more advanced models, such as neural networks, it could prove to be a better method for early identification in the future.
Possible errors in this study include inaccuracies during data transfer from files to spreadsheets, such as omitting samples, duplicating entries, or splicing values (e.g., 14356 → 4356). Errors could also arise during data extraction with grep, leading to misidentified or missed miRNA entries. The use of SMOTE to oversample normal samples, while necessary, may have introduced unintended biases. Misnaming or disorganizing files, such as CaseID mismatches, could have resulted in referencing incorrect samples. Additionally, manual data entry into spreadsheets is prone to typographical errors, and mistakes in statistical calculations, such as p-value determination or graphing, could misrepresent findings. Inadequate handling of outliers or assumptions about consistent sequencing depth during RPM normalization may further affect the reliability of results.
To minimize these potential errors, double-checking procedures were implemented and outliers were validated by cross-referencing sample data with RPM values. File organization and naming conventions were carefully verified to ensure accurate data matching. By inspecting statistical results for anomalies and checking data transfer processes, human and technical errors were reduced.
A better way to avoid these errors in the future would be to import the .txt files with the RPM values directly into the spreadsheet, which avoids manual transfer errors. Using more samples from sources other than the GDC would also help in achieving higher model accuracy and robustness, and more concrete trends between Normal and Cancer expression levels for let-7b.
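One way such a direct import could look is sketched below, using the FULL_DATA folder layout from the Method section (Stage, CaseID, FileUUID, then the .txt file). The function names are hypothetical, and the column names `miRNA_ID` and `reads_per_million_miRNA_mapped` and the identifier `hsa-let-7b` are assumptions about the GDC quantification files that should be checked against a real file before use.

```python
import csv
from pathlib import Path

FIELDS = ["Stage", "CaseID", "FileUUID", "FileName", "RPM"]


def collect_rpm(root, mirna_id="hsa-let-7b"):
    """Walk root/Stage/CaseID/FileUUID/*.txt and return one row per file
    holding the RPM of the requested miRNA."""
    root = Path(root)
    rows = []
    for txt in sorted(root.glob("*/*/*/*.txt")):
        # The path itself carries the Stage, CaseID, and FileUUID
        stage, case_id, file_uuid = txt.relative_to(root).parts[:3]
        with txt.open(newline="") as handle:
            # Assumed: tab-separated file with a header row
            for record in csv.DictReader(handle, delimiter="\t"):
                if record["miRNA_ID"] == mirna_id:
                    rows.append({
                        "Stage": stage,
                        "CaseID": case_id,
                        "FileUUID": file_uuid,
                        "FileName": txt.name,
                        "RPM": float(record["reads_per_million_miRNA_mapped"]),
                    })
                    break  # one entry per file is enough
    return rows


def write_csv(rows, out_path):
    """Write the collected rows to a CSV file that a spreadsheet can open."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Calling collect_rpm("FULL_DATA") from the folder that contains FULL_DATA, and passing the result to write_csv, would produce a table with no manual copying, so omitted samples, duplicated entries, and mistyped values become less likely.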
With the data and the model developed, the potential of let-7b can be seen. The difference in expression suggests that, in the future, the expression of let-7 could be used to tell whether a patient has ccRCC. Here is an example of how it could work.
First, a sample is taken from the patient. Using an RNA extraction kit, the cells are lysed and the RNA, including miRNA, is isolated. The RNA is then cleaned, and an miRNA-specific microarray chip that includes let-7b probes is used to quantify the expression of let-7. Using an advanced model, it can then be determined whether the patient has cancer.
With this method, ccRCC could be identified earlier and non-invasively, potentially saving many more lives by giving medical professionals more time to prepare for the cancer.
While the method stated above could in theory work, there are likely still better ways to do it. In the future, work could be done to put the method above into practice, and check the results. Also, more research on the nature of let-7b’s unusual increase in expression is required.
In conclusion, let-7b expression more than doubles between Normal and Cancer tissues, which is unexpected, as let-7b usually acts as a tumour suppressor. However, this large increase makes it a strong candidate for a biomarker, and to show this, an SVM model was created, which performed well with 90.7% accuracy. In the future, better models, better methods for getting let-7 expression from patients, and more research on the unusual behaviour of let-7b’s expression are needed. When these questions are answered, the role of let-7 will be better understood, and ccRCC can be better identified and prevented.
Citations
References
- Clear Cell Renal Cell Carcinoma. Cancer.gov. https://www.cancer.gov/pediatric-adult-rare-tumor/rare-tumors/rare-kidney-tumors/clear-cell-renal-cell-carcinoma. Published March 17, 2020.
- Pandey J, Syed W. Renal Cancer. StatPearls - NCBI Bookshelf. https://www.ncbi.nlm.nih.gov/books/NBK558975/#:~:text=Renal%20cell%20carcinoma%20(RCC)%20is,percent%20of%20all%20renal%20malignancies. Published October 4, 2024.
- Davis-Dusenbery BN, Hata A. MicroRNA in Cancer: The Involvement of Aberrant MicroRNA Biogenesis Regulatory Pathways. Genes & Cancer. 2010;1(11):1100-1114. doi:10.1177/1947601910396213
- Ma Y, Shen N, Wicha MS, Luo M. The Roles of the Let-7 Family of MicroRNAs in the Regulation of Cancer Stemness. Cells. 2021;10(9):2415. doi:10.3390/cells10092415
- Pokrovenko DA, Vozniuk V, Medvediev MV. MicroRNA Let-7: A Promising Non-invasive Biomarker for Diagnosing and Treating External Genital Endometriosis. Journal of Turkish Society of Obstetric and Gynecology. 2021;18(4):291-297. doi:10.4274/tjod.galenos.2021.07277
- Gilles ME, Slack FJ. Let-7 microRNA as a Potential Therapeutic Target With Implications for Immunotherapy. Expert Opinion on Therapeutic Targets. 2018;22(11):929-939. doi:10.1080/14728222.2018.1535594
- Gowrishankar B, Ibragimova I, Zhou Y, et al. MicroRNA Expression Signatures of Stage, Grade, and Progression in Clear Cell RCC. Cancer Biology & Therapy. 2013;15(3):329-341. doi:10.4161/cbt.27314
- Smolarz B, Durczyński A, Romanowicz H, Szyłło K, Hogendorf P. miRNAs in Cancer (Review of Literature). International Journal of Molecular Sciences. 2022;23(5):2805. doi:10.3390/ijms23052805
- Barh D, Malhotra R, Ravi B, Sindhurani P. Microrna Let-7: An Emerging Next-Generation Cancer Therapeutic. Current Oncology. 2010;17(1):70-80. doi:10.3747/co.v17i1.356
- Shell S, Park SM, Radjabi AR, et al. Let-7 Expression Defines Two Differentiation Stages of Cancer. Proceedings of the National Academy of Sciences. 2007;104(27):11400-11405. doi:10.1073/pnas.0704372104
- Fedorko M, Juracek J, Stanik M, et al. Detection of Let-7 miRNAs in Urine Supernatant as Potential Diagnostic Approach in Non-metastatic Clear-cell Renal Cell Carcinoma. Biochemia Medica. 2017;27(2):411-417. doi:10.11613/bm.2017.043
- Cochetti G, Cari L, Nocentini G, et al. Detection of Urinary miRNAs for Diagnosis of Clear Cell Renal Cell Carcinoma. Scientific Reports. 2020;10(1). doi:10.1038/s41598-020-77774-9
- Osada H, Takahashi T. Let‐7 and miR‐17‐92: Small‐sized Major Players in Lung Cancer Development. Cancer Science. 2010;102(1):9-17. doi:10.1111/j.1349-7006.2010.01707.x
- Schiavoni V, Campagna R, Pozzi V, et al. Recent Advances in the Management of Clear Cell Renal Cell Carcinoma: Novel Biomarkers and Targeted Therapies. Cancers. 2023;15(12):3207. doi:10.3390/cancers15123207
- He X, Tian F, Guo F, et al. Circulating Exosomal mRNA Signatures for the Early Diagnosis of Clear Cell Renal Cell Carcinoma. BMC Medicine. 2022;20(1). doi:10.1186/s12916-022-02467-1
- Park S, Eom K, Kim J, et al. MiR-9, miR-21, and miR-155 as Potential Biomarkers for HPV Positive and Negative Cervical Cancer. BMC Cancer. 2017;17(1). doi:10.1186/s12885-017-3642-5
- GDC. https://portal.gdc.cancer.gov/analysis_page?app=CohortBuilder&tab=general. Accessed January 8, 2025.
- Google Sheets: Sign-in. https://docs.google.com/spreadsheets/u/0/?ec=wgc-sheets-[module]-goto&tgif=d.
- MiRNA Target IP Kit. MyBio Ireland. https://mybio.ie/products/mirna-target-ip-kit.
- Fedorko M, Juracek J, Stanik M, et al. Detection of let-7 miRNAs in urine supernatant as potential diagnostic approach in non-metastatic clear-cell renal cell carcinoma. Biochemia Medica. 2017;27(2):411-417. doi:10.11613/bm.2017.043
- Kalantzakos TJ, Sebel LE, Trussler J, et al. MicroRNA Associated with the Invasive Phenotype in Clear Cell Renal Cell Carcinoma: Let-7c-5p Inhibits Proliferation, Migration, and Invasion by Targeting Insulin-like Growth Factor 1 Receptor. Biomedicines. 2022;10(10):2425. doi:10.3390/biomedicines10102425
- Cinque A, Vago R, Trevisani F. Circulating RNA in kidney Cancer: What we know and what we still suppose. Genes. 2021;12(6):835. doi:10.3390/genes12060835
- Jiang C, Li X, Zhao H, Liu H. Long non-coding RNAs: potential new biomarkers for predicting tumor invasion and metastasis. Molecular Cancer. 2016;15(1). doi:10.1186/s12943-016-0545-z
- Yao Q, He YL, Wang N, et al. Identification of potential genomic alterations and the circRNA-miRNA-mRNA regulatory network in primary and recurrent synovial sarcomas. Frontiers in Molecular Biosciences. 2021;8. doi:10.3389/fmolb.2021.707151
- Wang H, Peng R, Wang J, Qin Z, Xue L. Circulating microRNAs as potential cancer biomarkers: the advantage and disadvantage. Clinical Epigenetics. 2018;10(1). doi:10.1186/s13148-018-0492-1
- Li J, Meng H, Bai Y, Wang K. Regulation of LNCRNA and its role in cancer metastasis. Oncology Research Featuring Preclinical and Clinical Cancer Therapeutics. 2016;23(5):205-217. doi:10.3727/096504016x14549667334007
- Kitsou K, Iliopoulou M, Spoulou V, Lagiou P, Magiorkinis G. Viral causality of human cancer and potential roles of human endogenous retroviruses in the Multi-Omics Era: an Evolutionary Epidemiology review. Frontiers in Oncology. 2021;11. doi:10.3389/fonc.2021.687631
- Evangelista EA, Cho CW, Aliwarga T, Totah RA. Expression and function of Eicosanoid-Producing cytochrome P450 enzymes in solid tumors. Frontiers in Pharmacology. 2020;11. doi:10.3389/fphar.2020.00828
- Olmedo-Suárez MÁ, Ramírez-Díaz I, Pérez-González A, et al. Epigenetic Regulation in Exposome-Induced Tumorigenesis: Emerging roles of NCRNAs. Biomolecules. 2022;12(4):513. doi:10.3390/biom12040513
- Fedorko M, Juracek J, Stanik M, et al. Detection of let-7 miRNAs in urine supernatant as potential diagnostic approach in non-metastatic clear-cell renal cell carcinoma. Biochemia Medica. 2017;27(2):411-417. doi:10.11613/bm.2017.043
Images and Video
Project Image: Created by me (Ivan) using Biorender.
BioRender. https://app.biorender.com/illustrations/67d70f463840ed0b84af20af.
Methodology Flowchart: Created by me (Ivan) using Canva.
Canva. https://www.canva.com/design/DAGfkvtei6g/W-OLp3etpEb8R3JPQEwtjA/edit.
Banner:
- New and improved SPRITE: Now with RNA! | NIH Common Fund. https://commonfund.nih.gov/4DNucleome/highlights/new-and-improved-sprite-now-rna.
- Garcia J. Landmark Study Proves Effectiveness of Kidney Transplants from HIV-positive Donors to HIV-positive Recipients. InventUM. January 2025. https://news.med.miami.edu/study-proves-effectiveness-of-kidney-transplants-from-hiv-positive-donors/.
- MicroRNA IN a NEW LIGHT! | BioVendor R&D. https://www.biovendor.com/mirna-summary.
Data Diagrams:
- Box plot created by me (Ivan) using matplotlib.pyplot in Python
Matplotlib: Visualization with Python. https://matplotlib.org/.
- Table created by me (Ivan) using Google Sheets
KIRC Project Data: ALL. Google Docs. https://docs.google.com/spreadsheets/d/1OxaipNosENnGiObiOAnu9GqviznvuBl8RHOFlvrl7Ro/edit?gid=0#gid=0.
miRNA Biogenesis Video: Katharina Petsche. Gene Silencing by Micro RNA - Studio Katharina Petsche. YouTube. May 2015. https://www.youtube.com/watch?v=t5jroSCBBwk.
Trifold Images:
- Davis-Dusenbery BN, Hata A. MicroRNA in Cancer: The Involvement of Aberrant MicroRNA Biogenesis Regulatory Pathways. Genes & Cancer. 2010;1(11):1100-1114. doi:10.1177/1947601910396213
- What is Cancer | Recognising The Symptoms | Cancer Council SA. https://www.cancersa.org.au/cancer-a-z/what-is-cancer/.
- How to perform the mIRNA expression assay. NanoString University. https://university.nanostring.com/how-to-perform-the-mirna-expression-assay.
- Hcostie. Kidney cancer discoveries open new avenue for treatment - Ontario Institute for Cancer Research. Ontario Institute for Cancer Research. https://oicr.on.ca/kidney-cancer-discoveries-open-new-avenue-for-treatment/. Published November 21, 2024.
- Iyikon. Vector Dna Icon Vector and PNG. https://pngtree.com/freepng/vector-dna-icon_3787609.html.
- The Editors of Encyclopaedia Britannica. Ribosomal RNA (rRNA) | Definition & Function. Encyclopedia Britannica. https://www.britannica.com/science/ribosomal-RNA. Published May 6, 2009.
Acknowledgement
I would like to acknowledge my mentor in this science fair, Jessica (Jingyi) Xiang, and my science fair professor, Dr. Garcia, for giving me guidance and mentorship throughout this project. I would like to thank Dr. Rai for helping me when I reached a roadblock, and my family for being so supportive of my endeavours in this science fair. I would also like to thank the CYSF for giving me the opportunity to test my skills and learn something new in this fair.