Gene Expression Changes Associated with 3q Amplification in Squamous Cell Carcinoma

Data Analysis of SCC to Determine the Impact of 3q24-29 DNA Amplification
Phoebe Weng
Grade 11


Squamous Cell Carcinoma (SCC) is one of the most common cancers capable of metastatic spread1. SCC occurs at many different anatomical sites, in organs that are covered with epithelial cells. These cancers were recently studied by The Cancer Genome Atlas (TCGA), which completed initial analyses of mutations, DNA copy-number alterations, DNA methylation, RNA/microRNA expression and protein expression in SCCs3–6. A recent study showed broad amplification of the chromosomal region 3q24-29 across patients in five subtypes of SCC. This study aimed to answer one overarching question: what is the effect of amplification of the chromosomal region 3q24-29 on gene expression in Squamous Cell Carcinoma?



To determine the effect of chromosomal amplification on gene expression, and which genes specifically are affected, we performed two differential expression analyses (DEAs). Both were done in the R programming language, using RStudio, with the DESeq2 package22,23.

The First Differential Expression Analysis

The first DEA compared expression data from samples taken from tumor sites with samples taken from sites without cancer, our ‘normal’ samples. To do this, we took data from the Broad Institute GDAC, a freely available source of the data collected by The Cancer Genome Atlas (TCGA) Research Network. Using this data, we created the metadata for our differential expression analysis.

Making the Metadata

Our metadata listed every sample together with its variables, which DESeq2 links to the dataset holding the expression value of each gene from RNA sequencing (RNA-seq). This first metadata table contained 4 columns of information: cancer type, HPV status (positive or negative), cell type (normal or tumor) and TCGA ID.

Cell type and TCGA ID

Cell type can be read from the name TCGA gives to each sample, its TCGA barcode. Within the barcode, tumor samples are labeled 01 and normal samples are labeled 11. This part of the barcode was extracted into a new column, and each sample was labeled ‘tumor’ or ‘normal’.
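The barcode step can be sketched as follows. This is a minimal illustration in Python rather than the R code we ran, and it assumes the usual dash-separated barcode layout in which the fourth field starts with the two-digit sample type code. The function name and the barcodes are made up for the example.

```python
def cell_type_from_barcode(barcode):
    """Map a TCGA sample barcode to 'tumor', 'normal', or None.

    Assumes the fourth dash-separated field begins with the
    two-digit sample type code (01 = tumor, 11 = normal).
    """
    fields = barcode.split("-")
    if len(fields) < 4:
        return None  # too short to contain a sample type code
    sample_code = fields[3][:2]  # e.g. "01A" -> "01"
    return {"01": "tumor", "11": "normal"}.get(sample_code)


# Made-up barcodes, for illustration only.
barcodes = [
    "TCGA-AA-0001-01A-11R-0000-07",
    "TCGA-AA-0001-11A-11R-0000-07",
    "TCGA-AA-0002-06A-11R-0000-07",
]
cell_types = {b: cell_type_from_barcode(b) for b in barcodes}
```

Samples whose code is neither 01 nor 11 come back as None, so they can be reviewed or dropped before the metadata is built.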

HPV-status and cancer type

HPV has been associated with HNSC and CESC; however, it has not been found to be associated with LUSC and BLCA. HPV status was found for each sample by using clinical data that included HPV status and cancer type, which was merged with our main metadata. This column can later be used to see the impact that HPV status might have on our results.

After merging the HPV dataset with our main dataset by patient ID, we had a dataset of hundreds of patients with all of our information.

The final metadata was obtained by selecting the 4 variables we were looking for: cancer type (admin.disease_code), patient TCGA ID (row name), HPV status and cell type.
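The merge and selection can be sketched as below. This is a Python illustration under stated assumptions, not the R code we ran: the patient ID is taken to be the first three fields of the sample barcode, and every name other than admin.disease_code (for example hpv_status and build_metadata) is hypothetical.

```python
def patient_id(barcode):
    """Return the patient portion (first three fields) of a TCGA barcode."""
    return "-".join(barcode.split("-")[:3]).upper()


def build_metadata(sample_barcodes, clinical):
    """Join samples to clinical records by patient ID and keep 4 variables."""
    metadata = []
    for barcode in sample_barcodes:
        record = clinical.get(patient_id(barcode))
        if record is None:
            continue  # no clinical record for this patient
        sample_code = barcode.split("-")[3][:2]
        cell_type = {"01": "tumor", "11": "normal"}.get(sample_code)
        if cell_type is None:
            continue  # neither a tumor nor a normal sample
        metadata.append({
            "tcga_id": barcode,
            "cancer_type": record["admin.disease_code"],
            "hpv_status": record["hpv_status"],
            "cell_type": cell_type,
        })
    return metadata


# Made-up clinical table keyed by patient ID, for illustration only.
clinical = {
    "TCGA-AA-0001": {"admin.disease_code": "hnsc", "hpv_status": "negative"},
}
samples = [
    "TCGA-AA-0001-01A-11R-0000-07",
    "TCGA-AA-0001-11A-11R-0000-07",
    "TCGA-AA-0009-01A-11R-0000-07",  # patient missing from clinical data
]
metadata = build_metadata(samples, clinical)
```

Dropping samples that have no clinical record is one possible choice; keeping them with a missing HPV status would be another.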

Raw counts

The second data frame holds the raw counts: the expression data (obtained through RNA sequencing) for each gene (as row names) in each sample (as column names). Both data frames needed to be formatted and cleaned before they could be passed to DESeq2 to run the DEA. After the DEA was run, DESeq2 output a list of genes with their corresponding p-values and log2 fold change (log2FoldChange) values. Upregulated and downregulated genes were then identified using a log2 fold change threshold of +/-0.58 and a p-value below 0.05.
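The thresholding rule can be written out as a short sketch. It is in Python for illustration only and is not the R code we ran; the function name and the "not_significant" label are assumptions. On the log2 scale, a cutoff of 0.58 corresponds to roughly a 1.5-fold change.

```python
def classify_gene(log2_fold_change, p_value, lfc_cutoff=0.58, p_cutoff=0.05):
    """Label a gene 'up', 'down', or 'not_significant'."""
    if log2_fold_change is None or p_value is None:
        return "not_significant"  # missing values cannot pass the filter
    if p_value >= p_cutoff:
        return "not_significant"
    if log2_fold_change > lfc_cutoff:
        return "up"
    if log2_fold_change < -lfc_cutoff:
        return "down"
    return "not_significant"


# A log2 fold change of 0.58 is roughly a 1.5-fold change in expression.
fold_change_at_cutoff = 2 ** 0.58

labels = [
    classify_gene(1.20, 0.001),   # large increase, small p-value
    classify_gene(-0.70, 0.010),  # decrease past the cutoff
    classify_gene(0.30, 0.001),   # change too small
    classify_gene(2.00, 0.200),   # p-value too large
]
```

A gene has to pass both the fold change cutoff and the p-value cutoff; passing only one of them is not enough.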

This list of genes was then submitted to the g:Profiler website, which reports which metabolic pathways each gene is part of.

The Second Differential Expression Analysis

The second analysis also compared expression data between samples, but now with amplification added as a variable in the metadata. To do that, each sample first had to be assigned its amplification status: amplified (amp) or not amplified (non_amp).


Amplification status came from another TCGA dataset, one that measured copy number variation, with amplification represented by 0, 1 and 2. For our purposes, samples with 0 or 1 (no or low amplification) were treated as not amplified and samples with 2 were treated as amplified. The other variables in the metadata were the same for both analyses.

The second analysis also required filtering for tumor samples only, since it compared amplified and non-amplified samples within cancer tissue. The other DESeq2 formatting steps were the same in principle for both DEAs.
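Assigning amplification status and keeping only tumor samples can be sketched as below. This is a Python illustration, not the R code we ran. It assumes one copy number call per sample and treats any value other than 0, 1 or 2 as unknown; the function and field names are hypothetical.

```python
def amplification_status(copy_number_call):
    """Map a copy number call to 'amp', 'non_amp', or None."""
    if copy_number_call == 2:
        return "amp"
    if copy_number_call in (0, 1):
        return "non_amp"
    return None  # missing or unexpected value (assumption: left for review)


def tumor_metadata_with_amp(metadata, copy_number_calls):
    """Keep tumor samples only and attach their amplification status."""
    rows = []
    for row in metadata:
        if row["cell_type"] != "tumor":
            continue  # the second DEA is within cancer samples only
        status = amplification_status(copy_number_calls.get(row["tcga_id"]))
        if status is None:
            continue  # no usable copy number call for this sample
        rows.append({**row, "amplification": status})
    return rows


# Made-up samples and calls, for illustration only.
metadata = [
    {"tcga_id": "S1", "cell_type": "tumor"},
    {"tcga_id": "S2", "cell_type": "tumor"},
    {"tcga_id": "S3", "cell_type": "normal"},
    {"tcga_id": "S4", "cell_type": "tumor"},
]
calls = {"S1": 2, "S2": 1, "S3": 0}
second_metadata = tumor_metadata_with_amp(metadata, calls)
```

Here S3 is removed because it is a normal sample, and S4 is removed because it has no copy number call.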

Pathway Analysis

Once both DEAs were completed, each had produced a dataset listing every gene with its log fold change and p-value. Significant genes were selected by applying a default threshold of +/-0.58 log fold change and a p-value below 0.05. For LUSC and BLCA, however, the thresholds were tightened to +/-1 and 0.01 respectively, since the default thresholds returned thousands of genes. For our purposes, we only looked at significantly overexpressed genes, those with a log fold change above 0.58 and a p-value below 0.05 (above 1 and below 0.01 for LUSC and BLCA). We then found the overlap between the genes significantly differentially expressed in amplified samples and those significantly differentially expressed in cancer tissue, which gave us our final gene list for each cancer type. These lists of significantly up-regulated genes were then submitted to the g:Profiler website to find which metabolic pathways were affected by these genes24.
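The per-cancer thresholds and the overlap step can be sketched as below. This is a Python illustration, not the R code we ran. It assumes each DEA result is available as a mapping from gene name to a (log2 fold change, p-value) pair; the gene names and function names are hypothetical.

```python
# (log2 fold change cutoff, p-value cutoff) per cancer type.
THRESHOLDS = {
    "default": (0.58, 0.05),
    "LUSC": (1.0, 0.01),
    "BLCA": (1.0, 0.01),
}


def upregulated_genes(results, cancer_type):
    """Return the set of significantly overexpressed genes."""
    lfc_cutoff, p_cutoff = THRESHOLDS.get(cancer_type, THRESHOLDS["default"])
    return {
        gene
        for gene, (lfc, p_value) in results.items()
        if p_value is not None and lfc > lfc_cutoff and p_value < p_cutoff
    }


def final_gene_list(tumor_vs_normal, amp_vs_non_amp, cancer_type):
    """Genes up-regulated in both analyses for one cancer type."""
    first = upregulated_genes(tumor_vs_normal, cancer_type)
    second = upregulated_genes(amp_vs_non_amp, cancer_type)
    return sorted(first & second)


# Made-up results, for illustration only.
tumor_vs_normal = {"GENE_A": (1.5, 0.001), "GENE_B": (0.7, 0.02), "GENE_C": (2.0, 0.2)}
amp_vs_non_amp = {"GENE_A": (1.2, 0.004), "GENE_B": (0.9, 0.03), "GENE_C": (1.1, 0.001)}

hnsc_genes = final_gene_list(tumor_vs_normal, amp_vs_non_amp, "HNSC")
lusc_genes = final_gene_list(tumor_vs_normal, amp_vs_non_amp, "LUSC")
```

With the same toy inputs, the stricter LUSC thresholds keep fewer genes than the default thresholds used for HNSC, which is the reason the cutoffs were tightened.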


There are certainly limitations to this study, both from its setup and from its use of TCGA data. One limitation comes from our use of DESeq2 and the DEA workflow. Weakly expressed genes that do not have enough reads from RNA-seq are not viable for differential expression analysis, because their low counts lead to results that are more prone to error.

While using publicly available data through TCGA gave us free access to a wide expanse of data, using data generated by others confines us to their parameters; we cannot generate our own data with our own parameters.


Taking It Further

While this initial analysis provided us with many new insights and important data for future cancer research, there is still plenty more that can improve and build upon this project.

First, HPV status was determined for each sample, and we originally planned to filter for only HPV-negative samples to keep our data consistent. However, a role for HPV in LUSC and BLCA has not been found, and most of our CESC samples were HPV positive, so both HPV-positive and HPV-negative samples were kept. Since HPV status is available, the DEA can be re-run with HPV status included to see how that variable affects our results. DESeq2 can be run with multiple variables; however, since this was my first time coding and using DESeq2, we started with just one manipulated variable at a time.

Secondly, the cancer tissue collected for TCGA analysis was bulk tumor tissue, so many different cell types were sequenced together. Going forward, we can use programs such as CIBERSORT to profile the different cell types and see what effect they have on our results.



I wrote all the code in RStudio. Because this project was done in collaboration with Dr. Bose at the University of Calgary, I did not put the code on a public link. If you would like to see the code we ran, feel free to ask during the Q&A session.


Squamous Cell Carcinoma (SCC)

Squamous Cell Carcinoma (SCC) is one of the most common cancers capable of metastatic spread1. SCC occurs at many different anatomical sites, in organs that are covered with epithelial cells. In particular, Head and Neck (HNSC), Lung (LUSC), Esophagus (ESCA), Cervix (CESC) and Bladder (BLCA) Squamous Cell Carcinomas are all cancers associated with smoking and/or Human Papillomavirus (HPV)2. These cancers were recently studied by The Cancer Genome Atlas (TCGA), which completed initial analyses of mutations, DNA copy-number alterations, DNA methylation, RNA/microRNA expression and protein expression in SCCs3–6. Around 600,000 new cases are diagnosed each year in HNSC alone, the sixth most common cancer in the world7,8. Only 11-17 percent of patients survive past 5 years once diagnosed with non-small cell lung cancers, which include LUSC9. The main risk factors for SCCs are heavy tobacco/alcohol use and human papillomavirus (HPV)10.


Human Papillomavirus and Squamous Cell Carcinoma

SCCs are either HPV positive, caused by previous infection with HPV, or HPV negative, caused by tobacco and/or alcohol use. HPV+ SCCs are consistently poorly differentiated and non-keratinizing compared to HPV- SCCs11. The two types of SCC are completely separate biological entities with different epidemiological, clinical, anatomical, radiological, behavioural, biological and prognostic characteristics12. Growing research suggests that these two subtypes should be treated differently, but the optimal treatment for each remains unclear12.

Copy Number Variation

A recent study by TCGA showed that smoking-related HNSCs had near-universal mutations in the TP53 and CDKN2A genes (both resulting in loss of function), as well as frequent copy number alterations, specifically amplification of the 3q26/28 and 11q13/22 chromosomal regions13. Another TCGA analysis focusing on LUSC showed similar alterations, perhaps indicating that the biology of these two diseases is similar6. Broad 3q chromosomal amplification is the most common chromosomal alteration in LUSC14.

Specifically, the human 3q chromosome arm has been shown to experience copy number gains in SCC2. Individuals may carry changes in the number of copies of a certain gene, either a copy gain or a copy loss, known as copy number variation (CNV). Research has shown an association between CNV loss or gain and certain types of cancer, including SCC. Researchers are still investigating the exact role of copy number gains in cancers and other human diseases. One study has examined copy number gains of fibroblast growth factor receptor 1 (FGFR1), a receptor tyrosine kinase, together with 3q amplification in LUSC14. It has been suggested that the number and biological importance of the genes along each 3q amplicon might help explain inter-individual variation in cancer outcomes, and that 3q amplification may have predictive value14.


Targeted Therapy

Currently, local and non-metastatic cancers are treated primarily with surgery and radiotherapy, while metastatic cancers are usually treated with anti-cancer drugs15. However, chemotherapy affects both normal and cancer cells, which creates the need for targeted therapy, an approach based on changes in the molecular biology of tumor cells15. In targeted therapy, drugs target specific enzymes, growth factor receptors and signal transducers that are involved in cancer growth15. These drugs aim to block specific biological transduction pathways or cancer proteins involved in tumor growth and progression; the molecular targets are found in normal tissues but are overexpressed or mutated in cancer cells16. Specific drugs are manufactured to block the signals that help mutated cells grow, cause apoptosis of cancer cells, stimulate the immune system or deliver chemotherapy agents only to cancer cells16.

In Non-Small Cell Lung Cancer (NSCLC), the mutated KRAS gene has already been found to be associated with a worse prognosis, and in a clinical study of patients with KRAS-mutant NSCLC, a combination of selumetinib and docetaxel was associated with a higher response rate (37%) than docetaxel and placebo (0%)17. Erlotinib is a medicine approved by the FDA for metastatic NSCLC that targets the epidermal growth factor receptor (EGFR) tyrosine kinase18,19. However, not all genomic alterations in LUSC have been comprehensively characterized, and no molecularly targeted agents have been developed for its treatment6. In BLCA, no molecularly targeted agents have been approved for treatment so far, although the high number of mutated chromatin regulatory genes suggests a potential for therapies targeting chromatin abnormalities4. Within HNSC, the only new agent that has been FDA approved is cetuximab8. This monoclonal antibody targets EGFR, yet adding cetuximab to platinum-based chemotherapy resulted in only a 2.7 month increase in survival and a 20% reduction in the relative risk of death20.

Immunotherapy, however, is an exciting new treatment that aims to use patients' own immune systems to fight cancer by releasing suppressed immune cells to attack tumor cells, much as the body fights infection8. Current results from immunotherapy trials have shown an improved response rate and overall survival in patients with recurrent/metastatic HNSC8. The combination of targeted therapies and immunotherapy has also been an area of interest in cancer treatment, based on the complementary modes of action of these two treatments21. Knowing which specific genes are affected by 3q amplification, and their effect on molecular pathways, has the potential to influence how SCCs are classified into molecular subtypes and to help improve treatment and prognosis for patients2.


In conclusion, we were able to obtain a list of significantly up-regulated genes in each cancer type and see which metabolic pathways were affected. Some pathways were seen across cancers, such as the NRF2 pathway, DNA binding and the 3q29 copy number variation syndrome pathway. We also saw new pathways that have not been studied in these cancer types before, as well as pathways that make sense given the subtype of cancer. For example, nervous system and synapse pathways were affected in Head and Neck Squamous Cell Carcinoma. Going forward, more work can be done to fully explore the effect of 3q chromosome amplification. The metabolic pathways affected just by genes up-regulated in cancer cells could be found and compared to our results to see which pathways are affected solely by amplification. A survival analysis could also be done with clinical data to see which genes are associated with cancer prognosis.


Reference List (AMA):

1.        Yan W, Wistuba II, Emmert-Buck MR, Erickson HS. Squamous Cell Carcinoma - Similarities and Differences among Anatomical Sites. Am J Cancer Res. 2011;1(3):275-300. doi:10.1158/1538-7445.am2011-275

2.        Campbell JD, Yau C, Bowlby R, et al. Genomic, Pathway Network, and Immunologic Features Distinguishing Squamous Carcinomas. Cell Rep. 2018;23(1):194-212.e6. doi:10.1016/j.celrep.2018.03.063

3.        Kim J, Bowlby R, Mungall AJ, et al. Integrated genomic characterization of oesophageal carcinoma. Nature. 2017;541(7636):169-175. doi:10.1038/nature20805

4.        Weinstein JN, Akbani R, Broom BM, et al. Comprehensive molecular characterization of urothelial bladder carcinoma. Nature. 2014;507(7492):315-322. doi:10.1038/nature12965

5.        Burk RD, Chen Z, Saller C, et al. Integrated genomic and molecular characterization of cervical cancer. Nature. 2017;543(7645):378-384. doi:10.1038/nature21386

6.        Hammerman PS, Lawrence MS, Voet D, et al. Comprehensive genomic characterization of squamous cell lung cancers. Nature. 2012;489(7417):519-525. doi:10.1038/nature11404

7.        Elkashty OA, Ashry R, Tran SD. Head and neck cancer management and cancer stem cells implication. Saudi Dent J. 2019;31(4):395-416. doi:

8.        Moskovitz J, Moy J, Ferris RL. Immunotherapy for Head and Neck Squamous Cell Carcinoma. Curr Oncol Rep. 2018;20(2):22. doi:10.1007/s11912-018-0654-5

9.        Zappa C, Mousa SA. Non-small cell lung cancer: current treatment and future advances. Transl Lung Cancer Res. 2016;5(3):288-300. doi:10.21037/tlcr.2016.06.07

10.      Dhull AK, Atri R, Dhankhar R, Chauhan AK, Kaushal V. Major Risk Factors in Head and Neck Cancer: A Retrospective Analysis of 12-Year Experiences. World J Oncol. 2018;9(3):80-84. doi:10.14740/wjon1104w

11.      Hennessey PT, Westra WH, Califano JA. Human papillomavirus and head and neck squamous cell carcinoma: recent evidence and clinical implications. J Dent Res. 2009;88(4):300-306. doi:10.1177/0022034509333371

12.      Dayyani F, Etzel CJ, Liu M, Ho C-H, Lippman SM, Tsao AS. Meta-analysis of the impact of human papillomavirus (HPV) on cancer risk and overall survival in head and neck squamous cell carcinomas (HNSCC). Head Neck Oncol. 2010;2:15. doi:10.1186/1758-3284-2-15

13.      Lawrence MS, Sougnez C, Lichtenstein L, et al. Comprehensive genomic characterization of head and neck squamous cell carcinomas. Nature. 2015;517(7536):576-582. doi:10.1038/nature14129

14.      Mendez P, Ramirez JL. Copy number gains of FGFR1 and 3q chromosome in squamous cell carcinoma of the lung. Transl Lung Cancer Res. 2013;2(2):101-111. doi:10.3978/j.issn.2218-6751.2013.03.05

15.      Pérez-Herrero E, Fernández-Medarde A. Advanced targeted therapies in cancer: Drug nanocarriers, the future of chemotherapy. Eur J Pharm Biopharm. 2015;93:52-79. doi:10.1016/j.ejpb.2015.03.018

16.      Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell. 2011;144(5):646-674. doi:10.1016/j.cell.2011.02.013

17.      Jänne PA, Shaw AT, Pereira JR, et al. Selumetinib plus docetaxel for KRAS-mutant advanced non-small-cell lung cancer: a randomised, multicentre, placebo-controlled, phase 2 study. Lancet Oncol. 2013;14(1):38-47. doi:10.1016/S1470-2045(12)70489-8

18.      Shepherd FA, Rodrigues Pereira J, Ciuleanu T, et al. Erlotinib in previously treated non-small-cell lung cancer. N Engl J Med. 2005;353(2):123-132. doi:10.1056/NEJMoa050753

19.      Tsimberidou A-M. Targeted therapy in cancer. Cancer Chemother Pharmacol. 2015;76(6):1113-1132. doi:10.1007/s00280-015-2861-1

20.      Vermorken JB, Mesia R, Rivera F, et al. Platinum-based chemotherapy plus cetuximab in head and neck cancer. N Engl J Med. 2008;359(11):1116-1127. doi:10.1056/NEJMoa0802656

21.      Vanneman M, Dranoff G. Combining immunotherapy and targeted therapies in cancer treatment. Nat Rev Cancer. 2012;12(4):237-251. doi:10.1038/nrc3237

22.      Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550. doi:10.1186/s13059-014-0550-8

23.      R Core Team. R: A Language and Environment for Statistical Computing. Published online 2013. http://www/

24.      Raudvere U, Kolberg L, Kuzmin I, et al. g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Res. 2019;47(W1):W191-W198. doi:10.1093/nar/gkz369

25.      Stratton MR, Campbell PJ, Futreal PA. The cancer genome. Nature. 2009;458(7239):719-724. doi:10.1038/nature07943

26.      Cooper LA, Demicco EG, Saltz JH, Powell RT, Rao A, Lazar AJ. PanCancer insights from The Cancer Genome Atlas: the pathologist’s perspective. J Pathol. 2018;244(5):512-524. doi:10.1002/path.5028

27.      Chinnikatti S. Cancer and its Genomics in Transformation Era. Cancer Ther Oncol Int J. 2017;7. doi:10.19080/CTOIJ.2017.07.555708

28.      Qian J, Massion PP. Role of chromosome 3q amplification in lung cancer. J Thorac Oncol. 2008;3(3):212-215. doi:10.1097/JTO.0b013e3181663544

29.      Dash S, Kinney NA, Varghese RT, Garner HR, Feng W, Anandakrishnan R. Differentiating between cancer and normal tissue samples using multi-hit combinations of genetic mutations. Sci Rep. 2019;9(1):1005. doi:10.1038/s41598-018-37835-6

30.      Bose P, Brockton NT, Dort JC. Head and neck cancer: from anatomy to biology. Int J Cancer. 2013;133(9):2013-2023. doi:10.1002/ijc.28112


Many thanks to my mentor, Dr. Pinaki Bose from the Cumming School of Medicine at the University of Calgary. This project would not have been possible without his guidance and expertise.

I was also greatly helped by Mehul Kumar (MSc Graduate Student at the Cumming School of Medicine), with his knowledge of R coding and data analysis, and by Reid McNeil (MSc Molecular Biology and Biochemistry), a fellow contributor to this study whose advice and insight were instrumental in my learning about this new world of data analysis.

I would also like to say a huge thank you to my teachers Dr. Garcia-Diaz and Ms. Gierus for their constant support and feedback.