Journal of Oral and Maxillofacial Surgery, 2025, Vol. 35, Issue (5): 356-364. doi: 10.12439/kqhm.1005-4979.2025.05.004

• Digital Dentistry Column •

Machine learning-based screening and identification of key genes associated with the prognosis of head and neck squamous cell carcinoma

YAO Jia1,2, DANG Linlin3, TU Junbo1,2,4, NA Sijia1,2,4

    1 Key Laboratory of Shaanxi Province for Craniofacial Precision Medicine Research, College of Stomatology, Xi'an Jiaotong University, Xi'an 710004
    2 Clinical Research Center of Shaanxi Province for Dental and Maxillofacial Diseases, College of Stomatology, Xi'an Jiaotong University, Xi'an 710004
    3 Shaanxi Center for Drug and Vaccine Inspection, Xi'an 710065
    4 Department of Oral and Maxillofacial Surgery, College of Stomatology, Xi'an Jiaotong University, Xi'an 710004, China
  • Received: 2024-09-12 Accepted: 2025-03-26 Published: 2025-10-28 Online: 2025-10-28

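  • Corresponding author: NA Sijia, associate chief physician. E-mail: sijiana@xjtu.edu.cn
  • About the author:
    YAO Jia, resident physician. E-mail:
  • Funding:
    Shaanxi Provincial Health Research and Innovation Team for Oral and Maxillofacial Surgery (2024TD-19)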

Abstract:

Objective: To screen and identify key genes associated with prognosis in head and neck squamous cell carcinoma (HNSCC). Methods: Clinical data and RNA sequencing (RNA-Seq) data of HNSCC patients from The Cancer Genome Atlas (TCGA) database were randomly divided into a training set (cohort Ⅰ, n=228) and a validation set (cohort Ⅱ, n=98). Prognostic seed genes were determined jointly using random survival forest (RSF) and Cox proportional hazards models, and the key prognosis-related genes were then screened using a forward selection model. A survival risk scoring system was constructed from the selected key genes, which were subsequently validated and subjected to bioinformatics analysis. Expression of the key genes was measured by real-time quantitative polymerase chain reaction (RT-qPCR) in the human oral epithelial keratinocyte cell line HOK and the human tongue squamous cell carcinoma cell line CAL27. Results: Twelve prognosis-related key genes were identified. Patients in the high-risk group had a significantly poorer prognosis than those in the low-risk group, with a hazard ratio (HR) of 4.19 in cohort Ⅱ (P<0.05). Expression levels of the key genes differed significantly between the HOK and CAL27 cell lines (P<0.05). Conclusion: Twelve key genes affecting the prognosis of HNSCC patients were identified through a machine learning model and may serve as prognostic biomarkers for HNSCC.

Key words: machine learning, head and neck squamous cell carcinoma, random survival forest model, Cox proportional hazards model, survival risk
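
The cohort split and risk grouping described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring formula or the cutoff, so this sketch assumes a conventional coefficient-weighted sum of gene expression values and a median cutoff. The function names, gene names, coefficients, and expression values are all hypothetical; only the cohort sizes (228 and 98) come from the abstract.

```python
import random
from statistics import median


def split_cohorts(patient_ids, n_train, seed=0):
    """Randomly split patient IDs into a training and a validation cohort."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    return ids[:n_train], ids[n_train:]


def risk_score(expression, coefficients):
    """Coefficient-weighted sum of expression values (assumed score form)."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def assign_risk_groups(scores):
    """Label patients 'high' if their score exceeds the median, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Cohort sizes from the abstract: 228 training + 98 validation = 326 patients.
train_ids, valid_ids = split_cohorts(range(326), n_train=228)

# Hypothetical coefficients and expression values, for illustration only.
coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}
expression = {
    "p1": {"GENE_A": 2.0, "GENE_B": 4.0},
    "p2": {"GENE_A": 6.0, "GENE_B": 2.0},
    "p3": {"GENE_A": 1.0, "GENE_B": 8.0},
}
scores = {pid: risk_score(expr, coefficients) for pid, expr in expression.items()}
groups = assign_risk_groups(scores)
```

In a real analysis the coefficients would come from the fitted survival model on cohort Ⅰ, and the same coefficients and cutoff rule would then be applied unchanged to cohort Ⅱ for validation.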
