A Knowledge Graph Approach for Breast Cancer Diagnosis and Data Sharing Platform Implementation in the Context of Human Papillomavirus Infection

Notice

This is an unedited manuscript accepted for publication and provided as an Article in Press for early access at the author’s request. The article will undergo copyediting, typesetting, and galley proof review before final publication. Please be aware that errors may be identified during production that could affect the content. All legal disclaimers of the journal apply.

Year : 2026 | Volume : 15 | Issue : 01 | Page :
By Xiang Fu, Lin Hu, Xuebei Du

  1. Researcher, Department of Information Center, Longgang Central Hospital of Shenzhen, Shenzhen city, Guangdong province, China
  2. Researcher, School of Electronic Information and Electrical Engineering, Chengdu University, Chengdu, Sichuan province, China
  3. Researcher, Department of Geriatric Infection, Sichuan Academy of Medical Sciences & Sichuan Provincial People’s Hospital, Chengdu, Sichuan province, China

Abstract

Background: Breast cancer remains among the most prevalent malignancies in women worldwide, and effective diagnosis and data integration continue to challenge clinical practice. Diagnostic reports from mammography and ultrasound contain rich clinical information that is often under-utilised because of heterogeneous formats and limited data-sharing infrastructure. In the context of human papillomavirus (HPV) infection, which may influence oncogenic pathways and increase data complexity, advanced computational methods offer new ways to address this problem.

Objective: We aimed to construct a comprehensive knowledge graph for breast cancer diagnosis, derived from imaging reports, and to implement a data-sharing platform that enables structured, interoperable data management in a clinical oncology setting.

Methods: We collected 1,002 mammography and ultrasound reports and applied a joint entity- and relationship-extraction pipeline based on a RoBERTa-Global Pointer model. The model's accuracy, recall and F1-score all exceeded 95%. The extracted information was loaded into a Neo4j knowledge graph containing 23,421 entities and 12,589 relationships. A Spring Boot-based web platform provided interactive visualization, advanced search and intelligent data management of breast cancer and HPV-related diagnostic information.

Results: The system achieved high extraction performance and established a rich semantic network that supports structured data exchange, query and analysis. The platform helps clinicians and researchers link imaging features, HPV-related metadata and therapeutic decision-making in a unified data environment.

Conclusions: Our knowledge graph and data-sharing platform provide a robust and scalable framework for breast cancer diagnosis and management, addressing data-format heterogeneity, enhancing interoperability and enabling precision-driven clinical workflows. This approach offers a generalisable template for other disease domains, particularly where complex imaging and infection-related data converge.
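The loading step described in the Methods can be illustrated with a minimal sketch. It assumes the extraction model emits (head, relation, tail) triples with entity types; the `Triple` structure, the labels `Finding`, `Anatomy` and `Assessment`, and the relation names are hypothetical and are not taken from the article. The sketch only builds parameterised Cypher `MERGE` statements and counts distinct entities and relationships; it does not reproduce the authors' pipeline and does not connect to a database.

```python
import re
from typing import Dict, Iterable, NamedTuple, Tuple


class Triple(NamedTuple):
    """One extracted fact. Field names are illustrative assumptions."""
    head: str
    head_label: str
    relation: str
    tail: str
    tail_label: str


# Cypher cannot take labels or relationship types as query parameters,
# so they are validated before being interpolated into the query text.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _safe_identifier(name: str) -> str:
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe Cypher identifier: {name!r}")
    return name


def triple_to_cypher(triple: Triple) -> Tuple[str, Dict[str, str]]:
    """Return a MERGE query and its parameters for one triple.

    MERGE creates a node or relationship only if it does not exist yet,
    so repeated mentions across reports do not create duplicates.
    """
    query = (
        f"MERGE (h:{_safe_identifier(triple.head_label)} {{name: $head}}) "
        f"MERGE (t:{_safe_identifier(triple.tail_label)} {{name: $tail}}) "
        f"MERGE (h)-[:{_safe_identifier(triple.relation)}]->(t)"
    )
    return query, {"head": triple.head, "tail": triple.tail}


def graph_size(triples: Iterable[Triple]) -> Tuple[int, int]:
    """Count distinct entities and relationships implied by the triples."""
    entities = set()
    relationships = set()
    for t in triples:
        head_key = (t.head_label, t.head)
        tail_key = (t.tail_label, t.tail)
        entities.update((head_key, tail_key))
        relationships.add((head_key, t.relation, tail_key))
    return len(entities), len(relationships)


# Hypothetical triples, as might be extracted from two imaging reports.
EXAMPLE_TRIPLES = [
    Triple("mass", "Finding", "LOCATED_IN", "left breast", "Anatomy"),
    Triple("mass", "Finding", "HAS_ASSESSMENT", "BI-RADS 4A", "Assessment"),
    Triple("mass", "Finding", "LOCATED_IN", "left breast", "Anatomy"),
]

for example in EXAMPLE_TRIPLES:
    cypher, params = triple_to_cypher(example)
    print(cypher, params)
```

Each query and parameter dictionary could then be passed to a Neo4j session by whatever client the platform uses. Keeping entity names in parameters rather than in the query text avoids injection problems with free-text report content.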

Keywords: Breast cancer, Knowledge graph, Human papillomavirus, Breast mammography, Data sharing platform implementation

[This article belongs to Research and Reviews : A Journal of Medical Science and Technology ]

How to cite this article:
Xiang Fu, Lin Hu, Xuebei Du. A Knowledge Graph Approach for Breast Cancer Diagnosis and Data Sharing Platform Implementation in the Context of Human Papillomavirus Infection. Research and Reviews : A Journal of Medical Science and Technology. 2026; 15(01):-.
How to cite this URL:
Xiang Fu, Lin Hu, Xuebei Du. A Knowledge Graph Approach for Breast Cancer Diagnosis and Data Sharing Platform Implementation in the Context of Human Papillomavirus Infection. Research and Reviews : A Journal of Medical Science and Technology. 2026; 15(01):-. Available from: https://journals.stmjournals.com/rrjomst/article=2026/view=241817


Regular Issue Subscription Original Research
Volume 15
Issue 01
Received 03/02/2026
Accepted 13/02/2026
Published 29/04/2026
Publication Time 85 Days

