Abstract
As vast amounts of unstructured data become available digitally, computer-based methods are needed to extract relevant and meaningful information from them. Named entity recognition (NER) is the task of identifying text spans that mention named entities and classifying them into predefined categories. Despite the existence of numerous well-established NER methods, the biomedical domain remains under-studied. The objective of this research is to identify an efficient technique for NER on biomedical data. To this end, we investigate deep learning approaches, namely the pre-trained BERT [1] model and its variants SciBERT [2] and BioBERT [3]. Because preprocessing the data before training influences model performance, we also experiment with several preprocessing rules and monitor their effect on the results. Our model outperforms vanilla BERT and BioBERT, achieving Precision: 66.20%, Recall: 98.96%, F1: 79.33%.
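The reported scores can be cross-checked with a short calculation. The sketch below assumes that the F1 figure is the standard balanced F1, i.e. the harmonic mean of precision and recall; the abstract does not state how the scores were averaged, and the function name `f1_score` is only illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall (fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Values taken from the abstract, converted from percentages to fractions.
precision = 0.6620
recall = 0.9896

f1 = f1_score(precision, recall)
print(f"F1 = {f1 * 100:.2f}%")  # prints "F1 = 79.33%"
```

Under this assumption, the reported precision and recall reproduce the reported F1 of 79.33%.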
School
School of Sciences and Engineering
Department
Computer Science & Engineering Department
Degree Name
MS in Computer Science
Graduation Date
Fall 2-15-2023
Submission Date
11-13-2022
First Advisor
Ahmed Rafea
Committee Member 1
Hossam Sharara
Committee Member 2
Nevin Darwish
Extent
48 p.
Document Type
Master's Thesis
Institutional Review Board (IRB) Approval
Not necessary for this item
Recommended Citation
APA Citation
Guirguis, M. (2023). Named Entity Recognition from Biomedical Text [Master's Thesis, the American University in Cairo]. AUC Knowledge Fountain.
https://fount.aucegypt.edu/etds/1983
MLA Citation
Guirguis, Maged. Named Entity Recognition from Biomedical Text. 2023. American University in Cairo, Master's Thesis. AUC Knowledge Fountain.
https://fount.aucegypt.edu/etds/1983