Abstract

As vast amounts of unstructured data become available digitally, computer-based methods for extracting relevant and meaningful information are needed. Named entity recognition (NER) is the task of identifying text spans that mention named entities and classifying them into predefined categories. Despite the existence of numerous well-established NER methods, the biomedical domain remains under-studied. The objective of this research is to identify an efficient technique for NER on biomedical data. To this end, we investigate deep learning approaches, namely the pre-trained BERT [1] model and its variants SciBERT [2] and BioBERT [3]. Because preprocessing the data before training influences model performance, we also experiment with several preprocessing rules and monitor their effect. Our model outperforms vanilla BERT and BioBERT, with Precision: 66.20%, Recall: 98.96%, F1: 79.33%.
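As a quick consistency check on the reported scores, F1 is the harmonic mean of precision and recall. The sketch below is illustrative only: the function name `f1_score` is a hypothetical helper, not code from the thesis, and it assumes the standard F1 definition.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, expressed as fractions.
reported_precision = 0.6620
reported_recall = 0.9896

f1_percent = f1_score(reported_precision, reported_recall) * 100
print(f"F1: {f1_percent:.2f}%")
```

Under that definition, the reported precision and recall reproduce the reported F1 to two decimal places.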

School

School of Sciences and Engineering

Department

Computer Science & Engineering Department

Degree Name

MS in Computer Science

Graduation Date

Fall 2-15-2023

Submission Date

11-13-2022

First Advisor

Ahmed Rafea

Committee Member 1

Hossam Sharara

Committee Member 2

Nevin Darwish

Extent

48 p.

Document Type

Master's Thesis

Institutional Review Board (IRB) Approval

Not necessary for this item

Included in

Data Science Commons
