Hybrid AI for Arabic Sensitive Data Detection: Enhancing Privacy Compliance in Egypt

Third Author's Department

Computer Science & Engineering Department

https://doi.org/10.18280/ijsse.150602

All Authors

Omar Elbarbary, Mohamed Rasslan, Alia El Bolock, Caroline Sabty

Document Type

Research Article

Publication Title

International Journal of Safety and Security Engineering

Publication Date

6-1-2025

doi

10.18280/ijsse.150602

Abstract

We present a hybrid framework that combines BERT-based Named Entity Recognition (NER) with rule-based detectors for rigidly formatted identifiers (e.g., national IDs, IP/MAC addresses, phone numbers). On structured data, fields matching these patterns are excluded from the embedding-based classifiers. On unstructured Arabic text, the hybrid system achieves an F1 score of 92%. In the structured setting, isolating formatted fields raises the average F1 from 87% to 88%, with the BiLSTM classifier delivering the best performance. These results demonstrate that integrating deep contextual models with deterministic rules extends coverage of legally defined formats and outperforms single-strategy approaches. Future work will focus on developing a custom Arabic sensitive-entity corpus, validating on real datasets, and adding anonymization and encryption modules.
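The rule-based half of such a hybrid can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the paper: the regular expressions (a 14-digit national ID starting with 2 or 3, an Egyptian mobile number with prefix 010/011/012/015, IPv4 and colon-separated MAC addresses), the function names `find_rigid_identifiers` and `merge_with_ner`, and the policy that a rule match overrides any overlapping NER span. The NER spans are assumed to come from a separate BERT-based model and are passed in as plain tuples.

```python
import re

# Map Arabic-Indic digits (U+0660..U+0669) to ASCII so one pattern set covers both scripts.
ARABIC_INDIC_TO_ASCII = {0x0660 + i: str(i) for i in range(10)}

_OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"

# Illustrative patterns only (assumptions, not the authors' rules).
PATTERNS = {
    # Assumed format: 14 digits, first digit 2 or 3.
    "NATIONAL_ID": re.compile(r"(?<!\d)[23]\d{13}(?!\d)", re.ASCII),
    # Assumed format: +20 or 0, then 1, an operator digit, and 8 more digits.
    "PHONE": re.compile(r"(?<!\d)(?:\+20|0)1[0125]\d{8}(?!\d)", re.ASCII),
    "IPV4": re.compile(rf"(?<![\d.]){_OCTET}(?:\.{_OCTET}){{3}}(?![\d.])", re.ASCII),
    "MAC": re.compile(
        r"(?<![0-9A-Fa-f:])(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}(?![0-9A-Fa-f:])"
    ),
}


def find_rigid_identifiers(text):
    """Return sorted (start, end, label) spans for rigidly formatted identifiers."""
    # The digit mapping is one-to-one, so offsets still refer to the original text.
    normalised = text.translate(ARABIC_INDIC_TO_ASCII)
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(normalised):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


def merge_with_ner(rule_spans, ner_spans):
    """Combine both span lists; a rule span wins over any NER span it overlaps."""
    kept = [
        n for n in ner_spans
        if not any(n[0] < r[1] and r[0] < n[1] for r in rule_spans)
    ]
    return sorted(rule_spans + kept)
```

In the structured setting, the same detector could be applied to each cell, with matching fields routed to the rules and only the remaining fields passed to an embedding-based classifier; that routing is again a guess at the design, not a description of the published system.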

First Page

1103

Last Page

1109
