Abstract
Identifying and evaluating risky contractual clauses during the tender phase remains a critical challenge in the construction industry. Such clauses can expose contractors and other parties to conflicts and disputes, which in turn cause delays and cost overruns during project execution. Traditional contract review processes rely heavily on expert judgment and manual procedures; they are time-consuming, inconsistent, and prone to human error, particularly when multiple contracts must be reviewed under strict deadlines. To address these issues, this study introduces an automated, data-driven framework, referred to as the Contracts Assessment Tool (CAT), that leverages text mining techniques and machine learning algorithms to enhance the contract evaluation process. CAT integrates contractual clauses collected from multiple projects with expert assessments and generates an automated report in which each clause is classified according to its impact level, its probability of occurrence, and its similarity to a reference contract. To achieve this objective, two main paths were undertaken. First, a text extraction model was developed to accurately identify, extract, and compare contractual clauses. Second, data were collected from contracts and experts, then preprocessed and visualized to extract meaningful insights; clause risk probability and impact classification models were then developed and validated using several machine learning techniques, namely Random Forest, SVM, KNN, XGBoost, Naïve Bayes, and Logistic Regression. Logistic Regression performed best, with an accuracy of 0.740 and an F1-score of 0.736 for the risk model, and an accuracy of 0.710 and an F1-score of 0.707 for the probability model.
However, the use of resampling techniques, particularly the ADASYN approach, enhanced the models' performance, with SVM achieving an accuracy of 0.922 and an F1-score of 0.921 for the risk model, and an accuracy of 0.928 and an F1-score of 0.926 for the probability model. Finally, CAT was tested on two contract documents from different projects, where it successfully identified, extracted, and evaluated clauses, assigning accurate classifications for both impact and probability of occurrence. These results demonstrate CAT's capability to support contract engineers by accelerating the review process, reducing human error, and improving efficiency. They also show the potential of automated, machine learning-based tools to enhance contract evaluation and strengthen contract risk identification in the pre-award phase.
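The abstract states that each clause is scored for similarity against a reference contract but does not name the similarity measure. As a minimal sketch only, assuming a bag-of-words cosine similarity (an assumption, not the thesis's confirmed method), the comparison step could look like the following. The function names and the sample clauses are hypothetical and invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a, counts_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


def best_reference_match(clause, reference_clauses):
    """Return (index, score) of the reference clause closest to `clause`."""
    scores = [cosine_similarity(clause, ref) for ref in reference_clauses]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical clauses, invented for illustration only (assumption).
reference = [
    "The contractor shall complete the works within the time for completion.",
    "The employer shall pay the contractor within 56 days of the statement.",
]
tender_clause = "The contractor shall complete all works within the agreed time."
index, score = best_reference_match(tender_clause, reference)
```

Here `index` points at the closest reference clause and `score` lies between 0 and 1; a low score would flag a clause that departs from the reference wording and may deserve closer review. A production pipeline would more plausibly weight terms (for example with TF-IDF) before comparing.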
School
School of Sciences and Engineering
Department
Construction Engineering Department
Degree Name
MS in Construction Engineering
Graduation Date
Fall 3-3-2026
Submission Date
1-25-2026
First Advisor
May Haggag
Committee Member 1
Khaled Nassar
Committee Member 2
Engy Serag
Committee Member 3
Ossama Hosny
Extent
245 p.
Document Type
Master's Thesis
Institutional Review Board (IRB) Approval
Approval has been obtained for this item
Disclosure of AI Use
Code/algorithm generation and/or validation
Recommended Citation
APA Citation
Abouelwy, S. A. (2026). A Text Analysis-Based Predictive Approach for Assessing Clause Risk: An Application in Construction Contracts [Master's Thesis, the American University in Cairo]. AUC Knowledge Fountain.
https://fount.aucegypt.edu/etds/2674
MLA Citation
Abouelwy, Seifeldin Ahmed. A Text Analysis-Based Predictive Approach for Assessing Clause Risk: An Application in Construction Contracts. 2026. American University in Cairo, Master's Thesis. AUC Knowledge Fountain.
https://fount.aucegypt.edu/etds/2674
