Abstract

The identification and evaluation of risky contractual clauses remain a critical challenge in the construction industry during the tender phase. Such clauses can expose contractors and other parties to conflicts and disputes, which cause delays and cost overruns during project execution. Traditional contract review processes rely heavily on expert judgment and manual procedures; they are time-consuming, inconsistent, and prone to human error, particularly when multiple contracts must be reviewed under strict deadlines. To address these issues, this study introduces an automated, data-driven framework, referred to as the Contracts Assessment Tool (CAT), that leverages text mining techniques and machine learning algorithms to enhance the contract evaluation process. CAT integrates contractual clauses collected from multiple projects with expert assessments and generates an automated report in which each clause is classified according to its impact level, probability of occurrence, and similarity to a reference contract. To achieve this objective, the work proceeded in the following steps. First, a text extraction model was developed to accurately identify, extract, and compare contractual clauses. Second, data were collected from contracts and experts, then preprocessed and visualized to extract meaningful insights. Finally, clause risk probability and impact classification models were developed and validated using different machine learning techniques, including Random Forest, SVM, KNN, XGBoost, Naïve Bayes, and Logistic Regression. Logistic Regression achieved the best results, with an accuracy of 0.740 and an F1-score of 0.736 for the risk model, and an accuracy of 0.710 and an F1-score of 0.707 for the probability model.
However, the use of resampling techniques, particularly the ADASYN approach, improved the models' performance: the SVM achieved an accuracy of 0.922 and an F1-score of 0.921 for the risk model, and an accuracy of 0.928 and an F1-score of 0.926 for the probability model. Finally, CAT was tested on two contract documents from different projects, where it successfully identified, extracted, and evaluated clauses, assigning accurate classifications for both impact and probability of occurrence. These results demonstrate CAT’s capability to support contract engineers by accelerating the review process, reducing human error, and improving efficiency, and they show the potential of automated, machine-learning-based tools to enhance contract evaluation and strengthen contract risk identification in the pre-award phase.

School

School of Sciences and Engineering

Department

Construction Engineering Department

Degree Name

MS in Construction Engineering

Graduation Date

Fall 3-3-2026

Submission Date

1-25-2026

First Advisor

May Haggag

Committee Member 1

Khaled Nassar

Committee Member 2

Engy Serag

Committee Member 3

Ossama Hosny

Extent

245 p.

Document Type

Master's Thesis

Institutional Review Board (IRB) Approval

Approval has been obtained for this item

Disclosure of AI Use

Code/algorithm generation and/or validation

Available for download on Tuesday, January 25, 2028
