Cost estimating and forecasting are key components of budgeting for any construction project. However, conventional approaches can produce uncertain estimates and usually do not draw on organizational knowledge from past projects. Ideally, a cost estimate is prepared under the best available predictions of the relevant future conditions, which makes it the soundest basis for an economic analysis of value for money. With the rise of Artificial Intelligence techniques, several Machine Learning (ML) methods are now being studied to improve the accuracy of cost estimates and increase their reliability during the tendering phase. This research aims to develop an expert system that determines a rough order of magnitude for budgeting, with an expected leeway of 30%-35%, to forecast the costs of consultancy services in future World Bank projects. The expert system uses advanced ML methods to generate accurate forecasts from a rigorous database of similar past projects. Accordingly, the expert system is built on data from contracts tendered by the World Bank. Through the World Bank’s Open Data Initiative and Access to Information Policy, administrative and analytical information can be accessed by the general public. This source was therefore chosen because it provides an ample dataset for the study and strengthens the accountability of the presented data. The research also examines the consultancy services performed under the World Bank in order to investigate the applicability of ML methods to cost estimation. Before the model was built, the dataset was gathered, selected, and sorted from the online database available through the Procurement section of the World Bank website. Data were collected on over 16,000 projects worldwide, comprising 82,000 consultancy service contracts over the past 14 years.
The dataset was then examined to identify the influential factors that affect the cost of the services. The factors retained after filtering were: major sector, procurement method, environmental category, consultancy (procurement) type, country, and overall project budget. The cleaned dataset, with outliers removed, comprised over 60,000 consultancy contracts and was used as the model input. During the model building phase, multiple regression ML models, such as Artificial Neural Network (ANN), Light Gradient Boosting (LGB), and Lasso (L1) Regularization, were implemented in order to determine the optimum model, that is, the one with the lowest mean absolute error (MAE) and a correlation coefficient (r) closest to 1. However, because the dataset lacked sufficient numerical features, the regression models yielded undesirable results. Another approach, regression by classification, which maps a regression problem onto a classification problem, was examined, proved to yield better results, and was therefore adopted for this research. Furthermore, the ensemble method, a composite meta-algorithm, was applied to create a more robust classifier by combining several ML models into a single predictive model in which each classifier votes and the final prediction label is decided by the majority of the votes. The classifiers included in the research were: Categorical Naive Bayes, K Nearest Neighbors, XGBoost, Random Forest, Gradient Boosting Classifier, AdaBoost Classifier, Bagging Classifier, and a small Multi-layer Perceptron (Neural Network). With the regression by classification approach and the ensemble method in place, the final vote is computed using the "mode" operation, which selects the most common output class among all models, resulting in a 40.62% success rate and a 72.41% acceptance rate. The model was then tested and verified on a second dataset of contracts that were not included in the earlier dataset.
On this second dataset, the success rate and acceptance rate were measured to be 36% and 72% respectively. Furthermore, the interface was validated by fixing one feature in the interface and changing the other features to record any resulting changes in the contract cost. By testing and validating each of the variables in each feature in this way, the adaptability of each feature was recorded, along with its success and acceptance rates. These rates were then compiled to reach mean success and acceptance rates of 48% and 75% respectively.
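
The regression by classification step and the "mode" vote described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the bin edges in `BIN_EDGES`, the function names `cost_to_class` and `majority_vote`, and the example votes are all hypothetical, chosen only to show how a continuous cost becomes a class label and how the votes of several classifiers are combined.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical cost-bin edges in USD (assumed for illustration, not from the thesis).
BIN_EDGES = [50_000, 200_000, 1_000_000]


def cost_to_class(cost):
    """Map a continuous contract cost to an ordinal class label (0..len(BIN_EDGES))."""
    return bisect_right(BIN_EDGES, cost)


def majority_vote(labels):
    """Return the mode of the predicted labels; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


# Regression target turned into classification labels.
training_costs = [12_000, 75_000, 450_000, 3_000_000]
training_labels = [cost_to_class(c) for c in training_costs]

# Hypothetical predictions from eight classifiers for a single contract.
votes = [1, 2, 1, 1, 0, 2, 1, 1]
final_label = majority_vote(votes)
```

Here `final_label` is the cost class that most of the eight classifiers agree on. Resolving ties by first occurrence is a choice made for this sketch; the abstract does not say how ties were handled.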


Department

Construction Engineering Department

Degree Name

MS in Construction Engineering

Graduation Date

Summer 6-15-2021

Submission Date


First Advisor

Samer Ezeldin

Committee Member 1

Ibrahim Abotaleb

Committee Member 2

Osama El Hosseiny


159 p.

Document Type

Master's Thesis

Institutional Review Board (IRB) Approval

Not necessary for this item