Machine translation is an important field in Natural Language Processing. The need for it arises from the growing amount of data available online: most data is now digital, and this share is expected to increase over time. Because manual human translation takes considerable time and effort, machine translation is needed to cover all available languages. Much research has aimed to make machine translation faster and more reliable across language pairs. Machine translation is now being coupled with deep learning and neural networks, and new topics are being studied and tested, such as applying neural machine translation as a replacement for classical statistical machine translation. In this thesis, we study the effect of data preprocessing and decoder type on translation output. We then demonstrate two ways to enhance translation from English to Arabic. The first approach uses a two-decoder system: the first decoder translates from English to Arabic, and the second is a post-processing decoder that retranslates the first Arabic output into Arabic again to fix some of the translation errors. We then study the results of different kinds of decoders and their contributions on the test set. The results of this study lead to the second approach, which combines different decoders to create a stronger one. This approach uses a classifier to categorize English sentences based on their structure; the output of the classifier is the decoder best suited to translate the given sentence. Both approaches increased the BLEU score, albeit by different amounts. The classifier showed an increase of ~0.1 BLEU points, while the post-processing decoder showed an increase of between ~0.3 and ~11 BLEU points on two different test sets. Finally, we compare our results to Google Translate to see how well we perform relative to a well-known translator.
Our best machine translation system scored 5 absolute points compared to Google Translate on the ISI corpus test set, and was 9 absolute points lower on the UN corpus test set.
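The two approaches described in the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the thesis implementation: the names `post_edit_pipeline` and `routed_decoder`, the toy decoders, and the length-based toy classifier are all hypothetical stand-ins (the thesis classifies sentences by structure, and its decoders are trained translation systems).

```python
from typing import Callable, Dict

# A decoder is modeled here as any function from a sentence to a translation.
Decoder = Callable[[str], str]


def post_edit_pipeline(first: Decoder, second: Decoder) -> Decoder:
    """Approach 1 (sketch): chain a translating decoder and a post-processing decoder."""
    def translate(sentence: str) -> str:
        # The second decoder re-translates the first decoder's output.
        return second(first(sentence))
    return translate


def routed_decoder(classify: Callable[[str], str],
                   decoders: Dict[str, Decoder]) -> Decoder:
    """Approach 2 (sketch): a classifier picks which decoder handles each sentence."""
    def translate(sentence: str) -> str:
        label = classify(sentence)          # classifier output names a decoder
        return decoders[label](sentence)    # the chosen decoder translates
    return translate


# Hypothetical toy stand-ins so the sketch runs without any trained models.
def toy_en_ar(sentence: str) -> str:
    return f"<ar:{sentence}>"


def toy_ar_fix(sentence: str) -> str:
    return sentence.replace("<ar:", "<ar-fixed:")


def toy_classifier(sentence: str) -> str:
    # Assumption for illustration only: route by word count, not real structure.
    return "short" if len(sentence.split()) <= 5 else "long"


pipeline = post_edit_pipeline(toy_en_ar, toy_ar_fix)
router = routed_decoder(
    toy_classifier,
    {"short": lambda s: f"[A] {s}", "long": lambda s: f"[B] {s}"},
)
```

In this sketch both approaches return an ordinary decoder function, so either result could be evaluated with the same scoring code as a single decoder.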
Computer Science & Engineering Department
MS in Computer Science
The author retains all rights with regard to copyright. The author certifies that written permission from the owner(s) of third-party copyrighted matter included in the thesis, dissertation, paper, or record of study has been obtained. The author further certifies that IRB approval has been obtained for this thesis, or that IRB approval is not necessary for this thesis. Insofar as this thesis, dissertation, paper, or record of study is an educational record as defined in the Family Educational Rights and Privacy Act (FERPA) (20 USC 1232g), the author has granted consent to disclosure of it to anyone who requests a copy.
Institutional Review Board (IRB) Approval
Not necessary for this item
ElMaghraby, A. (2017). Creating a strong statistical machine translation system by combining different decoders [Master's Thesis, the American University in Cairo]. AUC Knowledge Fountain.
ElMaghraby, Ayah. Creating a strong statistical machine translation system by combining different decoders. 2017. American University in Cairo, Master's Thesis. AUC Knowledge Fountain.