Abstract

Sentiment analysis has recently become one of the growing areas of research in text mining and natural language processing. The increasing availability of online resources and the popularity of rich, fast channels for opinion sharing, such as news sites, online review sites, and personal blogs, have led several parties, including customers, companies, and governments, to start analyzing and exploring these opinions. The main task of sentiment classification is to classify a sentence (e.g., a review, blog post, comment, or news item) as holding an overall positive, negative, or neutral sentiment. Most current studies on this topic focus mainly on English texts, with very limited resources available for other languages like Arabic, especially for the Egyptian dialect. In this research work, we aim to improve the performance measures of Egyptian dialect sentence-level sentiment analysis by proposing a hybrid approach that combines the machine learning approach, using support vector machines, with the semantic orientation approach. Two methodologies were proposed, one for each approach, which were then joined to create the proposed hybrid approach. The corpus used contains more than 20,000 Egyptian dialect tweets collected from Twitter, of which 4,800 manually annotated tweets are used (1,600 positive, 1,600 negative, and 1,600 neutral). We performed several experiments to: 1) compare the results of each approach individually on the Egyptian dialect before and after preprocessing; 2) compare the performance of the hybrid approach, which merges both approaches, against the performance of each approach separately; and 3) evaluate the effect of considering negation on the performance of the hybrid approach.
The results show significant improvements in accuracy, precision, recall, and F-measure, indicating that the proposed hybrid approach is effective for sentence-level sentiment classification. The results are also promising, which encourages further work in this line of research.

Department

Computer Science & Engineering Department

Degree Name

MS in Computer Science

Date of Award

6-1-2013

Online Submission Date

May 2013

First Advisor

Rafea, Ahmed

Committee Member 1

El-Kassas, Sherif

Committee Member 2

Moustafa, Mohamed N.

Document Type

Thesis

Extent

112 p.

Library of Congress Subject Heading 1

Natural language processing (Computer science)

Library of Congress Subject Heading 2

Computational linguistics.

Rights

The author retains all rights with regard to copyright. The author certifies that written permission from the owner(s) of third-party copyrighted matter included in the thesis, dissertation, paper, or record of study has been obtained. The author further certifies that IRB approval has been obtained for this thesis, or that IRB approval is not necessary for this thesis. Insofar as this thesis, dissertation, paper, or record of study is an educational record as defined in the Family Educational Rights and Privacy Act (FERPA) (20 USC 1232g), the author has granted consent to disclosure of it to anyone who requests a copy.

IRB

Not necessary for this item

Comments

I would like to thank my great supervisor, Dr. Ahmed Rafea, who always helped me, welcomed my questions, and gave me many recommendations and suggestions. I would not have reached this phase without his constant support, advice, and guidance. I would also like to express my thanks to my thesis committee members, Dr. Mohamed Mostafa, Dr. Sherif Kassas, and Dr. Reem Bahgat, for their support, guidance, and helpful feedback. Moreover, I would like to recognize ITIDA for sponsoring this project, entitled "Semantic Analysis and Opinion Mining for Arabic Web", and the Egyptian industrial company LINK-Development and its team for their help in developing a tool for collecting and annotating the tweets. Finally, I thank my beloved parents and friends for their constant support, appreciation, and patience. I am also grateful to my fiancé, who gave me a great deal of help and support during my research and while running the experiments. I would like to dedicate this thesis to them all.
