Abstract

In computer vision, the way an image is represented has a profound effect on model performance. Traditionally, an image is treated as a grid of pixels and processed by Convolutional Neural Networks (CNNs). An image can also be treated as a sequence of patches: Vision Transformers and MLP-Mixers (Multi-Layer Perceptron Mixers) are two families of models that process an image this way. Graphs are a more generic representation than grids or sequences, which is why the Vision Graph Neural Network (ViG) constructs a graph for each image and processes it as a graph of patches. However, ViG builds this graph with k-nearest neighbors (k-NN). Using k-NN can miss important edges while enforcing less important ones, simply to satisfy the fixed "k" constraint on every node's neighborhood. To overcome this limitation, we present two graph construction methodologies: Similarity Thresholded Graph Construction (STGC) and Learnable Reparameterized Graph Construction (LRGC). In STGC, an edge is kept if its normalized similarity score exceeds a predefined threshold. In addition, to counter oversmoothing, we present a decreasing threshold framework. Using STGC, we show experimentally that our model outperforms state-of-the-art graph-based models on ImageNet image classification without introducing computational overhead. In LRGC, the similarity scores are replaced by learnable attention scores and the threshold of each layer becomes learnable, so no hyperparameter tuning is needed. We show experimentally that LRGC performs comparably to the best hyperparameter combination of STGC on Imagenette.
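The contrast between k-NN and similarity-thresholded graph construction can be illustrated with a minimal Python sketch. It is not the thesis implementation: it assumes patch features are plain vectors, uses cosine similarity rescaled to [0, 1] as the "normalized similarity score" (the thesis's exact normalization may differ), and the linear `decreasing_thresholds` schedule is a hypothetical stand-in for the decreasing threshold framework. All function names are illustrative, and LRGC's learnable attention scores and thresholds are not modeled.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def normalized_similarity(a: Vector, b: Vector) -> float:
    """ASSUMPTION: rescale cosine similarity from [-1, 1] to [0, 1]."""
    return (cosine_similarity(a, b) + 1.0) / 2.0


def knn_graph(patches: List[Vector], k: int) -> Dict[int, List[int]]:
    """k-NN construction: every node gets exactly k neighbors, however weak they are."""
    graph: Dict[int, List[int]] = {}
    for i, p in enumerate(patches):
        scores = [(normalized_similarity(p, q), j)
                  for j, q in enumerate(patches) if j != i]
        # Highest similarity first; ties broken by the lower node index.
        scores.sort(key=lambda t: (-t[0], t[1]))
        graph[i] = sorted(j for _, j in scores[:k])
    return graph


def thresholded_graph(patches: List[Vector], tau: float) -> Dict[int, List[int]]:
    """STGC-style construction: keep an edge only if its score exceeds tau (degree varies)."""
    graph: Dict[int, List[int]] = {}
    for i, p in enumerate(patches):
        graph[i] = [j for j, q in enumerate(patches)
                    if j != i and normalized_similarity(p, q) > tau]
    return graph


def decreasing_thresholds(tau_start: float, tau_end: float, num_layers: int) -> List[float]:
    """HYPOTHETICAL linear schedule: one threshold per layer, from tau_start down to tau_end."""
    if num_layers == 1:
        return [tau_start]
    step = (tau_start - tau_end) / (num_layers - 1)
    return [tau_start - step * layer for layer in range(num_layers)]


if __name__ == "__main__":
    # Two tight pairs of toy "patches": {0, 1} and {2, 3}.
    toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print("k-NN (k=2):      ", knn_graph(toy, k=2))
    print("threshold (0.9): ", thresholded_graph(toy, tau=0.9))
    print("layer thresholds:", [round(t, 2) for t in decreasing_thresholds(0.9, 0.5, 5)])
```

On these four toy patches, `knn_graph(toy, k=2)` is forced to give every patch a second neighbor from the other pair (normalized scores around 0.5 to 0.6), while `thresholded_graph(toy, tau=0.9)` keeps only the two within-pair edges. This is the trade-off the abstract describes: a fixed "k" enforces weak edges, whereas a threshold lets each node's degree follow the similarity structure.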

School

School of Sciences and Engineering

Department

Computer Science & Engineering Department

Degree Name

MS in Computer Science

Graduation Date

Winter 1-31-2025

Submission Date

12-31-2024

First Advisor

Hossam Sharara

Second Advisor

Ahmed Rafea

Committee Member 1

Seif ElDawlatly

Committee Member 2

Mahmoud Khalil

Extent

106 p.

Document Type

Master's Thesis

Institutional Review Board (IRB) Approval

Not necessary for this item

Recommended Citation

APA Citation

Elsharkawi, I. (2025). A Robust Framework for Graph Construction in Vision Graph Neural Networks [Master's Thesis, The American University in Cairo]. AUC Knowledge Fountain.
https://fount.aucegypt.edu/etds/2431

MLA Citation

Elsharkawi, Ismael. A Robust Framework for Graph Construction in Vision Graph Neural Networks. 2025. American University in Cairo, Master's Thesis. AUC Knowledge Fountain.
https://fount.aucegypt.edu/etds/2431

Theses and Dissertations

A Robust Framework for Graph Construction in Vision Graph Neural Networks
