Faculty Journal Articles

BiasDPO: Mitigating Bias in Language Models through Direct Preference Optimization

Ahmed Allam, The American University in Cairo

Funding Sponsor

Bureau of Near Eastern Affairs

Author's Department

Computer Science & Engineering Department

All Authors

Ahmed Allam

Document Type

Research Article

Publication Title

Proceedings of the Annual Meeting of the Association for Computational Linguistics

Publication Date

1-1-2024

Abstract

Large Language Models (LLMs) have become pivotal in advancing natural language processing, yet their potential to perpetuate biases poses significant concerns. This paper introduces a new framework that employs Direct Preference Optimization (DPO) to mitigate gender, racial, and religious biases in LLM-generated English text. By developing a loss function that favors less biased completions over biased ones, our approach cultivates a preference for respectful and non-discriminatory language in LLMs. We also contribute a manually designed dataset for training LLMs to recognize and correct biases. This dataset encompasses a diverse range of prompts paired with both biased and unbiased completions. Applying this approach to the Microsoft Phi-2 model, we demonstrate substantial reductions in biased outputs: our model outperforms the baseline model on almost all bias benchmarks. Our model also outperforms other open-source models on most benchmarks. By reducing biases in the language generated by the model, our study marks a significant step towards developing more ethical and socially responsible LLMs. We publicly release the BiasDPO dataset on Hugging Face.
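The abstract describes a loss that favors less biased over biased completions. As a hedged illustration only, the sketch below implements the standard DPO objective for a single preference pair, assuming the less biased completion plays the "chosen" role and the biased completion the "rejected" role. The function name `dpo_loss`, the default `beta=0.1`, and the example log-probabilities are hypothetical; the paper's actual training code and hyperparameters may differ.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (prompt, chosen, rejected) triple.

    Hypothetical mapping: 'chosen' = less biased completion,
    'rejected' = biased completion. Each *_logp argument is the summed
    log-probability of that completion given the prompt, under either
    the model being trained (policy) or the frozen reference model.
    beta = 0.1 is an illustrative value, not taken from the paper.
    """
    # How much more (log-)likely the policy makes each completion
    # compared with the reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)

    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)),
    # split by sign so exp() never overflows.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Policy identical to reference: loss equals ln 2.
    print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
    # Policy shifts probability toward the less biased completion: loss drops.
    print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
    # Policy shifts probability toward the biased completion: loss rises.
    print(dpo_loss(-15.0, -5.0, -10.0, -10.0))
```

When the policy matches the reference model the loss is ln 2 (about 0.693); minimizing it over many prompt/completion pairs widens the policy's log-probability margin in favor of the less biased completion, while `beta` controls how tightly the policy stays anchored to the reference model.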

First Page

71

Last Page

79

Comments

Conference Paper. Record derived from SCOPUS.

Recommended Citation

APA Citation

Allam, A. (2024). BiasDPO: Mitigating Bias in Language Models through Direct Preference Optimization. Proceedings of the Annual Meeting of the Association for Computational Linguistics, 4, 71–79.

MLA Citation

Allam, Ahmed. "BiasDPO: Mitigating Bias in Language Models through Direct Preference Optimization." Proceedings of the Annual Meeting of the Association for Computational Linguistics, vol. 4, 2024, pp. 71–79.

This document is currently not available here.
