Faculty Journal Articles

Selecting third-party libraries: the data scientist’s perspective

Sarah Nadi, The American University in Cairo (AUC)
Nourhan Sakr, The American University in Cairo (AUC)

Author's Department

Computer Science & Engineering Department

Find in your Library

https://doi.org/10.1007/s10664-022-10241-3

All Authors

Sarah Nadi, Nourhan Sakr

Document Type

Research Article

Publication Title

Empirical Software Engineering

Publication Date

Fall 12-7-2022

doi

10.1007/s10664-022-10241-3

Abstract

With the increased reliance on data-driven decisions and software services, data scientists are becoming an integral part of many software teams and enterprise operations. To perform their tasks, data scientists rely on various third-party libraries (e.g., pandas in Python for data wrangling or ggplot in R for data visualization). Selecting the right library to use is often a difficult task, with many factors influencing this selection. While there has been a lot of research on the factors that software developers take into account when selecting a library, it is not clear if these factors influence data scientists’ library selection in the same way, especially given several differences between both groups. To address this gap, we replicate a recent survey of library selection factors, but target data scientists instead of software developers. Our survey of 90 participants shows that data scientists consider several factors when selecting libraries to use, with technical factors such as the usability of the library, fit for purpose, and documentation being the three highest influencing factors. Additionally, we find that there are 11 factors that data scientists rate differently than software developers. For example, data scientists are influenced more by the collective experience of the community but less by the library’s security or license. We also uncover new factors that influence data scientists’ library selection, such as the statistical rigor of the library. We triangulate our survey results with feedback from five focus groups involving 18 additional data science experts with various roles, whose input allows us to further interpret our survey results. We discuss the implications of our findings for data science library maintainers as well as researchers who want to design recommender and/or comparison systems that help data scientists with library selection.

Recommended Citation

Nadi, S., Sakr, N. Selecting third-party libraries: the data scientist’s perspective. Empir Software Eng 28, 15 (2023). https://doi.org/10.1007/s10664-022-10241-3
