I am a Data Scientist at Ruxton Advisors, LLC, a subcontractor for the Social Security Administration (SSA). I work on a team currently engaged in predictive modeling projects in the following two areas.

Vocational Rehabilitation

I develop machine learning (ML) models for the SSA to predict the likelihood of re-employment for Title II and XVI (disability) beneficiaries. These models may also be used to inform and improve future re-employment outreach efforts.

Occupational Requirements

The United States Bureau of Labor Statistics (BLS) collects a wide range of data about the US workforce. One of the tools it uses to do so is the Occupational Requirements Survey (ORS). The ORS is a comprehensive survey that gathers information about the various demands and conditions (requirements) of different occupations. Much of this data is publicly accessible, but it contains a significant amount of missing information. Using ensemble ML methods, I am developing a method (and an accompanying software package) to impute this missing information.


As a graduate student at the University of Maryland, College Park, I worked under Michelle Mazurek in the Human-Computer Interaction (HCI) space.

Comprehension of ML Fairness Metrics
Presented at MD4SG '20
Published at ICML '20
Published at AIES '20

Machine learning (ML) methods are increasingly used in day-to-day settings to make decisions for us, in contexts such as hiring and defendant sentencing. One of the major unresolved questions in these applications is that of fairness, i.e., ensuring that the decisions made by these automated systems are not biased against any of the groups being considered. Numerous mathematical definitions of fairness exist, each formalizing a different idea of what it means to be "fair," and they are applied differently across ML algorithms. How the many stakeholders in these decisions understand and perceive these fairness definitions, however, remains largely unaddressed. This work takes steps to fill this gap, using cognitive interviews and surveys. The purpose of this research is two-fold:

  • Evaluate and understand lay comprehension of different fairness metrics
  • Interrogate expert perceptions of, and desires for, automated decision making systems

User Attitudes On DTC Genetic Testing
Published at IEEE EuroS&P '20

Commercial DNA testing has become increasingly popular. Companies like 23andMe and Ancestry.com partially sequence customers' DNA and return results about their heritage, susceptibility to genetic disease, and other personal information. This raises serious privacy concerns, as DNA contains a large collection of information about an individual and their close relatives. Users of these services may not be aware of the extent of the information they are revealing to a commercial entity. Through an open-ended interview study, this work evaluates users' awareness of how much information they reveal to these companies and of the privacy risks associated with DNA testing. Four main research questions are addressed:

  • What information do users believe is revealed by their genetic data?
  • What concerns, if any, do users have with respect to DTC genetic testing?
  • How do users’ concerns influence their decisions to participate in DTC genetic testing?
  • How do users believe their genetic information is used by DTC genetic testing companies?


Prior to graduate school, I worked for several years in the biomedical research space.

Most recently, I worked in immunogenomics with Brad Rosenberg, studying the effects of various immune perturbations on the human transcriptome using next-generation sequencing.

Previously, I worked in psychiatric research with Hao Yang Tan, studying cognitive deficits in schizophrenia (SZ) patients using functional magnetic resonance imaging (fMRI).

Debjani Saha


Data Scientist
MS, Computer Science
University of Maryland, College Park

Design courtesy of Vasilios Mavroudis: Plain Academic