Debjani Saha

Experience

Current

I am a Data Scientist at Ruxton Advisors LLC, where I work as a contractor for the Social Security Administration (SSA). I am part of a team engaged in predictive modeling projects across several areas.

Race & Ethnicity

As part of its program administration, the SSA collects race and ethnicity (R&E) information from the individuals it interacts with. Because providing this information is voluntary, many individuals have no R&E data on record. Moreover, changes over the past several decades in the way R&E data has been collected have led to inconsistencies across SSA's various databases. To address these issues, I develop ensemble machine learning (ML) models to impute missing or inconsistent R&E information for beneficiaries of various SSA programs. These efforts may ultimately inform SSA entitlement program analyses and administration.

Occupational Requirements

The United States Bureau of Labor Statistics (BLS) collects a wide range of data about the US workforce. One of the tools it uses to do so is the Occupational Requirements Survey (ORS), a comprehensive survey that gathers information on the various demands and conditions (requirements) of different occupations. Much of this data is publicly accessible, but it contains a significant amount of missing information. Using ensemble ML methods, we are developing a method (and an accompanying software package) to impute this missing information. See here for work in progress.

Vocational Rehabilitation

I developed ensemble ML models for the SSA to predict the likelihood of re-employment for Title II and Title XVI (disability) beneficiaries. These models may be used to inform and improve future re-employment outreach efforts.

Graduate Research

As a graduate student at the University of Maryland, College Park, I worked under Michelle Mazurek in the Human-Computer Interaction (HCI) space.

Comprehension of ML Fairness Metrics
Presented at MD4SG '20
Published at ICML '20
Published at AIES '20

Machine learning (ML) methods are increasingly used in day-to-day settings to make decisions for us, in contexts ranging from hiring to defendant sentencing. One of the major unresolved questions in these applications is fairness, i.e., ensuring that the decisions made by these automated systems are not biased against any of the groups being considered. Numerous mathematical definitions of fairness exist, each formalizing a different idea of what it means to be "fair," and different ML algorithms use different definitions. However, how the many stakeholders in these decisions understand and perceive these fairness definitions remains largely unaddressed. This work takes steps to fill this gap using cognitive interviews and surveys. The purpose of this research is twofold:

Evaluate and understand lay comprehension of different fairness metrics
Interrogate expert perceptions of, and desires for, automated decision making systems

User Attitudes on Direct-to-Consumer Genetic Testing
Published at IEEE EuroS&P '20

Commercial, direct-to-consumer (DTC) DNA testing has become increasingly popular. Companies like 23andMe and Ancestry.com partially sequence customers' DNA and return results about their heritage, susceptibility to genetic disease, and other personal information. This raises serious privacy concerns, as DNA contains a large amount of information about an individual and their close relatives, and users of these services may not be aware of how much they are revealing to a commercial entity. Through an open-ended interview study, this work evaluates users' awareness of the information they reveal to these companies and of the privacy risks associated with DNA testing. It addresses four main research questions:

What information do users believe is revealed by their genetic data?
What concerns, if any, do users have with respect to DTC genetic testing?
How do users’ concerns influence their decisions to participate in DTC genetic testing?
How do users believe their genetic information is used by DTC genetic testing companies?

Past

Prior to graduate school, I worked for several years in the biomedical research space.

Most recently, I worked in immunogenomics with Brad Rosenberg, studying the effects of various immune perturbations on the human transcriptome using next-generation sequencing (NGS).

Previously, I worked in psychiatric research with Hao Yang Tan, studying cognitive deficits in patients with schizophrenia (SZ) using functional magnetic resonance imaging (fMRI).

Debjani Saha

dsaha@cs.umd.edu

Data Scientist
MS, Computer Science
University of Maryland, College Park