I’m a PhD candidate in Information Science at Cornell University, where I’m advised by Matthew Wilkens. My research focuses on developing and evaluating natural language processing tools and large language models for clinical and biomedical applications.
Here are a few research directions I have worked on and remain passionate about:
- Advancing clinical and scientific question answering systems: with Lucy Lu Wang and Yue Guo, I created the LongQAEval framework for evaluating long-form large language model (LLM) outputs with limited resources. In a randomized study, I found that annotating only a few sentences can provide results comparable to answer-level annotations. Furthermore, when prompted with our framework, an LLM-as-judge reaches agreement with experts equivalent to the agreement among experts. At the Allen Institute for Artificial Intelligence (Ai2), under the guidance of Jay DeYoung, I built a system that suggests reformulated queries to Asta users to help them refine their queries and retrieve the desired information. Our system increases recall of relevant scientific documents from 40% to 60%.
- Analyzing user needs at scale: by fine-tuning DistilBERT models, I analyzed user needs and support strategies in endometriosis online communities, finding that patients need easier access to appointments; I then expanded this work by analyzing patients’ perceptions of ablation and excision surgery with few-shot learning. With Ian Lundberg and Matthew Wilkens, I designed a randomized survey experiment with 3,000 participants to measure the causal effect of character gender on reader preferences. I found that character gender has a minimal effect on readers’ preferences, contradicting a long-standing belief in the publishing industry that men and boys are only interested in reading about people of the same gender identity.
- Refining clinical decision support tools: I worked with NYC Health + Hospitals to better integrate medical alerts into nurses’ workflows. Using data analysis and statistical testing, I demonstrated that revising alert criteria can reduce unnecessary alerts by up to 94%, while improving alert design can decrease overridden alerts by up to 64%.
I strive to ground my work in the theoretical frameworks of ethics of care and studying up. I enjoy combining quantitative methods (NLP, causal inference, statistical analysis) with qualitative methods (surveys, annotations, interviews).
News
Sep 2025 | My paper “Causal Effect of Character Gender on Readers’ Preferences” was accepted to CHR 2025!
Aug 2025 | My paper “Stylometric Analysis of the Poems Attributed to an Unknown Male Author in Veronica Franco’s Terze Rime” was accepted for publication in Early Modern Women!
Aug 2025 | My Research Intern position at the Allen Institute for AI was extended until December 2025
Jun 2025 | My poster “Revising BPA triggers and inclusion criteria helps reduce nurses’ fatigue” was accepted to AMIA 2025
Feb 2025 | Paper published in the Journal of Medical Internet Research!
Aug 2024 | Paper published in The Journal of Minimally Invasive Gynecology!