Federica Bologna

A portrait of me at the Cornell Botanic Gardens

I’m a PhD candidate in Information Science at Cornell University, where I’m advised by Matthew Wilkens. My research focuses on developing and evaluating natural language processing tools and large language models for clinical and biomedical applications.

Here are a few research directions I have worked on and remain passionate about:

Advancing clinical and scientific question answering systems: with Lucy Lu Wang and Yue Guo, I created the LongQAEval framework for evaluating long-form outputs of large language models (LLMs) with limited resources. In a randomized study, I found that annotating only a few sentences can provide results comparable to answer-level annotations. Furthermore, when prompted with our framework, an LLM-as-judge reaches agreement with experts equivalent to the agreement among experts. At the Allen Institute for Artificial Intelligence (Ai2), under the guidance of Jay DeYoung, I built a system that suggests reformulated queries to Asta users, helping them refine their queries and retrieve the desired information. Our system increases recall of relevant scientific documents from 40% to 60%.
Analyzing user needs at scale: by fine-tuning DistilBERT models, I analyzed user needs and support strategies in endometriosis online communities, finding that patients need easier access to appointments; I then expanded this work by analyzing patients’ perceptions of ablation and excision surgery with few-shot learning. With Ian Lundberg and Matthew Wilkens, I designed a randomized survey experiment with 3,000 participants to measure the causal effect of character gender on reader preferences. I found that character gender has a minimal effect on readers’ preferences, contradicting a long-standing belief in the publishing industry that men and boys are only interested in reading about people of the same gender identity.
Refining clinical decision support tools: I have worked with NYC Health + Hospitals to better integrate medical alerts within nurses’ workflows. Using data analysis and statistical testing, I demonstrated that revising alert criteria can reduce unnecessary alerts by up to 94%, while improving alert design can decrease overridden alerts by up to 64%.

I strive to ground my work in the theoretical frameworks of ethics of care and studying up. I enjoy combining quantitative methods (NLP, causal inference, statistical analysis) with qualitative methods (surveys, annotations, interviews).

News

Sep 2025	My paper “Causal Effect of Character Gender on Readers’ Preferences” was accepted to CHR 2025!
Aug 2025	My paper “Stylometric Analysis of the Poems Attributed to an Unknown Male Author in Veronica Franco’s Terze Rime” was accepted for publication in Early Modern Women!
Aug 2025	My Research Intern position at the Allen Institute for AI has been extended until December 2025
Jun 2025	My poster “Revising BPA triggers and inclusion criteria helps reduce nurses’ fatigue” was accepted to AMIA 2025
Feb 2025	Paper published in the Journal of Medical Internet Research!
Aug 2024	Paper published in The Journal of Minimally Invasive Gynecology!