Capstone Projects

UCSC’s NLP Master’s students build innovative industry solutions to real-world challenges

Natural Language Processing (NLP) is a rapidly growing field with applications in many of the technologies we are all accustomed to using every day, from virtual assistants and smart speakers to autocorrect functions.


Our Capstone project program matches industry mentors with small teams of NLP Master’s students for collaboration on cutting-edge projects. Students apply the skills they’ve acquired in the program to a real-world issue or challenge. They benefit from the practical experience of working in a team with peers and from the insights of the academic or organizational mentor(s). Mentors benefit from being able to direct a small but advanced project that might be too risky to undertake in their organization, and from establishing professional relationships with our students for future recruiting purposes. Mentors have come from a variety of organizations, including Adobe, Cisco, CDIAL, Google, Meta, Intel, Modelcode, Uniphore, and UCSC.

We are open to all kinds of NLP projects from industry, academic, and non-profit/government organizations. Projects can include those that are specific to your organization.

Potential mentors will need to submit a project proposal form, and if the proposal is selected, the mentors will guide a team of NLP Master’s students for approximately 15 weeks during the period of May to December.

At the end of the Capstone course sequence, students present their work in a public workshop which showcases the projects.

We are always delighted to discuss the details of the program with potential mentors. For more information contact Beth Ann Hockey, Academic Capstone Coordinator (bahockey@ucsc.edu).

To submit a proposal, please find the form below:
https://forms.gle/tkNosPH49WCZBd6F7

Current Capstone Projects

During the spring 2025 quarter, current students explored leading research on a variety of current NLP topics, then joined Capstone teams for projects mentored by experts from Adobe, Uniphore, Samsung, UCSC, and CarbonBridge. Check out this year’s project topics:

Agentic LLMs for Flexible Dialog Management
Mentors: Andreas Stolcke, Alessandro Di Bari, Neha Gupta, Uniphore
Students: Ishika Kulkarni, Adam Zernik, Arkajyoti Chakraborty, Mudit Arora, Hugo Lin

Abstract
A majority of enterprise dialog systems today are built upon finite-state dialogue management architectures, which enforce rigid, rule-based call flows. These systems often struggle to handle evolving user goals or recover gracefully from unexpected conversational inputs. In contrast, human agents excel at maintaining context, reasoning about what information is needed, and choosing whether to ask questions or consult external tools. This project explores the design of agentic large language models (LLMs) that emulate these human capabilities. We propose a dialog agent that integrates multi-turn reasoning, tool use, and adaptive decision-making to dynamically manage conversations. Inspired by frameworks like ReAct, ReSpAct, and Pre-Act, our agent determines whether to prompt users or take autonomous actions based on situational context. To support development, we generate a synthetic dataset using dual-LLM simulations, representing realistic, annotated dialog traces with intermediate reasoning steps. Our system is designed to generalize across task-oriented domains, supporting both conversational and non-conversational workflows. This work contributes toward more flexible, user-aware AI agents that improve automation quality and reduce friction in real-world human-computer interactions.
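
The "prompt the user or act autonomously" decision at the heart of such an agent can be sketched as a small control loop. This is an illustrative sketch only, not the project's implementation: the prompt format, the `call_llm` stub, and the `order_status` tool are all hypothetical placeholders.

```python
# Minimal sketch of a ReAct-style dialog-agent turn: the LLM decides whether
# to ask the user a question, call a tool, or finish. All names here
# (TOOLS, decide, step, the JSON action format) are illustrative, not from
# the project described above.
import json

TOOLS = {
    # Hypothetical tool: look up an order's status by id.
    "order_status": lambda order_id: {"order_id": order_id, "status": "shipped"},
}

def decide(llm, history):
    """Ask the LLM for the next step as a JSON action."""
    prompt = (
        "You are a task-oriented dialog agent.\n"
        "Conversation so far:\n" + "\n".join(history) + "\n"
        'Reply with JSON: {"action": "ask"|"tool"|"finish", ...}'
    )
    return json.loads(llm(prompt))

def step(llm, history):
    """One turn of the agent loop: reason, then either act or ask."""
    decision = decide(llm, history)
    if decision["action"] == "tool":
        result = TOOLS[decision["name"]](**decision["args"])
        history.append(f"tool[{decision['name']}]: {result}")
    elif decision["action"] == "ask":
        history.append(f"agent: {decision['question']}")
    return decision
```

Running the same loop with a second LLM playing the user role is one way to produce the kind of dual-LLM simulated dialog traces the abstract mentions.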

No RAGrets: A Modular NLP System for Smart Literature Discovery and Research Paper Vetting

Mentors: Manu Pillai, Sophia Xu, CarbonBridge
Students: Soren Larsen, Yifei Gan, Camellia Bazargan

Abstract
As the volume of academic publications continues to grow exponentially, researchers face increasing difficulty in both discovering relevant literature and evaluating its reliability. Our proposed project, No RAGrets, will introduce a modular NLP system designed to assist with literature discovery, information extraction, and early-stage paper vetting. The system will be aimed at helping users identify key research claims, assess citation quality, and surface potential red flags such as methodological inconsistencies or weak references.

We will approach this challenge by combining components for information extraction, semantic classification, and contextual retrieval. Additionally, we explore the integration of retrieval-augmented generation (RAG), large language models (LLMs), and agent-based architectures to support flexible, interpretable research workflows. Our goal is to build a system that not only improves the precision of literature search, but also enables more critical and structured engagement with scientific texts.
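
The retrieval step in a RAG pipeline can be illustrated with a toy example. This sketch uses bag-of-words cosine similarity as a stand-in for a learned embedding model; the corpus, function names, and scoring choice are assumptions for illustration, not components of No RAGrets itself.

```python
# Toy retrieval component for a RAG-style literature-search pipeline:
# rank documents by cosine similarity to the query, then (in a real system)
# pass the top hits to an LLM as grounding context. Bag-of-words term
# frequencies stand in for a learned embedder here.
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': a sparse term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]
```

Swapping `embed` for a sentence-embedding model and adding claim-extraction and citation-checking stages on top of `retrieve` gives the modular shape the abstract describes.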

LLM Co-Writing
Mentor: Pranav Anand, UCSC Linguistics
Students: Yousuf Golding, Kiara LaRocca, Ting-Yu Chou

Abstract
Our primary goal is to develop an LLM-based application that supports students during the writing process. Drawing on existing tools like Grammarly and WordTune as inspiration, we envision a system that offers real-time suggestions or conducts a final review after writing is complete. However, unlike these tools, our application is designed to preserve cognitive engagement by prompting rather than correcting. For instance, when grammatical or fluency issues are detected, the LLM would avoid immediately supplying the fix. Instead, it would initiate a guided sequence: first asking the student whether they believe any changes are needed, then prompting them to identify and explain those changes. If the student is on the right track, the LLM could assist in completing the revision. If not, the system would scaffold the process further, helping the student recognize the error, understand the reasoning behind the correction, and apply the fix themselves. This approach aims to maintain the learning benefits of writing while still leveraging the LLM’s strengths.

Multilingual NL2SQL Synthetic Dataset Creation
Mentors: Jeremy Shi, Aditya Bansal, Adobe
Students: Shannon Rumsey, Darian Lee, Shriya Sravani Yellapragada, Valentina Tang, Jack St Clair

Abstract
International enterprises typically store data in a relational database that is accessible via a dialect of SQL. The barrier to entry for deriving insights from these databases is a need for both English and SQL fluency. As a means to simplify data access, we seek to research text-to-SQL, which converts multilingual natural language questions into SQL queries. While there have been efforts to increase the presence of multilingual datasets, traditional methods of human translation can be costly and impractical. Existing synthetic data generation techniques in this realm are limited and restricted only to English, with filtering processes that may reduce the difficulty of the SQL queries. There have also been successful strides toward grounding text-to-SQL LLMs through the injection of contextual information; however, these models still struggle to provide sensible, complex, and diverse outputs. In addition, existing datasets frequently overlook cultural and practical usage norms. For instance, they may assume that column names follow English conventions without justification, or that users interact with applications using native scripts, even in contexts where Romanized versions are more common. This mismatch can lead to unrealistic training data. To address these challenges, we propose a method for multilingual synthetic data creation and use it to create a novel multilingual dataset for NL (natural language) to SQL tasks, expanded to Arabic, Chinese, French, German, Hindi, Japanese, Malay, Portuguese, Russian, Spanish, Turkish, and Vietnamese. Prior research frequently combined machine translation with human evaluation. With the increasing fluency of LLMs, we see the opportunity to fully automate the translation process in cases where we are unable to utilize human-constructed translations. To our knowledge, this would be the largest multilingual text-to-SQL dataset and the first scalable synthetic data generation pipeline for this task.
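
The basic shape of schema-grounded synthetic pair generation can be shown with a toy example. Here fixed templates and a tiny hand-written translation table stand in for the LLM that would compose queries and translate questions in the actual pipeline; the schema, table, and column names are all invented for illustration.

```python
# Minimal sketch of schema-grounded multilingual NL-to-SQL pair generation.
# A real pipeline would use an LLM to compose SQL against the schema and to
# translate the question into each target language; templates stand in here.

SCHEMA = {"orders": ["id", "country", "total"]}  # toy schema

# Toy translations of one question template (an LLM would produce these).
QUESTION = {
    "en": "How many orders are from {country}?",
    "es": "¿Cuántos pedidos son de {country}?",
    "de": "Wie viele Bestellungen kommen aus {country}?",
}

def make_pair(lang, table, country):
    """Produce one (natural-language question, SQL query) training pair,
    grounded in the schema so the query references a real column."""
    assert "country" in SCHEMA[table]
    question = QUESTION[lang].format(country=country)
    sql = f"SELECT COUNT(*) FROM {table} WHERE country = '{country}';"
    return {"lang": lang, "question": question, "sql": sql}

dataset = [make_pair(lang, "orders", "Japan") for lang in QUESTION]
```

Keeping the SQL fixed while varying the question's language is what lets one generation pass yield parallel data across all twelve target languages.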

WebAgents
Mentor: Yilin Shen, Samsung
Student Team: Karthik Raja Anandan, Yuchia Chang, Judith Clymo, Shubham Gaur,
Siddharth Suresh

Abstract
The Samsung GUI Agent project aims to deliver (1) a browser-extension capture engine to record high-fidelity user actions (clicks, scrolls, inputs, DOM snapshots) and state context (screenshots, accessibility trees); (2) a high-quality dataset including synthetic (MiniWoB++, ALFWorld), realistic (WebArena, REAL, WebGames), and enterprise (WorkArena, WorkArena++) benchmarks; and (3) LLM-based agents fine-tuned and evaluated in BrowserGym to exceed state-of-the-art performance across these benchmarks. Success metrics include end-task success rate, time to completion, accuracy of individual action decisions, and robustness under dynamic web changes.

Investigating the Feasibility and
Capabilities of GUI Agents for Enterprise

Mentor: Guang-Jie Ren, Adobe
Student Team: Cal Blanco, Gavin DSouza, Jou-Yi Lee,
Chelsey Rush, Sam Silver

Abstract
As large language models (LLMs) and vision-language models (VLMs) continue to advance, there is growing interest in developing GUI agents capable of interacting with graphical user interfaces to automate complex tasks across consumer and enterprise applications. While significant progress has been made in consumer-facing applications, enterprise deployment of GUI agents presents unique challenges, including legacy system integration, stringent security requirements, compliance considerations, and the need for robust error handling in mission-critical environments. This proposal highlights the evolution of LLM agents, provides a short survey of related work, and proposes research directions for our capstone project. Our work aims to bridge the gap between academic research and practical enterprise deployment by developing an agentic system specifically designed for Adobe’s enterprise GUI automation challenges, focused on the Adobe Express application.

Capstone Projects 2024

Summarization for Long Document Input using Multiple LLMs

Mentors: Hanieh Deilamsalehy, Franck Dernoncourt, Ryan Rossi, Adobe
Student Team: Cheng-Tse (Alex) Liu, Ethan Liu, June Kim, Michael Fang, Nikhil Singh, Yash Bhedaru
[Poster] [Report]

GPS4LLM: Graph-based Planning System for Large Language Models

Mentors: Namyong Park, Yu Wang, Meta AI
Student Team: Anish Pahilajani, Devasha Trivedi, Jincen Shuai, Khin Yone, Neng Wan, Samyak Rajesh Jain
[Poster] [Report]

Personalized Graph-based Retrieval for Large Language Models

Mentor: Nesreen Ahmed, Intel
Student Team: Cameron Dimacali, Ojasmitha Pedirappagari, Steven Au
[Poster] [Report]

GUT-Bench: Maintainability and Performance

Mentors: Antoine Raux, John Daniswara, ModelCode
Student Team: Decker Krogh, Ethan Sin, Esha Ubale, Sreekar Molakalapalli, Sujit Noronha
[Poster] [Report]

Capstone projects 2023

As part of the NLP program public poster session, NLP Master’s students presented their Capstone work developed in collaboration with mentors from CDIAL, Google, Adobe, Meta, Cisco, and UCSC. The projects addressed a range of real-world NLP challenges, including hallucination detection, multi-modal question answering, and approaches for low-resource languages.

Capstone projects 2022

NLP students collaborated with industry mentors from IBM, Interactions, LinkedIn, and Google to develop and implement a variety of Capstone projects addressing real-world NLP challenges. The workshop also featured a keynote address about the future of NLP from Professor Ian Lane, as well as the annual NLP Industry Panel, where leading scientists shared their insights on career opportunities in NLP.

Capstone projects 2021

NLP students showcased their projects at the inaugural NLP Capstone Workshop in August 2021 to an audience made up of faculty members, the Industry Advisory Board, and invited guests from industry. Each team had half an hour to present its work and take questions from attendees.

The NLP Capstone experience offers a great opportunity for students to extend their networks and put themselves in front of potential employers.

Last modified: Jul 31, 2025