CAPSTONE PROJECTS

Identifying Errors in SRL using Weak Supervision

Student Team: Kit Lao, Alex Lue, Sam Shamsan

Project Mentor: Ishan Jindal & Frederick Reiss, IBM

In datasets collected from real-world data, noise and mislabeled examples are almost inevitable, and they are especially prominent in large corpora. The performance of models learned from these datasets relies heavily on correctly labelled data. This research project investigated the noise rate in two large semantic role labelling (SRL) datasets, EWT and OntoNotes. Using a weak supervision mechanism for noise detection called Confident Learning, the project proposed an end-to-end system to characterize, identify, and correct noisy labels in these datasets. The goal was to generate corrected versions of the SRL datasets on which state-of-the-art SRL models, trained on the corrected data, show improved performance.