CAPSTONE PROJECTS

Identifying Errors in SRL using Weak Supervision

Student Team: Kit Lao, Alex Lue, Sam Shamsan

Project Mentor: Ishan Jindal & Frederick Reiss, IBM

In datasets collected from real-world data, noise and mislabeled examples are almost inevitable, and they are especially prominent in large corpora. The performance of models learned from these datasets relies heavily on correctly labelled data. This research project investigated the noise rate in two large semantic role labelling (SRL) datasets, EWT and OntoNotes. Using a weak supervision mechanism for noise detection called Confident Learning, the project proposed an end-to-end system to characterize, identify, and correct noisy labels in these datasets. The goal was to generate corrected versions of the SRL datasets on which state-of-the-art SRL models, trained on the corrected data, show improved performance.