CAPSTONE PROJECTS

Information Extraction of Corporate Events from the Web

Student Team: Tianxiao Jiang, David Li, Liren Wu

Project Mentors: Yuval Marton & Swapnil Khedekar, Bloomberg

Publicly traded companies are required to report their earnings and to hold certain types of corporate events. Stock prices tend to be at their most volatile around these events, which draw intense interest from analysts, investors, shareholders, and journalists. Collecting this information is therefore valuable, but extracting it is difficult because companies announce their events on the web in many different ways and formats. This project explored building a pipeline that extracts key information, such as the event type, date, and time, from corporate event announcements on websites, and then normalizes the extracted data so it can be easily used downstream. The project employed a range of natural language processing techniques, from rule-based and regular-expression-based approaches to Transformer-based models such as BERT and GPT, to extract events from websites, normalize the data, and classify the event types.
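To illustrate the rule-based end of that spectrum, the sketch below uses regular expressions to pull an event type, a date, and a time out of an announcement sentence and normalizes them to ISO 8601 dates and 24-hour times. This is a minimal, hypothetical example and not the team's actual pipeline: the event categories, the keyword patterns in `EVENT_KEYWORDS`, and the function name `extract_event` are all assumptions chosen for illustration, and the Transformer-based stages (BERT, GPT) are not shown.

```python
import re
from datetime import datetime

# Hypothetical keyword rules mapping an event type to a pattern (illustrative only).
EVENT_KEYWORDS = {
    "earnings_call": re.compile(r"\b(earnings|results)\s+(conference\s+)?call\b", re.I),
    "shareholder_meeting": re.compile(r"\b(annual|special)\s+(general\s+)?meeting\b", re.I),
    "investor_day": re.compile(r"\binvestor\s+day\b", re.I),
}

# Matches dates such as "February 8, 2024", "Feb. 8, 2024", or "Jun 12 2024".
DATE_RE = re.compile(r"\b([A-Z][a-z]{2,8})\.?\s+(\d{1,2}),?\s+(\d{4})\b")

# Matches 12-hour times such as "4:30 p.m." or "10:00 AM".
TIME_RE = re.compile(r"\b(\d{1,2}):(\d{2})\s*([ap])\.?\s*m\b\.?", re.I)


def extract_event(text: str) -> dict:
    """Extract and normalize (event type, date, time) from one announcement."""
    # Event type: the first keyword rule that matches, else None.
    event_type = next(
        (name for name, pattern in EVENT_KEYWORDS.items() if pattern.search(text)),
        None,
    )

    # Date: try full month names, then abbreviations; normalize to YYYY-MM-DD.
    date_iso = None
    date_match = DATE_RE.search(text)
    if date_match:
        month, day, year = date_match.groups()
        for fmt in ("%B %d %Y", "%b %d %Y"):
            try:
                parsed = datetime.strptime(f"{month} {day} {year}", fmt)
                date_iso = parsed.date().isoformat()
                break
            except ValueError:
                continue

    # Time: convert a 12-hour clock reading to 24-hour HH:MM.
    time_24h = None
    time_match = TIME_RE.search(text)
    if time_match:
        hour = int(time_match.group(1))
        minute = int(time_match.group(2))
        meridiem = time_match.group(3).lower()
        if 1 <= hour <= 12 and minute < 60:
            hour = hour % 12 + (12 if meridiem == "p" else 0)
            time_24h = f"{hour:02d}:{minute:02d}"

    return {"event_type": event_type, "date": date_iso, "time": time_24h}


if __name__ == "__main__":
    sample = ("Acme Corp will host its fourth quarter earnings call on "
              "Thursday, February 8, 2024 at 4:30 p.m. ET.")
    print(extract_event(sample))
```

Hand-written rules like these are precise on the formats they anticipate but break on unseen phrasings (for example, "the 8th of February"), which is one motivation for complementing them with learned models such as BERT for classification and GPT for more flexible extraction.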