Data Science Community Hour (March 25th, 2021): Dr. Santana from Twitch and Natural Language Data Processing
From Austin Brockmeier
Industry guest Dr. Eder Santana from Twitch
Theme of the week: natural language processing (NLP).
“SAD: A Stress Annotated Dataset for Recognizing Everyday Stressors in SMS-like Conversational Systems [a CHI 2021 late-breaking work]” — Prof. Matthew Mauriello, UD CIS.
There is limited infrastructure for providing stress management services to those in need. To address this problem, chatbots are viewed as a scalable solution. However, one limiting factor is the lack of clear definitions and examples of daily stress on which to build models and methods for routing appropriate advice during conversations. In this talk, we will discuss recent work to develop a dataset of 6850 SMS-like sentences that can be used to classify SMS-like input using a scheme of 9 stressor categories derived from stress management literature, live conversations from a prototype chatbot system, crowdsourcing, and targeted web scraping from an online repository. In addition to practical considerations around building the dataset, we'll touch on an analysis that demonstrates its potential efficacy, look at how real-time events result in topic drift, and describe its implementation in a future SMS-based chatbot.
“GPT-3” — Arshiya Khan, CCRG, ECE, UD.
This talk is an introduction to GPT-3 (Generative Pre-trained Transformer 3), a highlight of NLP research in 2020. It is an autoregressive language model that addresses a long-standing NLP pain point: context. The model has been successful at recognizing facts, answering trivia questions, applying reasoning and logic, and, most surprisingly, reverse-engineering code.