This talk is aimed at upper-year undergraduate and graduate students. Sponsored by the Women in Mathematics Committee.
Abstract: Traditionally, successful computational approaches to understanding human language have depended on manual encoding of large amounts of both linguistic and real-world knowledge. The resulting systems have therefore necessarily been limited in their scope, both in terms of what can be talked about (the semantics of the domain) and how it can be talked about (the grammatical knowledge). Recently, statistical techniques have revolutionized the automatic learning of grammatical knowledge from the vast amounts of text available on-line. The current challenge is how to extend these statistical approaches to induce rich semantic representations, in addition to the more surface-level syntactic information. In this talk, I describe how the statistical distributions of the syntactic usages of words can in fact reveal deeper semantic properties of the words. We use machine learning techniques that exploit this statistical connection, allowing us to automatically build richer representations of words that link syntax and semantics.