Visiting Speaker in Computational Stylistics
Date: Tuesday, January 26, 2010, 12:45 pm, SB 111
Title: Authorship Attribution in the Wild
Speaker: Moshe Koppel, Bar-Ilan University
In the vanilla version of the authorship attribution problem, we are asked to assign a long anonymous text to one of a small, closed set of candidate authors. This is a straightforward text categorization problem, and its solution is simple and well understood. In the real world (especially in forensic settings), however, we often face attribution problems in which the candidate set is very large (thousands of candidates) and open (the real author might not be in the candidate set), and in which both the training texts and the anonymous text are short. We present a new method, based on randomized feature sets, that solves the real-world problem with high precision.
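
The abstract does not spell out how the randomized feature sets are used, so the following is only a minimal sketch of one plausible reading, not the speaker's actual method. It assumes character 4-gram counts as features, cosine similarity as the matching score, and a vote threshold below which the answer is "don't know" (to handle an open candidate set). The function names and the parameters `iterations`, `fraction`, and `threshold` are hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def char_ngrams(text, n=4):
    """Count overlapping character n-grams (an assumed feature type)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b, features):
    """Cosine similarity of two count profiles, restricted to `features`."""
    dot = sum(a[f] * b[f] for f in features)
    norm_a = math.sqrt(sum(a[f] ** 2 for f in features))
    norm_b = math.sqrt(sum(b[f] ** 2 for f in features))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribute(anonymous, candidates, iterations=100, fraction=0.4,
              threshold=0.9, seed=0):
    """Return the best-matching candidate name, or None to abstain.

    Each iteration draws a random subset of the feature vocabulary and
    gives one vote to the candidate most similar on that subset. A
    candidate is returned only if it wins at least `threshold` of all
    iterations; otherwise the author is treated as outside the set.
    """
    rng = random.Random(seed)
    anon = char_ngrams(anonymous)
    profiles = {name: char_ngrams(text) for name, text in candidates.items()}
    vocab = sorted(set(anon).union(*profiles.values()))
    if not profiles or not vocab:
        return None

    k = max(1, int(fraction * len(vocab)))
    votes = Counter()
    for _ in range(iterations):
        subset = rng.sample(vocab, k)
        scores = {name: cosine(anon, p, subset) for name, p in profiles.items()}
        best = max(scores, key=scores.get)
        if scores[best] > 0:  # no vote when nothing matches on this subset
            votes[best] += 1

    if not votes:
        return None
    winner, count = votes.most_common(1)[0]
    return winner if count / iterations >= threshold else None
```

For example, `attribute(text, {"alice": alice_text, "bob": bob_text})` would return a name only when that candidate wins consistently across feature subsets. The intuition behind this design is that a true author should remain the closest match however the features are sampled, while a spurious match tends to depend on a few particular features.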