Date: Friday, March 26, 2010, 11:30 am - 12:30 pm, SB 111
Title: Cost-sensitive Information Acquisition for Prediction
Speaker: Mustafa Bilgic
Dept. Computer Science
University of Maryland College Park
Machine learning systems are increasingly used in our day-to-day activities. Examples include handwritten character recognition systems, product recommendation systems, face detection features of cameras, speech recognition in hands-free devices, document ranking by search engines, fraudulent activity detection for credit card transactions, spam detection, and medical diagnosis. A critical component of a machine learning system is the "information" needed to develop and use it. Speech and handwritten digit recognition systems need to be trained on a representative and diverse set of examples, product recommendation systems need to be provided with example ratings, emails need to be tagged as spam or not, laboratory experiments need to be run for medical diagnosis, etc.
Even though the necessary information is freely available in some cases, gathering it is costly in most cases: users are willing to rate only a few items, tag only a few emails, and train the speech recognition algorithm with only a few examples. It is essential to gather user and expert feedback for the right examples and not waste their effort. A system that requires a tremendous amount of user input and labeled data is impracticable, while a system that provides an unacceptable rate of incorrect predictions is useless, if not harmful. It is thus imperative to develop systems that can provide correct predictions with the least amount of information and feedback possible.
In this talk, I will present two techniques aimed at reducing the amount of information required to provide correct predictions. The techniques are based on decision-theoretic analysis of the value of information and on predicting which examples the underlying model is most likely to be incorrect about. They outperform several state-of-the-art techniques on both real-world and synthetic datasets.
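As a rough illustration of what a decision-theoretic value-of-information calculation can look like, the sketch below scores a single binary prediction before and after acquiring one piece of information (for example, a laboratory test). It is not the speaker's method. The function names, the misclassification costs, the test's outcome likelihoods, and the acquisition cost are all assumptions made up for this example.

```python
def expected_cost(p_pos, cost_fp, cost_fn):
    """Expected misclassification cost of the cheaper decision, given P(y=1) = p_pos."""
    cost_if_predict_pos = (1.0 - p_pos) * cost_fp  # wrong only when the truth is negative
    cost_if_predict_neg = p_pos * cost_fn          # wrong only when the truth is positive
    return min(cost_if_predict_pos, cost_if_predict_neg)


def net_value_of_information(prior, likelihoods, cost_fp, cost_fn, acquisition_cost):
    """Expected cost reduction from observing a feature, minus the cost of acquiring it.

    likelihoods maps each possible outcome of the feature to a pair
    (P(outcome | y=1), P(outcome | y=0)).
    """
    cost_now = expected_cost(prior, cost_fp, cost_fn)
    cost_after = 0.0
    for p_given_pos, p_given_neg in likelihoods.values():
        p_outcome = prior * p_given_pos + (1.0 - prior) * p_given_neg
        if p_outcome == 0.0:
            continue  # this outcome cannot occur, so it contributes nothing
        posterior = prior * p_given_pos / p_outcome  # Bayes' rule
        cost_after += p_outcome * expected_cost(posterior, cost_fp, cost_fn)
    return cost_now - cost_after - acquisition_cost


# Hypothetical numbers: a test with 90% sensitivity and 80% specificity.
test_likelihoods = {"positive": (0.9, 0.2), "negative": (0.1, 0.8)}
net_value = net_value_of_information(
    prior=0.3,
    likelihoods=test_likelihoods,
    cost_fp=1.0,
    cost_fn=5.0,
    acquisition_cost=0.1,
)
print(f"Net value of acquiring the test: {net_value:.2f}")
```

Under these made-up numbers the net value is positive (about 0.31), so acquiring the test would be worthwhile. A test whose outcomes are equally likely under both classes leaves the expected cost unchanged, so its net value is simply the negative of its acquisition cost.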
Biography: Mustafa Bilgic is a PhD candidate at the University of Maryland, College Park, working with Dr. Lise Getoor. He received his MSc from the University of Maryland, College Park, and his BSc from the University of Texas at Austin. His research interests include data mining, machine learning, probabilistic graphical models, statistical relational learning, active learning, social network analysis, and information visualization. His work on active inference won the ACM SIGKDD Best Student Paper Award in 2008.