Date: Monday, February 28, 2011, 11:00 am - 12 noon, SB 213
Title: Ranking in Context: Search and Exploration of Large Semantically-Rich Datasets
Speaker: Julia Stoyanovich
The focus of my research is on modeling and exploring large, complex datasets in the presence of rich semantic and statistical structure. In this talk, I will start with an overview of my main research directions and then present two recent lines of work that highlight two different technical approaches and address two important applications: personalized search and ranking in social content sites, and rank-aware exploration of structured datasets.
In the first part of the talk I will focus on information discovery on the Social Web. Social Web users build persistent online personas: they provide information about themselves in stored profiles, register their relationships with other users, and express their preferences with respect to information and products. I will argue that information discovery should account for a user's social context, and will present network-aware search, a novel search paradigm in which result relevance is computed with respect to a user's social network. I will describe efficient top-K processing algorithms appropriate for this setting, and will propose indexing structures in support of these algorithms. I will also show how social similarities between users may be leveraged to achieve faster query processing and lower space overhead.
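To make "relevance computed with respect to a user's social network" concrete, here is a minimal Python sketch under an assumed scoring rule: an item's score for a tag is the number of the querying user's friends who tagged it with that tag. The function name, the data layout, the scoring rule, and the toy data are all hypothetical. This exhaustive scan is only a baseline illustration, not the efficient top-K algorithms or index structures presented in the talk.

```python
import heapq
from collections import defaultdict

def network_aware_top_k(user, tag, friends, taggings, k):
    """Return up to k (item, score) pairs ranked by a network-aware score.

    Hypothetical scoring rule: an item's score for `tag` is the number of
    the user's friends who tagged that item with `tag`. Users outside the
    querying user's network contribute nothing.
    """
    scores = defaultdict(int)
    for friend in friends.get(user, set()):
        # taggings maps each user to a set of (item, tag) pairs
        for item, t in taggings.get(friend, set()):
            if t == tag:
                scores[item] += 1
    # Highest score first; ties broken alphabetically by item name
    return heapq.nsmallest(k, scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Toy data, invented for illustration
friends = {"ann": {"bob", "cat"}}
taggings = {
    "bob": {("cafe_a", "coffee"), ("bar_x", "beer")},
    "cat": {("cafe_a", "coffee"), ("cafe_b", "coffee")},
    "dan": {("cafe_c", "coffee")},  # not in ann's network, so ignored
}
print(network_aware_top_k("ann", "coffee", friends, taggings, 2))
# prints [('cafe_a', 2), ('cafe_b', 1)]
```

Note that two users issuing the same query can receive different results, because each is scored against their own network. The scan also touches every tagging by every friend, which is the kind of cost that specialized top-K algorithms and indexes aim to reduce.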
In the second part of the talk I will consider information discovery in large structured datasets that arise in a variety of application domains such as real estate, shopping and dating. In these applications, users specify their preferences with structured queries that often return thousands of matches. Users also specify a set of ranking attributes, customizing the order in which matches are returned. I will observe that statistical relationships often hold between attribute values and the ranking function, and that ranking alone may not be able to support an adequate data exploration experience. I will propose to view the ranking function in the context of the data distribution, and will formalize rank-aware clustering, a novel data exploration paradigm that identifies combinations of attributes and value ranges that together determine the ranking outcome for a set of items. I will present BARAC, an efficient rank-aware clustering algorithm that runs in interactive time. Finally, I will present results of a large-scale user study that demonstrate the effectiveness of rank-aware clustering.
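As a rough illustration of viewing a ranking in the context of the data distribution, the Python sketch below rates value ranges of a single attribute by how over-represented they are among the top-ranked items, using a simple lift ratio. This is a deliberate simplification and not BARAC: the single attribute, the fixed bins, the lift measure, the function name, and the toy listings are all assumptions, whereas rank-aware clustering as described in the talk identifies combinations of attributes and value ranges.

```python
def score_ranges(items, attr, bins, score, top_n):
    """Rate value ranges of one attribute by association with the top ranks.

    items: list of dicts; attr: attribute to describe (e.g. "price");
    bins: half-open (lo, hi) ranges over attr;
    score: function item -> ranking score (higher is better);
    top_n: how many top-ranked items to treat as "the top".
    Returns [((lo, hi), lift), ...] sorted by lift, highest first.
    """
    if not items or top_n <= 0:
        return []
    top_n = min(top_n, len(items))
    top = sorted(items, key=score, reverse=True)[:top_n]
    base_rate = top_n / len(items)  # chance a random item is in the top
    results = []
    for lo, hi in bins:
        in_range = [it for it in items if lo <= it[attr] < hi]
        if not in_range:
            continue
        top_in_range = sum(1 for it in top if lo <= it[attr] < hi)
        # lift > 1: the range is over-represented among top-ranked items
        lift = (top_in_range / len(in_range)) / base_rate
        results.append(((lo, hi), lift))
    return sorted(results, key=lambda r: r[1], reverse=True)

# Toy apartment listings (invented), ranked by size
apts = [
    {"price": 1000, "size": 50}, {"price": 1200, "size": 55},
    {"price": 2500, "size": 90}, {"price": 2700, "size": 95},
    {"price": 1100, "size": 60}, {"price": 2600, "size": 85},
]
print(score_ranges(apts, "price", [(0, 2000), (2000, 3000)],
                   score=lambda a: a["size"], top_n=3))
# prints [((2000, 3000), 2.0), ((0, 2000), 0.0)]
```

In the toy data, ranking by size places only the higher-priced listings at the top, so the price range 2000 to 3000 gets a lift of 2.0. Surfacing that relationship is the kind of insight a plain ranked list leaves implicit.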
Julia Stoyanovich is a Postdoctoral Researcher and a Computing Innovations Fellow at the University of Pennsylvania. Julia holds M.S. and Ph.D. degrees in Computer Science from Columbia University, and a B.S. in Computer Science and in Mathematics and Statistics from the University of Massachusetts at Amherst. After receiving her B.S., Julia went on to work for two start-ups and one real company in New York City, where she interacted with, and was puzzled by, a variety of massive datasets.
Julia's research focuses on modeling and exploring large datasets in the presence of rich semantic and statistical structure. She has recently worked on personalized search and ranking in social content sites; rank-aware clustering in large structured datasets from the dating and restaurant-review domains; data exploration in repositories of biological objects as diverse as scientific publications, functional genomics experiments and scientific workflows; and representation and inference in large datasets with missing values.