Search Duke CS 


Abstract
First, we investigate many problems related to answering queries in the presence of uncertainty in data. Motivated by applications in sensor network, data cleaning, and data integration, there has been a growing interest in managing uncertain data. Uncertainty is typically captured using stochastic data models, and querying data requires either statistics about the probabilistic behavior of the underlying data, or data cleaning to reduce the uncertainty of the query answer. We present efficient data structures for answering rangemax queries on uncertain data. Given a collection of uncertain points in the ddimensional space, where each point is associated with a value, the goal is to preprocess the points to a data structure such that given a query rectangle R, compute the expected maximum or the most likely maximum of the values of the points that lie in R. Next, we study data cleaning problems. Given a function f over the uncertain objects of a database, we propose algorithms for choosing a subset of objects to clean with cost at most C to i) minimize the variance of f or ii) maximize the surprise of f, in average over all user queries.
The second theme is computing data summaries to answer queries quickly. We present algorithms to create a small representation Q of a much larger data set P such that for every user preference, there is always an object in Q whose preference score is not much worse than the score of the kth most preferred object in P. Our method can be used to answer efficiently topk queries without storing the full data set P.
We conclude by discussing current research and future research directions.