Building Query Optimizers for Information Extraction: The SQoUT Project

Luis Gravano
Panagiotis Ipeirotis
Alpa Jain

Venue: SIGMOD Record, Special Issue on "Managing Information Extraction," Vol. 37, No. 4, December 2008
Dec 2008
Status: Invited
Type: Journal

Text documents often embed data that is structured in nature. This structured data is increasingly exposed using information extraction systems, which generate structured relations from documents and thereby make it possible to process expressive, structured queries over text databases. This paper discusses our SQoUT project, which focuses on processing structured queries over relations extracted from text databases. We show how, in our extraction-based scenario, query processing can be decomposed into a sequence of basic steps: retrieving relevant text documents, extracting relations from the documents, and joining extracted relations for queries involving multiple relations. Each of these steps presents different alternatives, and together they form a rich space of possible query execution strategies. We identify execution efficiency and output quality as the two critical properties of a query execution, and argue that an optimization approach needs to consider both properties. To this end, we take into account the user-specified requirements for execution efficiency and output quality, and choose an execution strategy for each query based on a principled, cost-based comparison of the alternative execution strategies.
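The cost-based choice among execution strategies described in the abstract can be illustrated with a minimal Python sketch. It is not the SQoUT optimizer: the option names, the time and quality numbers, and the way estimates are combined (times summed, quality factors multiplied) are all assumptions made here for illustration only. The sketch enumerates one alternative per step (retrieval, extraction, join) and returns the fastest combination that meets a user-specified quality requirement.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional, Tuple


@dataclass(frozen=True)
class Option:
    """One alternative for a single query-processing step (hypothetical)."""
    name: str
    time_cost: float  # toy estimate of execution time, arbitrary units
    quality: float    # toy estimate in [0, 1] of output quality retained


# Illustrative alternatives for each step; names and numbers are invented.
RETRIEVAL = [Option("scan_all", 10.0, 1.0), Option("keyword_filter", 3.0, 0.7)]
EXTRACTION = [Option("extractor_a", 5.0, 0.9), Option("extractor_b", 2.0, 0.6)]
JOIN = [Option("join_a", 4.0, 0.9), Option("join_b", 1.0, 0.8)]

Strategy = Tuple[Option, Option, Option]


def estimate(strategy: Strategy) -> Tuple[float, float]:
    """Assumed cost model: times add up, quality factors multiply."""
    total_time = sum(opt.time_cost for opt in strategy)
    total_quality = 1.0
    for opt in strategy:
        total_quality *= opt.quality
    return total_time, total_quality


def choose_strategy(min_quality: float) -> Optional[Strategy]:
    """Return the fastest strategy whose estimated quality meets the requirement."""
    feasible = [
        s for s in product(RETRIEVAL, EXTRACTION, JOIN)
        if estimate(s)[1] >= min_quality
    ]
    if not feasible:
        return None  # no strategy satisfies the user's quality requirement
    return min(feasible, key=lambda s: estimate(s)[0])


if __name__ == "__main__":
    for requirement in (0.3, 0.7, 0.95):
        best = choose_strategy(requirement)
        names = [opt.name for opt in best] if best else None
        print(requirement, names)
```

Under these toy estimates, a loose quality requirement lets the sketch pick the cheapest alternative at every step, while a stricter requirement pushes it toward slower but more complete alternatives. A real optimizer would derive the estimates from statistics about the text database and the extraction systems rather than from fixed constants.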
