Query- vs. Crawling-Based Classification of Searchable Web Databases

Luis Gravano
Panagiotis Ipeirotis
Mehran Sahami

Venue: IEEE Data Engineering Bulletin, Vol. 25, No. 1, March 2002
Date: March 2002
Status: Invited
Type: Journal

this paper is to compare these two classification approaches, namely the query-based approach that we introduced in [8] and the crawling-based approach outlined above. We present evidence that our query-based approach works best in terms of both classification accuracy and efficiency. In a nutshell, the crawling-based approach can lead to unstable classification decisions, and it requires retrieving large amounts of data when classifying large databases. The rest of this paper is organized as follows. Section 2 provides a definition of database classification. Section 3 then gives a brief overview of both our query-based algorithm for this task and a crawling-based algorithm. Section 4 reports an experimental comparison of the query- and crawling-based approaches in terms of their accuracy and efficiency. Finally, Section 5 concludes the paper.

QProber: Classifying and Searching “Hidden-Web” Text Databases

Panos Ipeirotis

