Query- vs. Crawling-Based Classification of Searchable Web Databases

this paper is to compare these two classification approaches, namely the query-based approach that we introduced in [8] and the crawling-based approach outlined above. We present evidence that our query-based approach works better than the crawling-based approach in terms of both classification accuracy and efficiency. In a nutshell, the crawling-based approach can lead to unstable classification decisions, and it requires large amounts of data to be retrieved when classifying large databases.

The rest of this paper is organized as follows. Section 2 provides a definition of database classification. Section 3 then gives a brief overview of both our query-based algorithm for this task and a crawling-based algorithm. Section 4 reports an experimental comparison of the query- and crawling-based approaches in terms of their accuracy and efficiency. Finally, Section 5 concludes the paper.