WebPageClassification


This service classifies website content online. Website classification (or categorization) serves a number of purposes, including:
  • Search and Advertising. Human-written website content does not necessarily serve its intended purpose in the eyes of search engines. During indexing, miscategorization and topic drift are quite possible. This reduces exposure in search results and makes advertising campaigns more complicated and expensive. Classifying website content at the development stage can help considerably.
  • Domain Registration and Hosting. Domain name registrars and web hosting providers need statistics on their clients’ websites for various business purposes: for example, to understand which website categories are most popular and profitable, or which websites will be parked or put up for sale. This helps predict how many registered websites will be renewed or discontinued. It is also useful for other services packaged with domain registration and hosting.
  • Safe Search and Browsing. In some cases it is important to know a website’s threat level and a brief summary of its content before visiting it. A classifier can provide this information.
  • Focused Crawling. Some crawling tasks need to detect automatically which category a website belongs to and index only those websites that match the requirements.
  • Automated Link Submission to Web Directories. A review of many directories showed that even when submissions are made entirely by hand, the accuracy of human classification is far from ideal. Even DMOZ has many misclassified websites or websites placed in the wrong directories. Automated classification performed by an automated directory such as logyourlink.com achieves more than 90% accuracy in the assigned website categories, whereas the accuracy of manual-submission directories is usually below 80%. Automated classification is also a great help when submitting articles or blog posts to article directories.
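To make the focused crawling use case above more concrete, here is a minimal Python sketch of a crawl filter that keeps only pages in wanted categories. It is an illustration under stated assumptions, not our implementation: `focused_filter`, `toy_classify`, and the keyword table are hypothetical names invented for this example, and the keyword matcher merely stands in for a real classifier.

```python
def toy_classify(text):
    """Hypothetical stand-in classifier: picks the category with most keyword hits."""
    keywords = {
        "sports": {"football", "match", "league"},
        "travel": {"hotel", "flight", "beach"},
    }
    tokens = set(text.lower().split())
    best = max(keywords, key=lambda category: len(tokens & keywords[category]))
    # Report "unknown" when no keyword of the best category matched at all.
    return best if tokens & keywords[best] else "unknown"


def focused_filter(pages, classify, wanted):
    """Yield the URLs of pages whose predicted category is in `wanted`.

    `pages` is an iterable of (url, text) pairs; `classify` maps text to a category.
    """
    for url, text in pages:
        if classify(text) in wanted:
            yield url


pages = [
    ("http://a.example", "Football league match report"),
    ("http://b.example", "Cheap hotel near the beach"),
    ("http://c.example", "Quarterly earnings"),
]
travel_urls = list(focused_filter(pages, toy_classify, {"travel"}))
```

Passing the classifier in as a function keeps the filter independent of how categories are predicted, so the toy matcher could be swapped for any other classifier.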
Website classification is a very challenging task. A simple approach that performs standard HTML parsing and treats the result as regular text unfortunately does not work in many cases. For example, a page may have little textual content, may be a collection of links to other websites, or may be just an advertisement placeholder. Our application focuses on determining the main topic of a web page even in such cases, when possible. The algorithms are based on a machine learning method combined with feature extraction and statistical analysis.
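The difficult cases mentioned above (little text, link collections) can be illustrated with a short Python sketch that extracts two simple features from a page using the standard library's `html.parser`. This is only an assumed illustration of feature extraction in general, not the method our service uses; the names `PageStats` and `describe_page` and the thresholds `min_words` and `max_link_ratio` are hypothetical.

```python
from html.parser import HTMLParser


class PageStats(HTMLParser):
    """Collects visible words and link targets from an HTML page."""

    SKIPPED = {"script", "style"}  # content of these tags is not visible text

    def __init__(self):
        super().__init__()
        self.words = []
        self.links = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.words.extend(data.split())


def describe_page(html, min_words=50, max_link_ratio=0.5):
    """Return simple features; both thresholds are arbitrary example values."""
    parser = PageStats()
    parser.feed(html)
    parser.close()
    word_count = len(parser.words)
    link_count = len(parser.links)
    return {
        "word_count": word_count,
        "link_count": link_count,
        "thin_content": word_count < min_words,
        "link_heavy": link_count / max(word_count, 1) > max_link_ratio,
    }
```

A page flagged as `thin_content` or `link_heavy` is one where treating the parsed output as regular text is likely to mislead, so a classifier would need other signals for it.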

Results show top-level categories. When more detail is requested, lower-level categories are also provided. For some sites, for example “news” sites, there may be a few additional topics covered besides the general topic; these are shown when possible.

Our tool classifies websites whose content is written in English or other languages, as well as some multilingual websites. The number of supported languages is being expanded. Currently we support English, German, Spanish, French, Italian, and Dutch.

We also provide a classification API, which is available upon request.


WebPageClassification.com 2012. All Rights Reserved. Powered by SemanticPRO
