BayesFor.eu

beta

Bayesian web spidering


Database description

Bayes-Swarm is a research project that aims to design and build an engine for extracting information from internet sources (news portals, newspapers, news agencies and TV websites).
So far it visits 30 sources once a day, which amounts to around 70 web pages, mainly homepages and pages on economics and politics (the full list is reported in appendix A). Every page goes through a processing pipeline whose main steps are:

  1. formatting tags and punctuation removal,
  2. conjunctions and articles removal,
  3. word roots extraction.
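
The three steps above could be sketched as follows. This is a minimal illustration, not the Bayes-Swarm implementation: the function names, the tiny English stop-word list and the crude suffix-stripping stemmer are all assumptions made for the example. A real pipeline would use a proper stemming algorithm and stop-word lists for each language of the monitored sources.

```python
import re
from html import unescape

# Hypothetical, deliberately tiny stop-word list (conjunctions and articles only).
STOP_WORDS = {"a", "an", "the", "and", "or", "but", "nor", "so", "yet"}

# Hypothetical suffix list for a crude stemmer; longer suffixes are tried first.
SUFFIXES = ("ings", "ing", "ed", "es", "s")


def strip_markup(html_text):
    """Step 1: drop script/style blocks, tags and punctuation; lowercase."""
    text = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", html_text)
    text = re.sub(r"(?s)<[^>]+>", " ", text)   # remaining formatting tags
    text = unescape(text)                       # e.g. "&amp;" becomes "&"
    text = re.sub(r"[^\w\s]", " ", text)        # punctuation
    return text.lower()


def remove_stop_words(tokens):
    """Step 2: drop conjunctions and articles."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Step 3: strip one known suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(html_text):
    """Run the three steps in order and return the list of word roots."""
    tokens = strip_markup(html_text).split()
    tokens = remove_stop_words(tokens)
    return [crude_stem(t) for t in tokens]
```

For example, `preprocess("<p>The markets and the banks</p>")` yields the roots `market` and `bank`. The suffix stripper is intentionally naive and will mangle some words; it only shows where root extraction sits in the pipeline.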

Subsequently, the number of appearances of every word (word occurrences) is counted and stored in a database. Any word that occurs more than five times on a single day automatically enters the database. The database thus holds the number of appearances, over time, of a growing set of words on the sources we consider. From these visibility time series, trend graphs and correlations can be computed, and trends can then be linked to specific events. Bayes-Swarm also stores the complete monitored web pages in its database; these are freely available online at www.bayes-swarm.com.
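
As an illustration of the counting and admission rule, and of how a correlation between two visibility series might be computed, here is a minimal sketch. It is not the project's code: an in-memory dictionary stands in for the database, and the names `ADMISSION_THRESHOLD`, `record_day`, `series` and `pearson` are hypothetical. The sketch assumes that the threshold applies to a word's total count across all pages visited that day, and that days before a word was admitted count as zero; the description above does not specify either detail.

```python
from collections import Counter
from math import sqrt

# A word enters the database once it occurs MORE than this many times in a day.
ADMISSION_THRESHOLD = 5


def record_day(database, day, pages_tokens):
    """Count words over one day's pages and update the stand-in database.

    database: dict mapping word -> {day: count}
    pages_tokens: iterable of token lists, one list per visited page
    """
    counts = Counter()
    for tokens in pages_tokens:
        counts.update(tokens)
    for word, n in counts.items():
        # Words already admitted keep being tracked; new words must pass the threshold.
        if word in database or n > ADMISSION_THRESHOLD:
            database.setdefault(word, {})[day] = n
    return database


def series(database, word, days):
    """Visibility time series for a word; days without a record count as 0 (assumption)."""
    history = database.get(word, {})
    return [history.get(day, 0) for day in days]


def pearson(xs, ys):
    """Pearson correlation of two equal-length series; None if either is constant."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("series must be non-empty and of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined for a constant series
    return cov / sqrt(var_x * var_y)
```

Calling `record_day` once per day builds up the per-word histories; `series` then extracts two words' histories over the same list of days so that `pearson` can compare them.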


en/bayes-swarm/database.txt · Last modified: 2008/09/15 15:55 by paolo.brunori