Database description
Bayes-Swarm is a research project that aims to design and build an engine to
extract information from internet sources (news portals, newspapers, news
agencies and TV websites).
So far, it visits 30 sources once a day, amounting to around 70 web pages, mainly homepages and economic and political pages (the full list is reported in Appendix A). Every page passes through a processing pipeline whose main steps are:
- removal of formatting tags and punctuation,
- removal of conjunctions and articles,
- extraction of word roots (stemming).
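The three steps above can be illustrated with a minimal Python sketch. It is not the project's actual code: the stop-word list, the `naive_stem` suffix stripper and the `preprocess` function are hypothetical stand-ins chosen for brevity, and a real pipeline would use a full stop-word list and a proper stemmer for the language of each source.

```python
import re
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stop-word list (conjunctions and articles only).
STOP_WORDS = {"a", "an", "the", "and", "or", "but", "nor"}


class _TextExtractor(HTMLParser):
    """Collects the text between tags, discarding the tags themselves.

    Note: this simple version also keeps the contents of <script> and
    <style> elements; a real pipeline would skip them.
    """

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def naive_stem(word):
    """Crude suffix stripping, standing in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(html):
    """Return the list of word stems found in one HTML page."""
    # Step 1: remove formatting tags, then punctuation (keep letters only).
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    text = " ".join(extractor.parts).lower()
    tokens = re.findall(r"[a-z]+", text)
    # Step 2: remove conjunctions and articles.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Step 3: reduce each remaining word to its (approximate) root.
    return [naive_stem(t) for t in tokens]
```

For instance, `preprocess("<p>The markets and the banks are falling.</p>")` returns `["market", "bank", "are", "fall"]`.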
Subsequently, the number of appearances of every word (word occurrences) is counted and stored in a database. Any word that occurs more than five times on a single day automatically enters the database. The database thus yields the number of appearances of a growing set of words across the sources we consider. From these visibility time series, trend graphs and correlations can be computed. Trends can then be linked to specific events. Bayes-Swarm also stores in its database every monitored web page in full; these are freely available online at www.bayes-swarm.com.
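The admission rule and the correlation of two visibility series can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the in-memory `series` dictionary stands in for the database, the names `record_day` and `correlation` are hypothetical, the threshold is assumed to apply to the total count over all pages visited on a day, and Pearson's coefficient is assumed as the correlation measure, since the text does not name one.

```python
from collections import Counter
from math import sqrt

THRESHOLD = 5  # a word enters once it occurs MORE than this many times in one day


def record_day(series, day, stems):
    """Store one day's counts in `series` (word -> {day: count}).

    `stems` is the list of all word stems extracted that day
    (assumption: pooled over every monitored page).
    """
    counts = Counter(stems)
    # Admit new words that cross the threshold today.
    for word, n in counts.items():
        if n > THRESHOLD:
            series.setdefault(word, {})
    # Record today's count for every tracked word, including zeros.
    for word, history in series.items():
        history[day] = counts[word]


def correlation(series, a, b):
    """Pearson correlation of two words over the days both were tracked.

    Returns None when it is undefined (fewer than two shared days,
    or one of the series is constant).
    """
    days = sorted(series[a].keys() & series[b].keys())
    if len(days) < 2:
        return None
    xs = [series[a][d] for d in days]
    ys = [series[b][d] for d in days]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None
    return cov / sqrt(vx * vy)
```

In this sketch a word's series starts on the day it is admitted, so earlier occurrences below the threshold are not recorded, and correlations are computed only over the days on which both words were already being tracked.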