BayesFor.eu

Bayesian web spidering

Bayes-Swarm Project

Bayes-Swarm is a research project that spiders web sources. The content of those sources is then organized in a large database (on the order of hundreds of thousands of occurrences per day). Finally, the dataset is studied using standard statistical analysis and data mining techniques.
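
As a rough illustration of the counting step, here is a minimal Ruby sketch of how occurrences of tracked words might be counted in the text of one fetched page. The method name `count_occurrences`, the tokenization rule, and the sample text are assumptions made for this example, not the project's actual code.

```ruby
# Hypothetical helper (not taken from the Bayes-Swarm code base):
# count how often each tracked word occurs in a page's plain text.
def count_occurrences(text, tracked_words)
  counts = Hash.new(0)
  # Lowercase the text and split it into runs of letters and digits.
  text.downcase.scan(/[[:alnum:]]+/).each do |token|
    counts[token] += 1 if tracked_words.include?(token)
  end
  counts
end

# Invented sample text, used only to exercise the helper.
page_text = "Elections: the election campaign opens. Election day is near."
counts = count_occurrences(page_text, ["election", "campaign"])
puts counts.inspect
```

In this sketch "election" is counted twice and "campaign" once; "elections" is not counted because no stemming is applied. A real spider would also need to strip markup and store each count together with its source and date.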

The project name, swarm, hints at its core value: the ability to extract a global quantitative meaning from apparently meaningless elements (such as single words), in the same way that a swarm behaves coherently even though the behaviour of a single insect is erratic.

The project started in September 2007. Bayes-Swarm has already produced relevant results. One example is the publication on the Italian Partito Democratico primary election: in that article we showed that visibility on the web in the month before the election was closely correlated with electoral behaviour (Il Politico, N. 218, 2008). Moreover, the project is shedding light on a number of relevant issues about how news is edited on the web and about how word occurrences can be used to understand news, compare media behaviour, and analyze local differences between news web sites of different countries. Bayes-Swarm also poses other interesting questions, such as how news propagates on the web and how attention “decays” after a relevant event has taken place. Because Bayes-Swarm is able to “measure” news, it represents an opportunity to answer these questions.
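
To make the idea of correlating web visibility with an outcome concrete, the following Ruby sketch computes a Pearson correlation coefficient between two series. It is only an illustration: the article's actual method is not described here, the method name `pearson` is our own, and the numbers are made-up toy values, not data from the project.

```ruby
# Pearson correlation coefficient between two equally long numeric series.
# Returns NaN if either series has zero variance.
def pearson(xs, ys)
  raise ArgumentError, "series must have the same length" unless xs.length == ys.length
  n = xs.length.to_f
  mean_x = xs.sum / n
  mean_y = ys.sum / n
  cov   = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  var_x = xs.sum { |x| (x - mean_x)**2 }
  var_y = ys.sum { |y| (y - mean_y)**2 }
  cov / Math.sqrt(var_x * var_y)
end

# Made-up toy numbers, NOT results from Bayes-Swarm or the article.
mentions   = [120, 340, 560, 80]       # hypothetical occurrences per candidate
vote_share = [10.0, 30.0, 52.0, 8.0]   # hypothetical vote percentages
r = pearson(mentions, vote_share)
puts r.round(3)
```

A value of `r` close to 1 would indicate that candidates mentioned more often also tended to receive more votes in the toy data; it says nothing about causation.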

Time Machine is a web search engine that works together with Bayes-Swarm. Time Machine tells you where the occurrences of a certain word were counted. It allows anyone to read pages that are no longer online and to learn what the web said in the past.

Bayes-Swarm is developed in Ruby, MySQL, and R, three flexible and powerful open-source tools.


en/bayes-swarm/intro.txt · Last modified: 2008/09/15 15:54 by paolo.brunori