Volunia – Updates (for the last time…)

Some readers have been asking me on Facebook and via email to post some updates about Volunia, the search engine I’ve already analyzed and blogged about.

So, let’s try to write something. But, before starting, I have to say this: the point is that… there are no updates. It’s that simple.

As you know, I’ve been monitoring the growth of the search index for a while, but I’ve stopped checking because it simply was not growing. This is the graph:

The only possibly interesting news is that they have launched a new feedback system (basically, a Q&A CMS), probably because they needed a more efficient way to manage user feedback (with the old “closed” system, I’d bet, they were receiving tons of duplicate reports with no way to rank and clean them).

I’ve tried to write about Volunia’s good and new features, but right now, 2 months after its pre-launch, it looks more like a failure than a revolution: the online buzz about this search engine has gone silent and media interest in it has faded.

Since interest in it is almost dead, I don’t think I will blog about Volunia again. Do you think this “revolutionary” search engine is still alive?

From my point of view, I’d just consider it “dead” and focus on other emerging engines.

Giorgio

Volunia Chapter 7823 – A few updates

Point one: the internet showed very strong interest in my article Volunia for Geeks, which spread on Twitter and the social networks like nothing I had written before. It has been my most-read article ever, and the most important part seems to have been the one about WordPress.

Some people understood it; others, inevitably, misunderstood it, so I feel I should at least explain. I’m fairly sure my article is the one that got people talking about WordPress in Volunia, given that it was the first to appear.

Anyway: I never meant to claim that Volunia was based on WordPress (and I didn’t claim it: my original text is there, you can read it), not least because it’s obvious to everyone that WordPress is only used by the News section. The point was entirely different: I was asking why a page that can potentially cause problems was left so open to the public.

Those problems range from brute-forcing the admin password to potential vulnerabilities in the application, which that page makes easy to identify. And that’s not all: dig dig dig, and it turns out that these are the same servers you connect to for the chat at the side of the page. A potential WordPress bug could therefore have catastrophic effects.

Point two: I had noticed that Volunia’s index was fairly static, then I updated the post to announce that the number of results for my test searches had started growing again. But here is the updated graph:

The result? We’re stuck again. Voluniabot visited my blog yesterday in the early afternoon, but the blog still doesn’t appear in the search results. I hope this search engine isn’t suffering, right from birth, from problems that Google has already fought (and solved).

94.32.111.22 - - [15/Feb/2012:15:21:27 +0100] "GET /2011/12/un-tema-come-si-deve-finalmente/ HTTP/1.0" 200 32022 "-" "Mozilla/5.0 (compatible; Voluniabot/1.0; spider@volunia.com)"
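
A log line like the one above can be picked out of an access log automatically. The following is a minimal sketch, assuming the server writes Apache's "combined" log format (which the quoted line appears to follow); the function name `parse_bot_visit` and the returned field names are my own choices for illustration.

```python
import re
from datetime import datetime

# Apache "combined" log format, assumed from the shape of the quoted line.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_bot_visit(line, bot_token="Voluniabot"):
    """Return a dict describing the request if the user agent names the bot, else None."""
    match = LOG_PATTERN.match(line)
    if match is None or bot_token.lower() not in match["agent"].lower():
        return None
    return {
        "ip": match["ip"],
        "time": datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z"),
        "path": match["path"],
        "status": int(match["status"]),
    }

# The line quoted in the post, split over several string literals for readability.
SAMPLE = ('94.32.111.22 - - [15/Feb/2012:15:21:27 +0100] '
          '"GET /2011/12/un-tema-come-si-deve-finalmente/ HTTP/1.0" 200 32022 "-" '
          '"Mozilla/5.0 (compatible; Voluniabot/1.0; spider@volunia.com)"')

visit = parse_bot_visit(SAMPLE)
print(visit["time"].isoformat(), visit["path"])
```

Matching on the user-agent string only shows what the client claims to be; it says nothing about whether the page was later added to the index.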

Third: I notice the team is pushing people toward the Volunia plugin (available for Explorer, Firefox and Chrome), probably to work around the various problems caused by the iframe, such as sites that can’t be displayed or simply the side chat, which stays “pinned” to the first page opened during the search and doesn’t “follow” the user while they browse the website. Is the very nature of Volunia changing, from “portal” to application?

Fourth and last, and the most important. From the very beginning I have reported via feedback every problem I noticed, from display problems, to the chat that didn’t follow the user, to the WordPress login screen open to the whole world. However, I notice that the feedback with screenshots I sent about 7 days ago still hasn’t been read.

This is a problem, because one would assume that in a “scheduled-access beta” phase, where the number of users is kept under control, access is granted to new people based not only on server load but also on the staff’s workload, so that the staff can keep up with all of them. I wonder whether 7 days to read a piece of feedback is normal, and I wonder how useful it is to spend time reporting everything.

Giorgio

Is Volunia crawling the web?

Are Volunia bots crawling the web? The answer should obviously be… “yes”. But, actually, they don’t seem to be doing their work. Have a look:

This graph shows the number of search results returned for some common keywords over the last 2 days. I’ve set my Search Language to “Italian and English” and the SafeSearch Filter to “Medium”. This is the data table:

The number of results returned for those keywords has changed slightly. How is this possible?

If you look more carefully at that table, you’ll notice that the number of results isn’t actually changing at all. It seems that my searches are being run against 2 different versions of the index (look at the number of results for “snow”: it’s 1618056 on 10th February, then it becomes 1583862 on the morning of 11th February and then goes back to 1618056 in the afternoon).
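
The "two versions of the index" reading can be expressed as a simple check on a series of result counts. This is only a heuristic sketch of my own, not anything Volunia documents: it assumes that a growing index would mostly produce new values, so a count that comes back to an exact earlier value after moving away from it suggests answers served from different snapshots. The function name is hypothetical; the three numbers are the “snow” counts quoted above.

```python
def looks_like_alternating_indexes(counts):
    """True when the series returns to an earlier value after changing away from it."""
    seen = set()
    previous = None
    for count in counts:
        # A change back to a value already observed hints at two index snapshots.
        if count != previous and count in seen:
            return True
        seen.add(count)
        previous = count
    return False

# Result counts for "snow": 10 Feb, 11 Feb morning, 11 Feb afternoon.
snow = [1618056, 1583862, 1618056]
print(looks_like_alternating_indexes(snow))  # True
```

With only three samples this is weak evidence; more measurements per keyword would make the pattern easier to trust.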

A search for “vada a bordo cazzo”, De Falco’s well-known exclamation, will return… no results if you use the double quotes, and some porn sites if you don’t. Wow! A slightly outdated index, isn’t it?

Is VoluniaBot taking a break?

UPDATE 20:45 13/02/2012: Yes, Voluniabot did wake up (in the last 10 hours). Look at this graph:

Volunia – A quick update

This is just a (very) quick update about my tests on Volunia: yesterday I focused on network activity and front-end configuration, so today I’ve tried to focus my tests on search results and common GUI errors.

The first thing I’ve noticed: searches are quickly improving. There are many more results, and Volunia’s team has reported that web indexing has been made quicker. In order to “measure” the index growth, I’ve decided to create a table and a graph where I’ll periodically report the number of results shown for some test words. I’ll talk about this in the next few days. Well, don’t expect Google-level results in 2 days, but… it’s improving.

I’m a bit concerned about slowness. Searching for “I’m just trying to stress you” takes more than 8 seconds (and the system is not yet under heavy load). But that’s not the worst part. Searching for those words leads to this results page:

Can you see that HTML encoding error?

Something similar happens if you search for “Giorgio Bonfiglio” (yes, this time with the double quotes): Volunia will show only one result.

Now, let’s try to click on that “repeat the search with…” link. This is the result:

That’s terrible. It’s not just about an HTML encoding error: Volunia is showing results that match… that error! Moreover, look at the third result: is Volunia really showing, as its third result, a page where the only matching item is a first name?

Volunia for geeks

No, you’re not drunk. This is my first English post: I planned to start writing in English a while ago but never did, as my audience is mainly Italian. But I’ve decided to try and see what happens.

I’ve already written about Volunia, before and after trying it out. I plan to go into more detail in the next few days, but now I want to show you some “geeky” things I’ve noticed in this search engine. I’m a sys/net admin, you know, so I couldn’t avoid opening Wireshark to check what Volunia was doing and sending through my computer (yes, this is the first time I’m really concerned about my privacy).

The first point: although all POST data (profile and so on) is sent through HTTPS to secure.volunia.com, the chat system (both public and private) is using Jabber through plain HTTP (chat.volunia.com).

Searches, too, use GET requests over plain HTTP. This could be a concern, so I hope a full-HTTPS version will be released in the next few months (Google has been available over HTTPS for years).

Second point: do you see something strange in the following screenshot (click on it to see a larger version)?

Have you noticed that “wp-content”? It’s the default WordPress directory for storing themes and uploads! That’s… strange. It looks like it’s only used for their news page (http://en.volunia.com/news/), but I’m not sure: why leave the WP login page open to the world? It’s just a matter of a few .htaccess lines.

Hope WordPress is up-to-date, at least.

I decided to check active connections using “netstat”, and this way I noticed that… the Volunia team doesn’t know what PTR records are. All their IPs are still using the default Tiscali reverse records. Not a real concern, true, but proper reverse DNS records make a netadmin’s job of monitoring the network simpler. You can check what your users are connected to with a click, configure your firewalls to always accept outgoing traffic for hosts whose PTR ends in *volunia.com, and so on.
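
To make the PTR idea concrete, here is a small sketch of the kind of check a netadmin could script. It assumes nothing about Volunia’s actual records: `reverse_lookup` simply asks the system resolver through the standard `socket.gethostbyaddr` call, and `ptr_matches` is a hypothetical helper of mine that tests the returned name against a domain. The second hostname in the example is made up.

```python
import socket

def ptr_matches(hostname, domain="volunia.com"):
    """True if hostname is the domain itself or one of its subdomains."""
    host = hostname.rstrip(".").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)

def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if no reverse record resolves."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return None

# Offline examples; the second name is a hypothetical ISP-default reverse record.
print(ptr_matches("chat.volunia.com"))          # True
print(ptr_matches("host-1.isp.example.net"))    # False
```

The helper requires a dot before the domain, which is slightly stricter than a plain “ends in volunia.com” test: a name such as notvolunia.com would not match. A PTR record is also controlled by whoever owns the IP block, so a firewall rule built on it is a convenience rather than a proof of identity.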

Anyway, this forced me to further study DNS records. I discovered that, while the main Volunia website (www.volunia.com) is behind Level3’s CDN, other services aren’t.

Both chat.volunia.com and secure.volunia.com are located in Italy. Is this only for testing purposes or is the system set to be used by the public this way?

Latency could be another concern, as I said in my first post: Italy is not the best place to put servers that need to be reachable all over the world. But, let’s look at NS records:

Do you see that? Those are the default DNS servers of NameCheap (or Enom). Why not use an in-house solution or a more professional service, like dyn.com or Route 53? Anything but the low-cost default, please: even Route 53 is only $6 per year!

Some other random notes:

Received: from [127.0.1.1] (pitps004.volunia.net [172.27.38.204])
	by pitsrv03.volunia.net (8.13.8/8.13.8) with ESMTP id q19F4dJj031611
	for <GiorgioBonfiglio>; Thu, 9 Feb 2012 16:04:39 +0100

Ouch. I thought they were planning to grow and become a little bigger than 172.16.0.0/12 :(
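
The address leaked in the Received header can be checked against the private ranges with Python’s standard `ipaddress` module. This is just an illustration: 172.16.0.0/12 is the RFC 1918 private block, and a /20 starting at 172.16.0.0 is included only to show how much smaller such a range would be.

```python
import ipaddress

relay = ipaddress.ip_address("172.27.38.204")          # from the Received header
rfc1918_block = ipaddress.ip_network("172.16.0.0/12")  # RFC 1918 private range
narrow_block = ipaddress.ip_network("172.16.0.0/20")   # for comparison only

print(relay.is_private)             # True
print(relay in rfc1918_block)       # True
print(relay in narrow_block)        # False
print(rfc1918_block.num_addresses)  # 1048576
print(narrow_block.num_addresses)   # 4096
```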

I’m also wondering how many Power Users have been given access to Volunia as of today. Marchiori spoke about 100,000 PUs, but Volunia’s statistics tell a different story. How many people are using Volunia if only 200 are on the homepage?

Finally, I STRONGLY hope this “.NET” is just a “fake” header to prevent sites from blocking their bot and that Volunia is not Windows-based. Windows is killing big websites.

94.32.111.29 - - [09/Feb/2012:11:59:02 +0100] "GET /volunia.txt HTTP/1.0" 200 33 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6 ( .NET CLR 3.5.30729)"

I’ve reported the WordPress issue to Volunia’s team and asked them to set proper PTR records.

Giorgio