Posted to user@nutch.apache.org by Howie Wang <ho...@hotmail.com> on 2005/12/13 19:35:20 UTC

OT: Alexa Web Search Platform

Sorry if this is off-topic, but I was wondering if anyone saw this:

http://www.wired.com/news/technology/0,1282,69817,00.html?tw=wn_tophead_2
http://websearch.alexa.com/welcome.html

It seems like a pretty compelling solution for most specialized search
engines for a very reasonable price. It's a clever way for a big
search engine to capitalize on the growing number of niche search
engines. I guess they figured if you can't beat 'em, join 'em.

The only thing is that I'm not sure how flexible it is. If you do lots
of custom semantic analysis, I'm not sure you can do that with AWSP. Maybe
it's possible to just download all the data you want, and then do
all your processing offline. Maybe in this case, you would still end
up using Nutch/Lucene to do your custom indexing?

Howie



Re: [Nutch-general] OT: Alexa Web Search Platform

Posted by RJ <ry...@sympatico.ca>.
  So far I don't know enough of the details, but if I can get their data at
$1 per gig and do whatever I want with it, I'm very interested. Even
more interested if Nutch can be used as the front end.

   Why is Alexa using Google for web searches? Why not use their own DBs and
do what you are suggesting? Something doesn't add up here and I'm not sure
what it is.




----- Original Message ----- 
From: "Howie Wang" <ho...@hotmail.com>
To: <nu...@lucene.apache.org>
Sent: Wednesday, December 14, 2005 12:27 AM
Subject: Re: [Nutch-general] OT: Alexa Web Search Platform


> I know it's only tangentially related to Nutch, but is no one
> else interested in this? I've read the APIs and read a couple
> of news stories about it, and it looks like you can download
> the crawled data (for a relatively small fee: $1/GB).
>
> This could be the thing that changes everything. The barrier
> to entry to this field was fairly low using Nutch, but building
> up a decent-sized index takes time and a decent number of
> machines. Now you can buy the crawled data, and literally
> get a custom search engine running overnight.
>
> I'm guessing that many would choose not to host their
> front-end search on Alexa. In this case, Nutch/Lucene would
> come in very handy. Just cram the Alexa data into a Lucene
> index, and use Nutch as the front-end. Instant search engine...
>
> Howie
>
> >It doesn't sound like they are offering the data itself, only access to
> >it, CPU cycles used for accessing it, upload of your own data, and
> >such.
> >
> >In other words, it doesn't sound like you can just download a chunk of
> >data and do your own processing with it.  That would be one mighty
> >chunk! :)
> >
> >Otis


Re: [Nutch-general] OT: Alexa Web Search Platform

Posted by Howie Wang <ho...@hotmail.com>.
I know it's only tangentially related to Nutch, but is no one
else interested in this? I've read the APIs and read a couple
of news stories about it, and it looks like you can download
the crawled data (for a relatively small fee: $1/GB).

This could be the thing that changes everything. The barrier
to entry to this field was fairly low using Nutch, but building
up a decent-sized index takes time and a decent number of
machines. Now you can buy the crawled data, and literally
get a custom search engine running overnight.

I'm guessing that many would choose not to host their
front-end search on Alexa. In this case, Nutch/Lucene would
come in very handy. Just cram the Alexa data into a Lucene
index, and use Nutch as the front-end. Instant search engine...

Howie

>It doesn't sound like they are offering the data itself, only access to
>it, CPU cycles used for accessing it, upload of your own data, and
>such.
>
>In other words, it doesn't sound like you can just download a chunk of
>data and do your own processing with it.  That would be one mighty
>chunk! :)
>
>Otis



Re: [Nutch-general] OT: Alexa Web Search Platform

Posted by og...@yahoo.com.
It doesn't sound like they are offering the data itself, only access to
it, CPU cycles used for accessing it, upload of your own data, and
such.

In other words, it doesn't sound like you can just download a chunk of
data and do your own processing with it.  That would be one mighty
chunk! :)

Otis


--- Howie Wang <ho...@hotmail.com> wrote:

> Sorry if this is off-topic, but I was wondering if anyone saw this:
> 
> http://www.wired.com/news/technology/0,1282,69817,00.html?tw=wn_tophead_2
> http://websearch.alexa.com/welcome.html
> 
> It seems like a pretty compelling solution for most specialized search
> engines for a very reasonable price. It's a clever way for a big
> search engine to capitalize on the growing number of niche search
> engines. I guess they figured if you can't beat 'em, join 'em.
> 
> The only thing is that I'm not sure how flexible it is. If you do lots
> of custom semantic analysis, I'm not sure you can do that with AWSP. Maybe
> it's possible to just download all the data you want, and then do
> all your processing offline. Maybe in this case, you would still end
> up using Nutch/Lucene to do your custom indexing?
> 
> Howie