Posted to solr-user@lucene.apache.org by Bing Li <lb...@gmail.com> on 2010/11/19 16:52:49 UTC

How to Transmit and Append Indexes

Hi, all,

I am working on a distributed search system. At present I have only one server.
It has to crawl pages from the Web, generate indexes locally, and respond to
users' queries. I think this is too much work for it to handle smoothly.

I plan to use at least two servers. One of them will crawl pages and generate
indexes. The newly available indexes should then be transmitted to another
server, which is responsible for responding to users' queries. From the users'
point of view, this system must be fast. However, I don't know how to obtain
just the additional indexes for transmission. After transmission, how do I
append them to the old indexes? Does the appending block searching?

Thanks so much for your help!

Bing Li

Re: How to Transmit and Append Indexes

Posted by Gora Mohanty <go...@mimirtech.com>.

On Sat, Nov 20, 2010 at 12:39 AM, Bing Li <lb...@gmail.com> wrote:
> Hi, Gora,
>
> No, I really wonder if Solr is based on Hadoop?

As far as I know, no, it isn't.

> Hadoop is efficient when used for search engines since it suits the
> write-once-read-many model. After reading your emails, it looks like Solr's
> distributed file system does the same thing. Both of them are good for
> searching large indexes in a large scale distributed environment, right?
[...]

Are you talking about distributed Solr search, such as Solr on the Cloud:
http://wiki.apache.org/solr/SolrCloud ? Someone more familiar with Solr
can correct me if I am wrong, but I do not believe that this does a
map/reduce like Hadoop provides.

Unless I am even more confused than usual, Hadoop provides a distributed
file-system (HDFS), and a framework for doing map/reduce. This is a generic
framework and no built-in search capabilities are available. People have tried
to use Solr/Lucene on HDFS, but I am not sure whether anyone has used
map/reduce techniques for search, indexing, or other tasks with Solr/Lucene
and Hadoop.
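
To make the map/reduce idea concrete, here is a minimal single-process sketch
in plain Python that builds an inverted index in map, shuffle, and reduce
steps. It is only an illustration of the pattern: it does not use Hadoop's API
or Lucene's index format, and the function names and sample documents are
made up for the example.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit one (term, doc_id) pair per word in the document."""
    for term in text.lower().split():
        yield term, doc_id


def shuffle(pairs):
    """Group emitted values by key, as a map/reduce framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(term, doc_ids):
    """Reduce step: collapse the doc ids for one term into a sorted posting list."""
    return term, sorted(set(doc_ids))


def build_inverted_index(docs):
    """Run map, shuffle, and reduce over a {doc_id: text} dictionary."""
    pairs = (pair for doc_id, text in docs.items()
             for pair in map_phase(doc_id, text))
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())


# Hypothetical sample documents, used only for illustration.
docs = {1: "solr builds on lucene", 2: "nutch crawls and lucene indexes"}
index = build_inverted_index(docs)
```

In a real framework the map and reduce steps would run on many machines and
the shuffle would move data between them; here everything runs in one process.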

Regards,
Gora

Re: How to Transmit and Append Indexes

Posted by Bing Li <lb...@gmail.com>.

Hi, Gora,

No, I really wonder if Solr is based on Hadoop?

Hadoop is efficient when used for search engines since it suits the
write-once-read-many model. After reading your emails, it looks like Solr's
distributed file system does the same thing. Both of them are good for
searching large indexes in a large scale distributed environment, right?

Thanks!
Bing

Re: How to Transmit and Append Indexes

Posted by Gora Mohanty <go...@mimirtech.com>.

On Sat, Nov 20, 2010 at 12:05 AM, Bing Li <lb...@gmail.com> wrote:
> Dear Erick,
>
> Thanks so much for your help! I am new to Solr, so I have no idea about the
> version.

The solr/admin/registry.jsp URL on your local Solr installation should show
you the version at the top.

> But I wonder what the differences between Solr and Hadoop are. It seems that
> Solr does the same thing that Hadoop promises.
[...]

Er, what? Solr and Hadoop are entirely different applications. Did you
mean Lucene or Nutch, instead of Hadoop?

Regards,
Gora

Re: How to Transmit and Append Indexes

Posted by Bing Li <lb...@gmail.com>.

Dear Erick,

Thanks so much for your help! I am new to Solr, so I have no idea about the
version.

But I wonder what the differences between Solr and Hadoop are. It seems that
Solr does the same thing that Hadoop promises.

Best,
Bing

Re: How to Transmit and Append Indexes

Posted by Erick Erickson <er...@gmail.com>.

You haven't said what version of Solr you're using, but you're
asking about replication, which is built-in.
See: http://wiki.apache.org/solr/SolrReplication

And no, your slave doesn't block while the update is happening,
and it automatically switches to the updated index upon
successful replication.

Older versions of Solr used rsync-based scripts for this.
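
As a rough sketch of how the query server could be driven from a script, the
Python below builds URLs for the ReplicationHandler's HTTP commands
(fetchindex to pull the latest index onto a slave, details to inspect
replication status). It assumes the handler is registered at /replication as
on the wiki page above; the host name, port, and helper function names are
hypothetical, and a multi-core setup would need the core name in the base URL.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def replication_url(base_url, command, **params):
    """Build a ReplicationHandler URL, e.g. .../replication?command=details."""
    query = urlencode({"command": command, **params})
    return f"{base_url.rstrip('/')}/replication?{query}"


def run_command(base_url, command, timeout=10, **params):
    """Send the command to Solr and return the raw response body as text."""
    url = replication_url(base_url, command, **params)
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Hypothetical slave address; replace with your own query server.
    slave = "http://search-host:8983/solr"
    # Only print the URLs here; call run_command(...) against a live server.
    print(replication_url(slave, "fetchindex"))
    print(replication_url(slave, "details"))
```

With polling configured on the slave, none of this is needed in normal
operation; a manual fetchindex is mainly useful for forcing a pull right after
the indexing server commits.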

Best
Erick

Re: How to Transmit and Append Indexes

Posted by Renaud Delbru <re...@deri.org>.

Have you looked at Apache Nutch [1]? It is a distributed web crawl and
search system, based on Lucene/Solr and Hadoop.

[1] http://nutch.apache.org/
-- 
Renaud Delbru

Re: How to Transmit and Append Indexes

Posted by Alex Baranau <al...@gmail.com>.

Make sure you are not going to "reinvent the wheel" here ;). A lot has
already been done around the problem of distributed search engines.
This thread might be useful for you: http://search-hadoop.com/m/ARlbS1MiTNY

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
