Posted to user@lucenenet.apache.org by Bing Li <lb...@gmail.com> on 2010/11/19 18:23:20 UTC

Is it fine to transmit indexes in this way?

Hi, all,

I could not find a way for Lucene to expose index updates to us, so may I
transmit indexes in the following way?

1) One indexing machine, A, is busy with generating indexes;

2) After a certain time, the indexing process is terminated;

3) Then, the new indexes are transmitted to machines which serve users'
queries;

4) It is possible that some index files have the same names. So the
conflicting files should be renamed;

5) After the transmission is done, the transmitted indexes are removed from
A.

6) After the removal, the indexing process is started again on A.

The reason I am trying to do this is to balance the search load: one
machine is responsible for generating indexes and the others are responsible
for answering queries.

If the above approach does not work, can I access the index updates in
Lucene? Can I transmit them and append them to existing indexes? Does
appending affect querying?

I am learning Solr, and it seems that Solr does this for me. However, I have
to set up Tomcat to use Solr, which I think is a little heavy.

Thanks!
Bing Li

Re: Is it fine to transmit indexes in this way?

Posted by Bing Li <lb...@gmail.com>.
Steve,

Thanks so much for the reminder!

I will learn Solr. But I still wonder whether "indexWriter.AddIndexesNoOptimize"
blocks reads of the relevant indexes until it completes. Could you tell me?

Best regards,
Bing

On Sat, Nov 20, 2010 at 4:46 PM, Steve Martin <st...@gmail.com> wrote:

> Bing,
> Solr really is quite simple to set up and will definitely provide you with
> a robust infrastructure from which to load balance. Definitely worth the
> investment in time to learn.
> Manually copying Lucene index files around is something you really ought
> not to be doing.
>
> Regards, Steve

Re: Is it fine to transmit indexes in this way?

Posted by Steve Martin <st...@gmail.com>.
Bing,
Solr really is quite simple to set up and will definitely provide you with a robust infrastructure from which to load balance. Definitely worth the investment in time to learn.
Manually copying Lucene index files around is something you really ought not to be doing.

Regards, Steve




Re: Is it fine to transmit indexes in this way?

Posted by Gora Mohanty <go...@mimirtech.com>.
On Fri, Nov 19, 2010 at 11:39 PM, Bing Li <lb...@gmail.com> wrote:
[...]
> When updates are replicated to slave servers, I assume they are merged with
> the existing indexes, and that the indexes can still be read while the merge
> is in progress. If so, queries would still be answered without delay. That is
> what I mean by "appending". Does Solr work this way?
[...]

If you look at the last point in the section "How does the slave replicate?"
on the replication page, http://wiki.apache.org/solr/SolrReplication , you
will note that a commit is issued on the slave Solr server *after* replication
finishes, so that new/updated documents become available for querying
only then.
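
As a rough illustration of the commit behaviour described above, here is a
toy model in plain Java. It is not Solr or Lucene code; the class name
SnapshotIndex and its methods are made up for this sketch. Searches read an
immutable snapshot, and added documents only become visible when commit()
swaps in a new snapshot, so queries are never blocked while documents are
being added.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Toy model only: documents are plain strings, "search" returns everything.
final class SnapshotIndex {
    // Documents added since the last commit; not visible to searches yet.
    private final List<String> pending = new ArrayList<>();

    // The snapshot that searches currently see; replaced as a whole on commit.
    private final AtomicReference<List<String>> visible =
            new AtomicReference<>(Collections.<String>emptyList());

    synchronized void add(String doc) {
        pending.add(doc);
    }

    // Builds a new snapshot from the old one plus the pending documents,
    // then publishes it in a single step.
    synchronized void commit() {
        List<String> next = new ArrayList<>(visible.get());
        next.addAll(pending);
        pending.clear();
        visible.set(Collections.unmodifiableList(next));
    }

    // Never blocks: returns whatever snapshot was published last.
    List<String> search() {
        return visible.get();
    }
}
```

A snapshot obtained before a commit keeps its old contents, which is why a
search that is already running is not disturbed by later updates.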

I do not personally have much experience with this, but if you need a
real-time search feature like the one you seem to be describing above, I
would look at
http://wiki.apache.org/solr/NearRealtimeSearch
http://wiki.apache.org/solr/NearRealtimeSearchTuning
and recent threads on the subject on this mailing list.

Regards,
Gora

Re: Is it fine to transmit indexes in this way?

Posted by Bing Li <lb...@gmail.com>.
Dear Digy,

I will try that! I would like to understand how to control everything myself
before using a mature framework such as Solr.

By the way, do you think "indexWriter.AddIndexesNoOptimize" blocks reads of
the relevant indexes until it completes?

Thanks so much!

Best regards,
Bing Li

On Sat, Nov 20, 2010 at 5:57 AM, Digy <di...@gmail.com> wrote:

> Hi Li,
> Have you tried the alternative below?
>
> loop{
>  * Create a clean index and add some docs to it on Index-Server
>  * Copy this "small" index to Search-Server where your master index
> resides.
>  * Add this "small" index to master using
> "indexWriter.AddIndexesNoOptimize"
> }
>
> DIGY

RE: Is it fine to transmit indexes in this way?

Posted by Digy <di...@gmail.com>.
Hi Li,
Have you tried the alternative below?

loop{
 * Create a clean index and add some docs to it on Index-Server
 * Copy this "small" index to Search-Server where your master index resides.
 * Add this "small" index to master using "indexWriter.AddIndexesNoOptimize"
}
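
The copy step of the loop above can be sketched with plain file operations.
This is only a sketch under assumptions: the class name IndexShipper, the
inbox directory layout, the batch ids and the ".tmp" suffix are all invented
here. Each small index lands in its own batch directory, so files from
different batches never collide and nothing has to be renamed by hand. The
merge itself (the "indexWriter.AddIndexesNoOptimize" call on the search
server) needs the Lucene library and is not shown.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

final class IndexShipper {
    // Copies the files of a finished small index into inboxRoot/batchId and
    // returns that directory. Assumes inboxRoot is on a single file system.
    static Path ship(Path smallIndex, Path inboxRoot, String batchId) throws IOException {
        Files.createDirectories(inboxRoot);
        Path staging = inboxRoot.resolve(batchId + ".tmp");
        Path ready = inboxRoot.resolve(batchId);
        Files.createDirectories(staging);

        // Copy into a staging directory first, so a half-copied index is
        // never mistaken for a complete one.
        try (Stream<Path> files = Files.list(smallIndex)) {
            for (Path file : (Iterable<Path>) files::iterator) {
                if (Files.isRegularFile(file)) {
                    Files.copy(file, staging.resolve(file.getFileName()),
                            StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }

        // Publish the batch with one rename once every file has arrived.
        // The search server would then merge this directory into the
        // master index (not shown).
        Files.move(staging, ready, StandardCopyOption.ATOMIC_MOVE);
        return ready;
    }
}
```

The index server would call ship(...) with a fresh batch id on every pass of
the loop, and the search server would only pick up directories that no longer
carry the ".tmp" suffix.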

DIGY


Re: Is it fine to transmit indexes in this way?

Posted by Bing Li <lb...@gmail.com>.
Thanks so much, Gora!

> What do you mean by appending? If you mean adding to an existing index
> (on reindexing, this would normally mean an update for an existing Solr
> document ID, and a create for a new Solr document ID), the best way
> probably is not to delete the index on the master server (what you call
> machine A). Once the indexing is completed, a commit ensures that new
> documents show up for any subsequent queries.

When updates are replicated to slave servers, I assume they are merged with
the existing indexes, and that the indexes can still be read while the merge
is in progress. If so, queries would still be answered without delay. That is
what I mean by "appending". Does Solr work this way?

Best,
Bing


Re: Is it fine to transmit indexes in this way?

Posted by Gora Mohanty <go...@mimirtech.com>.
On Fri, Nov 19, 2010 at 10:53 PM, Bing Li <lb...@gmail.com> wrote:
> Hi, all,
>
> Since I didn't find that Lucene presents updated indexes to us, may I
> transmit indexes in the following way?
>
> 1) One indexing machine, A, is busy with generating indexes;
>
> 2) After a certain time, the indexing process is terminated;
>
> 3) Then, the new indexes are transmitted to machines which serve users'
> queries;

Just replied to a similar question in another thread. The best way
is probably to use Solr replication:
http://wiki.apache.org/solr/SolrReplication

You can set up replication to happen automatically upon commit on the
master server (where the new index was made). As a commit should
have been made when indexing is complete on the master server, this
will then ensure that a new index is replicated on the slave server.

> 4) It is possible that some index files have the same names. So the
> conflicting files should be renamed;

Replication will handle this for you.

> 5) After the transmission is done, the transmitted indexes are removed from
> A.
>
> 6) After the removal, the indexing process is started again on A.
[...]

These two items you have to do manually, i.e., delete all documents
on A, and restart the indexing.


> And, may I append them to existing indexes?
> Does the appending affect the querying?
[...]

What do you mean by appending? If you mean adding to an existing index
(on reindexing, this would normally mean an update for an existing Solr
document ID, and a create for a new Solr document ID), the best way
probably is not to delete the index on the master server (what you call
machine A). Once the indexing is completed, a commit ensures that new
documents show up for any subsequent queries.

Regards,
Gora