Posted to user@lucenenet.apache.org by Bing Li <lb...@gmail.com> on 2010/11/17 08:58:53 UTC

How to Transmit and Append Indexes

Hi, all,

I am working on a distributed search system. Right now I have only one server.
It has to crawl pages from the Web, generate indexes locally, and respond to
users' queries. I think this is too much work for one machine to handle
smoothly.

I plan to use at least two servers. One of them will crawl pages and generate
indexes. After that, the newly available indexes should be transmitted to
another server, which is responsible for responding to users' queries. From
the users' point of view, this system must be fast. However, I don't know how
to get just the additional indexes so that I can transmit them. After
transmission, how do I append them to the old indexes? Does the appending
block searching?

Thanks so much for your help!

Bing Li

Re: How to Transmit and Append Indexes

Posted by Bing Li <lb...@gmail.com>.
Dear Shashi,

I want only one machine to generate new indexes from newly crawled pages.
After that, the indexes can be transmitted to one or more other machines
that respond to users' queries.

However, when using Lucene.NET, I cannot control the indexing process. All
I can see is the index files on disk. How can I get just the newly
generated indexes? May I transmit them to a remote machine and append them
to the existing ones on that machine?

Thanks!
Bing Li


On Wed, Nov 17, 2010 at 11:20 PM, Shashi Kant <sk...@sloan.mit.edu> wrote:

> IndexWriter has an AddIndexes() method to combine indexes. Not sure what you
> mean by "transmitted to another one"; if you mean making the index available,
> you can copy it across the network.
>
> BTW, I suggest you look into Solr, since it does most of the work for
> you using sharding etc.

Re: How to Transmit and Append Indexes

Posted by Shashi Kant <sk...@sloan.mit.edu>.
IndexWriter has an AddIndexes() method to combine indexes. Not sure what you
mean by "transmitted to another one"; if you mean making the index available,
you can copy it across the network.

BTW, I suggest you look into Solr, since it does most of the work for
you using sharding etc.


RE: How to Transmit and Append Indexes

Posted by Jean-Francois Beaulac <je...@hotmail.com>.
Here is a pretty good and efficient way to do it, from the author of Lucene himself:
http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg12709.html


Re: How to Transmit and Append Indexes

Posted by Bing Li <lb...@gmail.com>.
Dear Nicholas,

What do you mean? I think a distributed search engine should consider the
issue.

Thanks,
Bing Li

On Wed, Nov 17, 2010 at 11:33 PM, Nicholas Paldino [.NET/C# MVP] <
casperOne@caspershouse.com> wrote:

> That's in horribly bad taste.

RE: How to Transmit and Append Indexes

Posted by "Nicholas Paldino [.NET/C# MVP]" <ca...@caspershouse.com>.
That's in horribly bad taste.

-----Original Message-----
From: Derek Finlen [mailto:DFinlen@ahmdirect.com] 
Sent: Wednesday, November 17, 2010 9:58 AM
To: lucene-net-user@lucene.apache.org; bing.li@asu.edu
Subject: RE: How to Transmit and Append Indexes

I hope this isn't for the search engine Bing? :P

RE: How to Transmit and Append Indexes

Posted by Derek Finlen <DF...@ahmdirect.com>.
I hope this isn't for the search engine Bing? :P


Re: How to Transmit and Append Indexes

Posted by Ben West <bw...@yahoo.com>.
Correct. If you move the entire index each time you will be moving, well, the entire index each time.

You might want to look into using a distributed file system, e.g. http://en.wikipedia.org/wiki/Distributed_File_System_(Microsoft). This will try to send only the updates over the network.

I know that many large Java Lucene servers use rsync, which is roughly similar to this DFS, and that seems to work well for them. I have not heard of any Lucene.NET servers using something like this, but I'm sure some are.
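
As a rough illustration of the "send only the updates" idea, here is a minimal Python sketch of an rsync-like one-way sync. It is not how rsync or DFS work internally; it simply compares file names, sizes, and modification times. It assumes the index is a flat directory of files and that, as is generally the case for Lucene segments, existing files are rarely rewritten once created, so most of each transfer is new files. The function name `sync_index` and the directory layout are made up for this example.

```python
import os
import shutil


def _list_files(directory):
    """Map file name -> os.stat_result for regular files in a flat directory."""
    result = {}
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            result[name] = os.stat(path)
    return result


def sync_index(source_dir, dest_dir):
    """Make dest_dir match source_dir, copying only new or changed files.

    Returns (copied, deleted) as sorted lists of file names.
    Hypothetical helper: a sketch, not a replacement for rsync.
    """
    os.makedirs(dest_dir, exist_ok=True)
    source_files = _list_files(source_dir)
    dest_files = _list_files(dest_dir)

    copied = []
    for name in sorted(source_files):
        new = source_files[name]
        old = dest_files.get(name)
        # Same size and modification time: treat as already transferred.
        if (old is not None
                and old.st_size == new.st_size
                and int(old.st_mtime) == int(new.st_mtime)):
            continue
        # copy2 preserves the modification time, so the next run skips this file.
        shutil.copy2(os.path.join(source_dir, name),
                     os.path.join(dest_dir, name))
        copied.append(name)

    # Drop files the writer has since deleted (for example, merged-away segments).
    deleted = sorted(set(dest_files) - set(source_files))
    for name in deleted:
        os.remove(os.path.join(dest_dir, name))
    return copied, deleted
```

With something like this, the network cost of each sync depends on how much the index changed since the last run rather than on its total size. The destination should not be searched while a sync is in progress, so in practice you would sync into a directory the searcher is not currently using.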

--- On Wed, 11/17/10, Bing Li <lb...@gmail.com> wrote:

From: Bing Li <lb...@gmail.com>
Subject: Re: How to Transmit and Append Indexes
To: "Ben West" <bw...@yahoo.com>
Cc: lucene-net-user@incubator.apache.org
Date: Wednesday, November 17, 2010, 9:51 AM

Hi, Ben,

If we do it this way, the index will grow larger and larger, right? The load on the network will become heavy, and heavier the longer the system runs. Is this a proper solution?

Best,
Bing Li




Re: How to Transmit and Append Indexes

Posted by Bing Li <lb...@gmail.com>.
Hi, Ben,

If we do it this way, the index will grow larger and larger, right? The load
on the network will become heavy, and heavier the longer the system runs.
Is this a proper solution?

Best,
Bing Li


On Wed, Nov 17, 2010 at 11:41 PM, Ben West <bw...@yahoo.com> wrote:

> It sounds like you want one server to do the writing and one to do the
> reading (searching)? If so, why not do something like:
>
> 1. Have your writing server constantly update the index
> 2. At some point, pause the writing process, then copy the directory over
> to your searching machine.
> 3. Point the searcher at the copied dir when the copy is done
> 4. Remove the old dir, and go back to 1
>
> There are many tools to copy directories from one place to another (or to
> keep two filesystems in sync in general). I don't think this will be a
> Lucene-specific issue.
>
> Thanks,
> -Ben

Re: How to Transmit and Append Indexes

Posted by Ben West <bw...@yahoo.com>.
It sounds like you want one server to do the writing and one to do the reading (searching)? If so, why not do something like:

1. Have your writing server constantly update the index
2. At some point, pause the writing process, then copy the directory over to your searching machine.
3. Point the searcher at the copied dir when the copy is done
4. Remove the old dir, and go back to 1

There are many tools to copy directories from one place to another (or to keep two filesystems in sync in general). I don't think this will be a Lucene-specific issue.

Thanks,
-Ben
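
The copy-and-swap steps in this message could be sketched roughly as follows. This is a Python illustration using only the standard library, not something from the thread or from Lucene.NET. The names `publish_index`, `read_current`, and the `CURRENT` pointer file are assumptions made for the example: the searcher is assumed to read `CURRENT` to learn which directory to open, and the writer is assumed to be paused while the copy runs.

```python
import os
import shutil
import tempfile


def read_current(serving_root):
    """Return the live version name, or None if nothing is published yet."""
    try:
        with open(os.path.join(serving_root, "CURRENT")) as f:
            return f.read().strip() or None
    except FileNotFoundError:
        return None


def publish_index(source_dir, serving_root, version):
    """Copy a finished index under serving_root and make it the live one.

    Returns the path of the directory that searchers should now open.
    Hypothetical helper: a sketch of steps 2 to 4, not a hardened tool.
    """
    os.makedirs(serving_root, exist_ok=True)
    staging = os.path.join(serving_root, version + ".incoming")
    final = os.path.join(serving_root, version)

    # Step 2: copy the whole index while the writer is paused.
    shutil.copytree(source_dir, staging)
    # Renaming within one filesystem is atomic, so a half-copied
    # directory is never visible under its final name.
    os.rename(staging, final)

    # Step 3: repoint the searcher by atomically replacing a small
    # pointer file that names the live directory.
    previous = read_current(serving_root)
    fd, tmp_path = tempfile.mkstemp(dir=serving_root)
    with os.fdopen(fd, "w") as f:
        f.write(version)
    os.replace(tmp_path, os.path.join(serving_root, "CURRENT"))

    # Step 4: remove the previous copy now that nothing points at it.
    if previous and previous != version:
        shutil.rmtree(os.path.join(serving_root, previous), ignore_errors=True)
    return final
```

Because the new copy only becomes visible after it is complete, searching is never blocked by the transfer itself. A real searcher would still need to close its reader on the old directory before that directory is removed, which matters especially on Windows, where open files cannot be deleted.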

