Posted to user@lucenenet.apache.org by Brian Sayatovic <bs...@creditinfonet.com> on 2011/05/20 19:40:31 UTC

[Lucene.Net] Server farm sharing Lucene

How have folks gone about setting up Lucene in a server farm?  Just a network-accessible shared directory?

Regards,
Brian.
________________________________

Learn more about the products, services and technology solutions available from CIN Legal Data Services at: www.cinlegal.com<http://www.cinlegal.com>

This message may contain confidential / proprietary information from CIN Legal Data Service and Credit Infonet, Inc.. If you are not an intended recipient, please refrain from the disclosure, copying, distribution or use of this information. All such unauthorized actions are strictly prohibited. If you have received this transmission in error, please notify the sender by e-mail at bsayatovic@creditinfonet.com and delete all copies of this material from any computer.

Re: [Lucene.Net] Server farm sharing Lucene

Posted by Wyatt Barnett <wy...@gmail.com>.
+1 for messaging, really the way to fly these days. Disk space is cheap,
so multiple local indexes don't hurt that much.

On 5/23/11 12:19 PM, "Moray McConnachie" <mm...@oxford-analytica.com>
wrote:

>We use a message queue (MSMQ) for our indexing updates, and it works
>well. It's also not a bad one-to-many distribution system so you can
>update several search servers if you want to handle fail-over through
>multiple updates (rather than synchronising indices), and it can handle
>transactions, exceptions and retries pretty well too so you should be
>able to guarantee updates get synchronised.
>
>It's looked to me for a while as if MS will eventually get rid of MSMQ -
>but I might be out-of-date.
>
>I've had some bad experiences in the long distant past (v1.0) with
>.NET's file monitoring with updates failing to fire. No doubt this is
>all fixed in recent OS/.NET combinations, but it left me with a sour
>taste.
>
>M.
>
>-------------------------------------
>Moray McConnachie
>Director of IT    +44 1865 261 600
>Oxford Analytica  http://www.oxan.com
>



RE: [Lucene.Net] Server farm sharing Lucene

Posted by Moray McConnachie <mm...@oxford-analytica.com>.
We use a message queue (MSMQ) for our indexing updates, and it works
well. It also makes a decent one-to-many distribution system, so you can
update several search servers if you want to handle fail-over through
multiple updates (rather than synchronising indices). It handles
transactions, exceptions and retries pretty well too, so you should be
able to guarantee that updates get synchronised.
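A minimal in-process sketch of the one-to-many pattern described above, assuming Python's standard queue.Queue as a stand-in for MSMQ (it provides none of MSMQ's durability or transactions). IndexUpdate, SearchServer, publish and drain are hypothetical names: each search server gets its own queue, the publisher puts every update on every queue, and each consumer retries a failing update a few times before setting it aside.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class IndexUpdate:
    doc_id: str
    text: str


@dataclass
class SearchServer:
    """Hypothetical search server holding its own local index (a dict here)."""
    name: str
    index: dict = field(default_factory=dict)
    inbox: queue.Queue = field(default_factory=queue.Queue)
    failures_to_inject: int = 0  # simulate transient errors for the demo

    def apply(self, update: IndexUpdate) -> None:
        if self.failures_to_inject > 0:
            self.failures_to_inject -= 1
            raise OSError(f"{self.name}: transient failure")
        self.index[update.doc_id] = update.text


def publish(servers, update):
    """Fan the same update out to every server's queue (one-to-many)."""
    for server in servers:
        server.inbox.put(update)


def drain(server, max_attempts=3):
    """Apply queued updates, retrying each up to max_attempts times.

    Returns the updates that still failed (a simple dead-letter list)."""
    dead_letters = []
    while not server.inbox.empty():
        update = server.inbox.get()
        for attempt in range(1, max_attempts + 1):
            try:
                server.apply(update)
                break
            except OSError:
                if attempt == max_attempts:
                    dead_letters.append(update)
        server.inbox.task_done()
    return dead_letters
```

With a real broker, the retry and dead-letter handling would typically lean on the queue's own transactional receive and dead-letter queue rather than a loop like this one.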

It's looked to me for a while as if MS will eventually get rid of MSMQ -
but I might be out-of-date.

I've had some bad experiences in the distant past (v1.0) with
.NET's file monitoring, where change events failed to fire. No doubt this is
all fixed in recent OS/.NET combinations, but it left me with a sour
taste.

M.

-------------------------------------
Moray McConnachie
Director of IT    +44 1865 261 600
Oxford Analytica  http://www.oxan.com

-----Original Message-----
From: Brian Sayatovic [mailto:bsayatovic@creditinfonet.com] 
Sent: 23 May 2011 14:06
To: lucene-net-user@lucene.apache.org
Subject: RE: [Lucene.Net] Server farm sharing Lucene

Interesting!

Right now, the index updating occurs on the same thread where the DB
write is occurring.  This is nice in that we have little room for one to
happen without the other.  With a dedicated search server, I'd have to
see pushing the update off to that other server via a message queue,
perhaps, and then the ability to have all servers in the farm query through
it.

Still, I'd worry about failover.  We have some other failover
strategies where every server in the farm is capable of a function, but
only one server is actively doing it.  But each server periodically
checks if any other server still has an "active claim" (i.e. not too
old) and if not, it will pick up.  So in the event one server fails,
another in the farm takes over.

Perhaps I could marry the two.

But, as said earlier in this thread, I won't prematurely optimize.

---------------------------------------------------------
Disclaimer 

This message and any attachments are confidential and/or privileged. If this has been sent to you in error, please do not use, retain or disclose them, and contact the sender as soon as possible.

Oxford Analytica Ltd
Registered in England: No. 1196703
5 Alfred Street, Oxford
United Kingdom, OX1 4EH
---------------------------------------------------------


Re: [Lucene.Net] Server farm sharing Lucene

Posted by Kevin Miller <sc...@gmail.com>.
Our solution was to create a search web service that returns JSON
results. It works great, but you have to limit your search clients to the
query interface the web service supports, which sometimes stops you from
doing some of the crazy advanced things you can do with Lucene.
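A minimal sketch of the kind of restricted query interface described above, assuming a toy in-memory corpus instead of a Lucene index. handle_search, DOCUMENTS and the q/limit parameters are hypothetical; anything outside the supported parameters is rejected, which is exactly the trade-off mentioned.

```python
import json
from urllib.parse import parse_qs

# Hypothetical toy corpus standing in for a Lucene index.
DOCUMENTS = {
    "1": "lucene server farm",
    "2": "message queue for index updates",
    "3": "dedicated search server",
}

ALLOWED_PARAMS = {"q", "limit"}


def handle_search(query_string):
    """Return (HTTP status, JSON body) for a GET /search?... request.

    Only ALLOWED_PARAMS are supported; richer Lucene features (arbitrary
    query syntax, custom scoring, etc.) are simply not exposed."""
    params = parse_qs(query_string)
    unknown = set(params) - ALLOWED_PARAMS
    if unknown:
        return 400, json.dumps({"error": f"unsupported parameters: {sorted(unknown)}"})
    terms = params.get("q", [""])[0].lower().split()
    try:
        limit = int(params.get("limit", ["10"])[0])
    except ValueError:
        return 400, json.dumps({"error": "limit must be an integer"})
    # Naive AND match: every query term must appear as a word in the document.
    hits = [doc_id for doc_id, text in sorted(DOCUMENTS.items())
            if terms and all(t in text.split() for t in terms)]
    return 200, json.dumps({"total": len(hits), "ids": hits[:limit]})
```

To serve it over HTTP, a do_GET method on an http.server.BaseHTTPRequestHandler subclass could pass the part of self.path after "?" to handle_search and write back the returned status and body.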

Kevin Miller

On May 23, 2011, at 8:06 AM, Brian Sayatovic
<bs...@creditinfonet.com> wrote:

> Interesting!
>
> Right now, the index updating occurs on the same thread where the DB write is occurring.  This is nice in that we have little room for one to happen without the other.  With a dedicated search server, I'd have to see pushing the update off to that other server via a message queue, perhaps, and then the ability to have all servers in the farm query through it.
>
> Still, I'd worry about failover.  We have some other failover strategies where every server in the farm is capable of a function, but only one server is actively doing it.  But each server periodically checks if any other server still has an "active claim" (i.e. not too old) and if not, it will pick up.  So in the event one server fails, another in the farm takes over.
>
> Perhaps I could marry the two.
>
> But, as said earlier in this thread, I won't prematurely optimize.

RE: [Lucene.Net] Server farm sharing Lucene

Posted by Brian Sayatovic <bs...@creditinfonet.com>.
Interesting!

Right now, the index update happens on the same thread as the DB write.  This is nice in that there is little room for one to happen without the other.  With a dedicated search server, I'd have to push the update off to that other server, perhaps via a message queue, and then have all servers in the farm query through it.

Still, I'd worry about failover.  We have some other failover strategies where every server in the farm is capable of performing a function, but only one server is actively doing it.  Each server periodically checks whether any other server still has an "active claim" (i.e. one that is not too old), and if not, it picks up the work.  So if one server fails, another in the farm takes over.
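The "active claim" scheme can be sketched as a lease with a time-to-live. This is an illustration under assumptions: ClaimTable and try_claim are hypothetical, and a real farm would keep the claim in a shared database row, perform the check-and-set as a single atomic conditional UPDATE, and use the database server's clock rather than each web server's.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Claim:
    owner: str
    renewed_at: float


class ClaimTable:
    """Hypothetical shared claim store (in practice, a DB table every server sees).

    A claim counts as active while it is no older than ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the logic can be tested deterministically
        self.claims: Dict[str, Claim] = {}

    def try_claim(self, role: str, server: str) -> bool:
        """Called periodically by every server; True means `server` now owns `role`.

        NOTE: in a real database this read-then-write must be one atomic
        statement, or two servers could both take over a stale claim."""
        now = self.clock()
        current: Optional[Claim] = self.claims.get(role)
        if current is None or current.owner == server or now - current.renewed_at > self.ttl:
            # No claim yet, renewing our own claim, or taking over a stale one.
            self.claims[role] = Claim(owner=server, renewed_at=now)
            return True
        return False
```

Every server would call try_claim("indexer", its_own_name) on a timer shorter than the TTL; only the server that gets True does the indexing work, and if it dies, its claim goes stale and another server takes over on its next tick.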

Perhaps I could marry the two.

But, as said earlier in this thread, I won't prematurely optimize.

-----Original Message-----
From: Moray McConnachie [mailto:mmcconna@oxford-analytica.com]
Sent: Monday, May 23, 2011 4:40 AM
To: lucene-net-user@lucene.apache.org
Subject: RE: [Lucene.Net] Server farm sharing Lucene

If your traffic is high enough to warrant the server farm, and search is a highly used feature, it is also worth thinking about a dedicated search server (or pair of such synced as suggested by Ken and/or separately driven by your publishing tools depending on the degree of redundancy and failsafe you need).

We use a dedicated search server as a service, running a custom wrapper
- we pass a Lucene Query across the network using .NET Remoting - binary-serialization over TCP (stay away from other forms of serialization unless you have lots of resources to throw at search and lots of bandwidth), returning a custom object containing the results and other assorted metadata, including faceting.

.NET remoting is a joy in this context, you only need to be careful about version synchronisation - upgrades need to be carefully planned so that servers with e.g. an upgraded Lucene only talk to a search server with an upgraded Lucene.

Yours,
Moray
-------------------------------------
Moray McConnachie
Director of IT    +44 1865 261 600
Oxford Analytica  http://www.oxan.com

RE: [Lucene.Net] Server farm sharing Lucene

Posted by Moray McConnachie <mm...@oxford-analytica.com>.
If your traffic is high enough to warrant the server farm, and search is
a heavily used feature, it is also worth thinking about a dedicated
search server (or a pair of them, kept in sync as Ken suggested and/or
driven separately by your publishing tools, depending on the degree of
redundancy and failsafe protection you need).

We run a dedicated search server as a service with a custom wrapper:
we pass a Lucene Query across the network using .NET Remoting with
binary serialization over TCP (stay away from other forms of
serialization unless you have lots of resources to throw at search and
lots of bandwidth), and it returns a custom object containing the results
and other assorted metadata, including faceting.

.NET Remoting is a joy in this context; you only need to be careful
about version synchronisation. Upgrades need to be carefully planned so
that servers with, e.g., an upgraded Lucene only talk to a search server
with an upgraded Lucene.
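A sketch of the version check that this upgrade advice implies, assuming both sides stamp every message with a shared protocol version. Python's pickle stands in for .NET binary serialization, the network hop is replaced by a direct function call, and PROTOCOL_VERSION, SearchRequest, SearchResponse, serve and search_remote are hypothetical names.

```python
import pickle
from dataclasses import asdict, dataclass, field
from typing import Dict, List

PROTOCOL_VERSION = 3  # hypothetical; bump whenever Lucene or the message shapes change


@dataclass
class SearchRequest:
    query: str
    max_results: int = 10
    version: int = PROTOCOL_VERSION


@dataclass
class SearchResponse:
    ids: List[str]
    facets: Dict[str, Dict[str, int]] = field(default_factory=dict)
    version: int = PROTOCOL_VERSION


class VersionMismatch(Exception):
    """Raised when client and server were built against different versions."""


def encode(obj) -> bytes:
    # Binary serialization of a plain dict. Like .NET binary serialization,
    # pickle is only safe between trusted hosts.
    return pickle.dumps(asdict(obj))


def serve(payload: bytes, server_version: int = PROTOCOL_VERSION) -> bytes:
    """Server side: decode, refuse mismatched versions, search, encode the reply."""
    request = SearchRequest(**pickle.loads(payload))
    if request.version != server_version:
        raise VersionMismatch(
            f"client v{request.version} vs server v{server_version}; upgrade both together")
    # Placeholder for the real Lucene search and faceting.
    ids = [f"doc-{i}" for i in range(min(request.max_results, 3))]
    return encode(SearchResponse(ids=ids, facets={"type": {"report": len(ids)}},
                                 version=server_version))


def search_remote(query: str, max_results: int = 10) -> SearchResponse:
    """Client side; the network hop is replaced by a direct call to serve()."""
    reply = SearchResponse(**pickle.loads(serve(encode(SearchRequest(query, max_results)))))
    if reply.version != PROTOCOL_VERSION:
        raise VersionMismatch(f"server replied with v{reply.version}")
    return reply
```

Failing fast on a mismatch turns a half-finished rollout into a clear error instead of a confusing deserialization failure or subtly wrong results.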

Yours,
Moray
-------------------------------------
Moray McConnachie
Director of IT    +44 1865 261 600
Oxford Analytica  http://www.oxan.com

-----Original Message-----
From: Ken Foskey [mailto:kfoskey@tpg.com.au] 
Sent: 21 May 2011 00:25
To: lucene-net-user@lucene.apache.org
Subject: Re: [Lucene.Net] Server farm sharing Lucene

Shared directory means network so you have two latencies and much more
traffic on the network.

.net has file monitor which will trigger a function on change of file.
You can use this to push a file on change.  If you do this copy it to
the same file system (partition) then move it into place after so it is
immediately copied.

Ken Foskey


Re: [Lucene.Net] Server farm sharing Lucene

Posted by Ken Foskey <kf...@tpg.com.au>.
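A shared directory means going over the network, so every read pays network latency on top of disk latency, and there is much more traffic on the network.

.NET has a file monitor (FileSystemWatcher) that triggers a function when a file changes.  You can use this to push a file to the other servers on change.  If you do this, first copy the file to the same file system (partition) as its destination, then move it into place, so that the final step is an immediate rename rather than a slow copy.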
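A sketch of the copy-then-move approach, assuming a single file is being pushed (a Lucene index is a directory of files, so a real setup would repeat this per file or swap whole directories). Python's standard library has no file watcher, so ChangePoller is a hypothetical polling substitute for FileSystemWatcher; publish_atomically is likewise a hypothetical helper.

```python
import os
import shutil
import tempfile


def publish_atomically(src_path: str, dest_path: str) -> None:
    """Copy src into a temp file next to dest, then rename it into place.

    The temp file lives in dest's directory (same file system), so the final
    os.replace is a rename, not a copy: readers see the old file or the
    complete new one, never a half-written file."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".incoming-")
    os.close(fd)
    try:
        shutil.copy2(src_path, tmp_path)  # the slow part happens off to the side
        os.replace(tmp_path, dest_path)   # the fast part: a same-volume rename
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


class ChangePoller:
    """Polling stand-in for FileSystemWatcher: call poll() periodically.

    Pushes the source file to every destination when its modification
    time or size changes since the last push."""

    def __init__(self, src_path, dest_paths):
        self.src_path = src_path
        self.dest_paths = list(dest_paths)
        self._last_seen = None

    def poll(self) -> bool:
        st = os.stat(self.src_path)
        signature = (st.st_mtime_ns, st.st_size)
        if signature == self._last_seen:
            return False
        for dest in self.dest_paths:
            publish_atomically(self.src_path, dest)
        self._last_seen = signature
        return True
```

os.replace is an atomic rename on POSIX; on Windows it is a same-volume move that is fast but can fail if another process holds the destination file open.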

Ken Foskey

On 21/05/2011, at 3:40 AM, Brian Sayatovic <bs...@creditinfonet.com> wrote:

> How have folks gone about setting up Lucene in a server farm?  Just a network-accessible shared directory?
> 
> Regards,
> Brian.

Re: [Lucene.Net] Server farm sharing Lucene

Posted by Gustavo Sandrigo <gu...@gmail.com>.
Brian,
I am dealing with a similar situation regarding the liveliness of the index.
I have been looking for options to deal with this, and so far I have found
this open source project that LinkedIn created. Look at the documentation;
it has a very nice way of dealing with this issue.

http://sna-projects.com/zoie/

I guess I will need to build my own implementation of the idea, or see about
getting people to help with porting this to .NET.

I hope this helps in some way.
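As a rough illustration of the general real-time indexing idea (not Zoie's actual API or internals), here is a toy two-tier index: fresh documents land in a small in-memory tier that is searchable immediately, and a periodic flush folds them into the larger persisted tier. TwoTierIndex and its flush threshold are hypothetical, and the "disk" tier is just a dict.

```python
from typing import Dict, List


class TwoTierIndex:
    """Toy real-time index with an in-memory tier and a 'disk' tier."""

    def __init__(self, flush_threshold: int = 1000):
        self.flush_threshold = flush_threshold
        self.disk: Dict[str, str] = {}    # stands in for a persisted Lucene index
        self.memory: Dict[str, str] = {}  # small, cheap to update, searched too

    def add(self, doc_id: str, text: str) -> None:
        self.memory[doc_id] = text  # visible to search immediately
        if len(self.memory) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Newer in-memory copies replace older disk copies of the same doc.
        self.disk.update(self.memory)
        self.memory.clear()

    def search(self, term: str) -> List[str]:
        # Search both tiers; the in-memory copy shadows any stale disk copy.
        merged = {**self.disk, **self.memory}
        return sorted(d for d, text in merged.items() if term in text.split())
```

This is what lets a user create a record and find it straight away without committing the big index on every write.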


On Fri, May 20, 2011 at 12:24 PM, Shashi Kant <sk...@sloan.mit.edu> wrote:

> Not a direct answer, but have you looked at Elasticsearch?
> http://www.elasticsearch.org/
>
>
> On Fri, May 20, 2011 at 2:44 PM, Ben West <bw...@yahoo.com> wrote:
> > The idea of a scheduled task was just a very simple one. I think
> Microsoft's DFS is a glorified form of this: it just listens for changes on
> one server and copies them over to the others. I'm sure you can find many
> other tools which do something similar. You would need to check
> IndexReader.IsCurrent periodically, but I guess you must already be doing
> that.
> >
> > Also: premature optimization is the root of all evil. If you don't have
> any problems with how it works now, don't let me confuse you into creating
> some :-) For a smallish index, all but the most egregious misuses of Lucene
> are still pretty fast.
> >
> > -Ben
> >
> >
> > ----- Original Message -----
> > From: Brian Sayatovic <bs...@creditinfonet.com>
> > To: "lucene-net-user@lucene.apache.org" <
> lucene-net-user@lucene.apache.org>; Ben West <bw...@yahoo.com>
> > Cc:
> > Sent: Friday, May 20, 2011 1:28 PM
> > Subject: RE: [Lucene.Net] Server farm sharing Lucene
> >
> > I'm also concerned with the "liveliness".  We have index updates
> happening in conjunction with writes to our database.  Thus, if a user
> creates a record, it's instantly indexed.  That means they can create an
> entry and instantly search for it.
> >
> > If I were to schedule periodic index updates, they wouldn't be able to do
> this.
> >
> > Thus far, our dozens of developers have been all sharing a network
> accessible index in this manner.  No one has complained, but then again,
> we're not yet focusing on performance of search (many other concerns in
> front of that).
> >
> > Based on your statements, I may need to re-prioritize the risk
> mitigation.
> >
> > Regards,
> > Brian.
> >
> > -----Original Message-----
> > From: Ben West [mailto:bwsithspawn00@yahoo.com]
> > Sent: Friday, May 20, 2011 2:07 PM
> > To: lucene-net-user@lucene.apache.org
> > Subject: Re: [Lucene.Net] Server farm sharing Lucene
> >
> > The Lucene FAQ (http://wiki.apache.org/lucene-java/ImproveSearchingSpeed)
> specifically warns against using remote file systems. Depending on what you
> mean by "network-accessible", it could be a lot slower. You (probably) want
> something of the form: the data is stored locally, but is updated
> periodically from a remote location. The simplest thing is a scheduled task
> which just copies over the new index every day at midnight.
> >
> > Even with an ideal filesystem, you're going to have to deal with paying
> an additional warmup penalty that you wouldn't get in an NRT configuration.
> >
> > Another thing to note is that, while it's very easy to have multiple
> readers, it is really hard to have multiple IndexWriters. We just have one
> writer, and deal with the fact that it's not highly available.
> >
> > Hope this helps,
> > -Ben
> >
> > ----- Original Message -----
> > From: Brian Sayatovic <bs...@creditinfonet.com>
> > To: "lucene-net-user@lucene.apache.org" <
> lucene-net-user@lucene.apache.org>
> > Cc:
> > Sent: Friday, May 20, 2011 12:40 PM
> > Subject: [Lucene.Net] Server farm sharing Lucene
> >
> > How have folks gone about setting up Lucene in a server farm?  Just a
> network-accessible shared directory?
> >
> > Regards,
> > Brian.
> >
> >
>

Re: [Lucene.Net] Server farm sharing Lucene

Posted by Shashi Kant <sk...@sloan.mit.edu>.
Not a direct answer, but have you looked at Elasticsearch?
http://www.elasticsearch.org/


On Fri, May 20, 2011 at 2:44 PM, Ben West <bw...@yahoo.com> wrote:
> The idea of a scheduled task was just a very simple one. I think Microsoft's DFS is a glorified form of this: it just listens for changes on one server and copies them over to the others. I'm sure you can find many other tools which do something similar. You would need to check IndexReader.IsCurrent periodically, but I guess you must already be doing that.
>
> Also: premature optimization is the root of all evil. If you don't have any problems with how it works now, don't let me confuse you into creating some :-) For a smallish index, all but the most egregious misuses of Lucene are still pretty fast.
>
> -Ben
>
>
> ----- Original Message -----
> From: Brian Sayatovic <bs...@creditinfonet.com>
> To: "lucene-net-user@lucene.apache.org" <lu...@lucene.apache.org>; Ben West <bw...@yahoo.com>
> Cc:
> Sent: Friday, May 20, 2011 1:28 PM
> Subject: RE: [Lucene.Net] Server farm sharing Lucene
>
> I'm also concerned with the "liveliness".  We have index updates happening in conjunction with writes to our database.  Thus, if a user creates a record, it's instantly indexed.  That means they can create an entry and instantly search for it.
>
> If I were to schedule periodic index updates, they wouldn't be able to do this.
>
> Thus far, our dozens of developers have all been sharing a network-accessible index in this manner.  No one has complained, but then again, we're not yet focusing on performance of search (many other concerns in front of that).
>
> Based on your statements, I may need to re-prioritize the risk mitigation.
>
> Regards,
> Brian.
>
> -----Original Message-----
> From: Ben West [mailto:bwsithspawn00@yahoo.com]
> Sent: Friday, May 20, 2011 2:07 PM
> To: lucene-net-user@lucene.apache.org
> Subject: Re: [Lucene.Net] Server farm sharing Lucene
>
> The Lucene FAQ (http://wiki.apache.org/lucene-java/ImproveSearchingSpeed) specifically warns against using remote file systems. Depending on what you mean by "network-accessible", it could be a lot slower. You (probably) want something of the form: the data is stored locally, but is updated periodically from a remote location. The simplest thing is a scheduled task which just copies over the new index every day at midnight.
>
> Even with an ideal filesystem, you're going to have to deal with paying an additional warmup penalty that you wouldn't get in an NRT configuration.
>
> Another thing to note is that, while it's very easy to have multiple readers, it is really hard to have multiple IndexWriters. We just have one writer, and deal with the fact that it's not highly available.
>
> Hope this helps,
> -Ben
>
> ----- Original Message -----
> From: Brian Sayatovic <bs...@creditinfonet.com>
> To: "lucene-net-user@lucene.apache.org" <lu...@lucene.apache.org>
> Cc:
> Sent: Friday, May 20, 2011 12:40 PM
> Subject: [Lucene.Net] Server farm sharing Lucene
>
> How have folks gone about setting up Lucene in a server farm?  Just a network-accessible shared directory?
>
> Regards,
> Brian.
>
>

Re: [Lucene.Net] Server farm sharing Lucene

Posted by Ben West <bw...@yahoo.com>.
The idea of a scheduled task was just a very simple one. I think Microsoft's DFS is a glorified form of this: it just listens for changes on one server and copies them over to the others. I'm sure you can find many other tools which do something similar. You would need to check IndexReader.IsCurrent periodically, but I guess you must already be doing that.
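
The following is a rough, library-agnostic sketch (Python, not Lucene.Net) of what a periodic freshness check involves once replicated files land locally. It assumes Lucene's convention of naming each commit point segments_N, where N is the commit generation written in base 36. ReaderHandle is hypothetical: it only tracks a generation number and does not open a real index. In Lucene.Net itself, IndexReader.IsCurrent is the call to rely on.

```python
import os
import re

# Assumption: Lucene names each commit point "segments_N", N being the commit
# generation in base 36. "segments.gen" is a helper file and does not match.
_SEGMENTS_RE = re.compile(r"^segments_([0-9a-z]+)$")


def latest_generation(index_dir):
    """Return the highest commit generation found in index_dir, or -1."""
    best = -1
    for name in os.listdir(index_dir):
        match = _SEGMENTS_RE.match(name)
        if match:
            best = max(best, int(match.group(1), 36))
    return best


class ReaderHandle:
    """Hypothetical stand-in for an open reader.

    It remembers the commit generation it was opened on. is_current()
    reports whether a newer commit has since been copied into the directory.
    """

    def __init__(self, index_dir):
        self.index_dir = index_dir
        self.generation = latest_generation(index_dir)

    def is_current(self):
        return latest_generation(self.index_dir) == self.generation

    def reopen_if_stale(self):
        """Move to the newest commit; return True if anything changed."""
        if self.is_current():
            return False
        self.generation = latest_generation(self.index_dir)
        return True
```

A timer on each web server would call reopen_if_stale() every few seconds and open a new reader only when it returns True.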

Also: premature optimization is the root of all evil. If you don't have any problems with how it works now, don't let me confuse you into creating some :-) For a smallish index, all but the most egregious misuses of Lucene are still pretty fast. 

-Ben


----- Original Message -----
From: Brian Sayatovic <bs...@creditinfonet.com>
To: "lucene-net-user@lucene.apache.org" <lu...@lucene.apache.org>; Ben West <bw...@yahoo.com>
Cc: 
Sent: Friday, May 20, 2011 1:28 PM
Subject: RE: [Lucene.Net] Server farm sharing Lucene

I'm also concerned with the "liveliness".  We have index updates happening in conjunction with writes to our database.  Thus, if a user creates a record, it's instantly indexed.  That means they can create an entry and instantly search for it.

If I were to schedule periodic index updates, they wouldn't be able to do this.

Thus far, our dozens of developers have all been sharing a network-accessible index in this manner.  No one has complained, but then again, we're not yet focusing on performance of search (many other concerns in front of that).

Based on your statements, I may need to re-prioritize the risk mitigation.

Regards,
Brian.

-----Original Message-----
From: Ben West [mailto:bwsithspawn00@yahoo.com]
Sent: Friday, May 20, 2011 2:07 PM
To: lucene-net-user@lucene.apache.org
Subject: Re: [Lucene.Net] Server farm sharing Lucene

The Lucene FAQ (http://wiki.apache.org/lucene-java/ImproveSearchingSpeed) specifically warns against using remote file systems. Depending on what you mean by "network-accessible", it could be a lot slower. You (probably) want something of the form: the data is stored locally, but is updated periodically from a remote location. The simplest thing is a scheduled task which just copies over the new index every day at midnight.

Even with an ideal filesystem, you're going to have to deal with paying an additional warmup penalty that you wouldn't get in an NRT configuration.

Another thing to note is that, while it's very easy to have multiple readers, it is really hard to have multiple IndexWriters. We just have one writer, and deal with the fact that it's not highly available.

Hope this helps,
-Ben

----- Original Message -----
From: Brian Sayatovic <bs...@creditinfonet.com>
To: "lucene-net-user@lucene.apache.org" <lu...@lucene.apache.org>
Cc:
Sent: Friday, May 20, 2011 12:40 PM
Subject: [Lucene.Net] Server farm sharing Lucene

How have folks gone about setting up Lucene in a server farm?  Just a network-accessible shared directory?

Regards,
Brian.


RE: [Lucene.Net] Server farm sharing Lucene

Posted by Brian Sayatovic <bs...@creditinfonet.com>.
I'm also concerned with the "liveliness".  We have index updates happening in conjunction with writes to our database.  Thus, if a user creates a record, it's instantly indexed.  That means they can create an entry and instantly search for it.
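
The following is a minimal sketch of the write-through pattern described above, assuming the index is updated in the same call that commits the database row. sqlite3 stands in for the real database, and a dict-based inverted index stands in for Lucene. RecordStore and its methods are hypothetical.

```python
import sqlite3
from collections import defaultdict


class RecordStore:
    """Write each record to the database and index it in the same call.

    A search issued immediately afterwards therefore finds the new record.
    The inverted index here is a toy stand-in for a Lucene index.
    """

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, body TEXT)")
        self.index = defaultdict(set)  # term -> ids of records containing it

    def create(self, body):
        with self.db:  # commits on success, rolls back on an exception
            cur = self.db.execute("INSERT INTO records (body) VALUES (?)", (body,))
        record_id = cur.lastrowid
        # Index only after the row is committed, so the index never
        # refers to a row that was rolled back.
        for term in body.lower().split():
            self.index[term].add(record_id)
        return record_id

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))
```

Because the index update runs after the database commit, a crash between the two steps leaves them out of sync. That gap is one reason the queue-with-retries approach discussed elsewhere in this thread is attractive.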

If I were to schedule periodic index updates, they wouldn't be able to do this.

Thus far, our dozens of developers have all been sharing a network-accessible index in this manner.  No one has complained, but then again, we're not yet focusing on performance of search (many other concerns in front of that).

Based on your statements, I may need to re-prioritize the risk mitigation.

Regards,
Brian.

-----Original Message-----
From: Ben West [mailto:bwsithspawn00@yahoo.com]
Sent: Friday, May 20, 2011 2:07 PM
To: lucene-net-user@lucene.apache.org
Subject: Re: [Lucene.Net] Server farm sharing Lucene

The Lucene FAQ (http://wiki.apache.org/lucene-java/ImproveSearchingSpeed) specifically warns against using remote file systems. Depending on what you mean by "network-accessible", it could be a lot slower. You (probably) want something of the form: the data is stored locally, but is updated periodically from a remote location. The simplest thing is a scheduled task which just copies over the new index every day at midnight.

Even with an ideal filesystem, you're going to have to deal with paying an additional warmup penalty that you wouldn't get in an NRT configuration.

Another thing to note is that, while it's very easy to have multiple readers, it is really hard to have multiple IndexWriters. We just have one writer, and deal with the fact that it's not highly available.

Hope this helps,
-Ben

----- Original Message -----
From: Brian Sayatovic <bs...@creditinfonet.com>
To: "lucene-net-user@lucene.apache.org" <lu...@lucene.apache.org>
Cc:
Sent: Friday, May 20, 2011 12:40 PM
Subject: [Lucene.Net] Server farm sharing Lucene

How have folks gone about setting up Lucene in a server farm?  Just a network-accessible shared directory?

Regards,
Brian.

Re: [Lucene.Net] Server farm sharing Lucene

Posted by Ben West <bw...@yahoo.com>.
The Lucene FAQ (http://wiki.apache.org/lucene-java/ImproveSearchingSpeed) specifically warns against using remote file systems. Depending on what you mean by "network-accessible", it could be a lot slower. You (probably) want something of the form: the data is stored locally, but is updated periodically from a remote location. The simplest thing is a scheduled task which just copies over the new index every day at midnight. 
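
One way to structure the "copy the index locally on a schedule" idea is sketched below under a few assumptions:

- The remote index is not being committed to while the copy runs (for example, the writer is idle at midnight).
- Each run supplies a sortable stamp such as "20110520".
- Searchers open whatever newest_snapshot() returns.

The function names are hypothetical.

```python
import os
import shutil


def _snapshots(local_root):
    """Names of fully copied snapshots, oldest first (stamps sort as text)."""
    return sorted(
        name for name in os.listdir(local_root)
        if name.startswith("index-") and not name.endswith(".tmp")
    )


def publish_snapshot(remote_dir, local_root, stamp):
    """Copy the index in remote_dir to local_root/index-<stamp>.

    The copy lands in a ".tmp" directory and is renamed only once complete,
    so a searcher never opens a half-copied index.
    """
    target = os.path.join(local_root, "index-" + stamp)
    staging = target + ".tmp"
    shutil.copytree(remote_dir, staging)
    os.rename(staging, target)
    return target


def newest_snapshot(local_root):
    """Return the path of the most recent complete snapshot, or None."""
    names = _snapshots(local_root)
    return os.path.join(local_root, names[-1]) if names else None


def prune_snapshots(local_root, keep=2):
    """Delete all but the `keep` newest snapshots.

    Errors are ignored: a snapshot still held open by a reader (notably on
    Windows) cannot be deleted yet, and the next scheduled run will retry.
    """
    names = _snapshots(local_root)
    doomed = names[:-keep] if keep else names
    for name in doomed:
        shutil.rmtree(os.path.join(local_root, name), ignore_errors=True)
```

Copying into a new versioned directory, rather than overwriting the live one, avoids replacing files that open readers are still using.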

Even with an ideal filesystem, you're going to have to deal with paying an additional warmup penalty that you wouldn't get in an NRT configuration.
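
A common way to keep that warmup penalty off the request path is to run a few representative queries against a newly opened searcher before it starts serving users. The sketch below assumes that approach. The searcher objects are hypothetical: anything with a search(query) method, standing in for a searcher opened on a freshly copied index.

```python
import threading


class WarmedSearcherHolder:
    """Serve queries from the current searcher.

    A replacement is swapped in only after it has run the warmup queries.
    """

    def __init__(self, searcher, warmup_queries):
        self._lock = threading.Lock()
        self._current = searcher
        self._warmup_queries = list(warmup_queries)

    def search(self, query):
        with self._lock:
            searcher = self._current
        return searcher.search(query)

    def swap_in(self, new_searcher):
        """Warm new_searcher, make it current, and return the old one."""
        # The warmup cost is paid here, so the first user query against
        # the new index does not absorb it.
        for query in self._warmup_queries:
            new_searcher.search(query)
        with self._lock:
            old, self._current = self._current, new_searcher
        return old  # the caller closes it once in-flight searches finish
```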

Another thing to note is that, while it's very easy to have multiple readers, it is really hard to have multiple IndexWriters. We just have one writer, and deal with the fact that it's not highly available. 
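
The one-writer arrangement can be sketched within a single process. Every update goes onto one queue, which is drained by one thread, so only that thread ever touches the index. apply_update is a hypothetical callback standing in for the real IndexWriter calls (add, update, delete, commit). Across several servers, a shared message queue, such as the MSMQ setup mentioned elsewhere in this thread, would take the place of queue.Queue.

```python
import queue
import threading


class SingleWriter:
    """Funnel every index update through one thread."""

    _STOP = object()  # sentinel telling the writer thread to exit

    def __init__(self, apply_update):
        self._updates = queue.Queue()
        self._apply_update = apply_update
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, update):
        """Safe to call from any request thread."""
        self._updates.put(update)

    def close(self):
        """Apply everything already submitted, then stop the thread."""
        self._updates.put(self._STOP)
        self._thread.join()

    def _run(self):
        while True:
            update = self._updates.get()
            if update is self._STOP:
                break
            self._apply_update(update)
```

Updates are applied in the order they were submitted, because a single consumer drains a FIFO queue.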

Hope this helps,
-Ben

----- Original Message -----
From: Brian Sayatovic <bs...@creditinfonet.com>
To: "lucene-net-user@lucene.apache.org" <lu...@lucene.apache.org>
Cc: 
Sent: Friday, May 20, 2011 12:40 PM
Subject: [Lucene.Net] Server farm sharing Lucene

How have folks gone about setting up Lucene in a server farm?  Just a network-accessible shared directory?

Regards,
Brian.