You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by sol myr <so...@gmail.com> on 2011/03/22 09:30:34 UTC

Distributing a Lucene application?

Hi,
What are my options for distributing an application that uses Lucene?

Our current application works against a database of INVENTORY. We schedule
hourly checks for modified items (timestamp-based), and update a single
Lucene index.
Now we want to distribute out application, to a Grid, with failover, and a
bit of data sharing:
Say we have 2 branches - New York and Los Angeles.

(1) Inventory of the NY branch is handled by 2 application servers, and 2
database copies. They are exact replicates, for failover/load balance.
Similarly, the LA branch gets 2 application servers and 2 databases.

(2) 90% of the time, each branch "minds its own business" and isn't
interested in the other branch's inventory.
However on rare occasions, an LA administrator needs to search the NY
inventory (we can compromise on data freshness, e.g. show data 10 hours
old).

Does Lucene have built-in support for any of this?
If I'm to do this "from scratch" I'll probably just let each application
server maintain its own copy of Lucene index (with data only from its own
city, and hourly updates as before).
And for the requirement of "LA admin searching the NY inventory" I'd
schedule a task to copy the NY index into the LA server, every 10 hours.

Is this a reasonable approach? Or are there Lucene-management tools that
would handle it better?
Thanks :)

Re: Distributing a Lucene application?

Posted by sol myr <so...@gmail.com>.
Thanks very much, sounds great :)

On Thu, Mar 24, 2011 at 9:13 PM, Chris Lu <ch...@gmail.com> wrote:

> It's great that the requirement is loose...
> But I suppose users would ask for more later.
>
> Well, I worked on DBSight, which covers more than just search. It also
> includes scheduling indexing, reindexing, and even rendering.
> In your case, you just need to specify a SQL and have index up and running
> in several minutes.
> You can even embed a widget to put search UI to any page.
>
> btw, DBSight also has facet search.
>
>
> Chris Lu
> -------------------------
> Instant Scalable Full-Text Search On Any Database/Application
> site: http://www.dbsight.net
> demo: http://search.dbsight.com
> Lucene Database Search in 3 minutes:
>
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
>
>
> On 2011/3/23 0:57, sol myr wrote:
>
>> Thanks :)
>> Thankfully we don't delete from the database - just mark items as
>> "inactive"
>> (actual delete occurs only in a yearly cleanup process).
>> We can live with inaccurate results, including deleted/inactive items.
>>
>> Have you used DBSight? Would you mind sharing your opinion - did you like
>> it
>> better than, say, SOLR?
>> Thanks :)
>>
>>
>> On Tue, Mar 22, 2011 at 7:06 PM, Chris Lu<ch...@gmail.com>  wrote:
>>
>>  Each database having its own index should be fine.
>>> However, just checking modified timestamp may not be enough, since there
>>> could be items deleted.
>>>
>>> You can check DBSight for this purpose. It can do remote index
>>> replication
>>> across WAN.
>>> But, if the NY index is synchronized before NY database does, this means
>>> the NY index could return results not found in NY database, correct?
>>>
>>> --
>>> Chris Lu
>>> -------------------------
>>> Instant Scalable Full-Text Search On Any Database/Application
>>> site: http://www.dbsight.net
>>> demo: http://search.dbsight.com
>>> Lucene Database Search in 3 minutes:
>>>
>>> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
>>>
>>>
>>>
>>> On 3/22/2011 1:30 AM, sol myr wrote:
>>>
>>>  Hi,
>>>> What are my options for distributing an application that uses Lucene?
>>>>
>>>> Our current application works against a database of INVENTORY. We
>>>> schedule
>>>> hourly checks for modified items (timestamp-based), and update a single
>>>> Lucene index.
>>>> Now we want to distribute out application, to a Grid, with failover, and
>>>> a
>>>> bit of data sharing:
>>>> Say we have 2 branches - New York and Los Angeles.
>>>>
>>>> (1) Inventory of the NY branch is handled by 2 application servers, and
>>>> 2
>>>> database copies. They are exact replicates, for failover/load balance.
>>>> Similarly, the LA branch gets 2 application servers and 2 databases.
>>>>
>>>> (2) 90% of the time, each branch "minds its own business" and isn't
>>>> interested in the other branch's inventory.
>>>> However on rare occasions, an LA administrator needs to search the NY
>>>> inventory (we can compromise on data freshness, e.g. show data 10 hours
>>>> old).
>>>>
>>>> Does Lucene have built-in support for any of this?
>>>> If I'm to do this "from scratch" I'll probably just let each application
>>>> server maintain its own copy of Lucene index (with data only from its
>>>> own
>>>> city, and hourly updates as before).
>>>> And for the requirement of "LA admin searching the NY inventory" I'd
>>>> schedule a task to copy the NY index into the LA server, every 10 hours.
>>>>
>>>> Is this a reasonable approach? Or are there Lucene-management tools that
>>>> would handle it better?
>>>> Thanks :)
>>>>
>>>>
>>>>  ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: Distributing a Lucene application?

Posted by Chris Lu <ch...@gmail.com>.
It's great that the requirement is loose...
But I suppose users would ask for more later.

Well, I worked on DBSight, which covers more than just search. It also includes scheduling indexing, reindexing, and even rendering.
In your case, you just need to specify a SQL and have index up and running in several minutes.
You can even embed a widget to put search UI to any page.

btw, DBSight also has facet search.

Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes


On 2011/3/23 0:57, sol myr wrote:
> Thanks :)
> Thankfully we don't delete from the database - just mark items as "inactive"
> (actual delete occurs only in a yearly cleanup process).
> We can live with inaccurate results, including deleted/inactive items.
>
> Have you used DBSight? Would you mind sharing your opinion - did you like it
> better than, say, SOLR?
> Thanks :)
>
>
> On Tue, Mar 22, 2011 at 7:06 PM, Chris Lu<ch...@gmail.com>  wrote:
>
>> Each database having its own index should be fine.
>> However, just checking modified timestamp may not be enough, since there
>> could be items deleted.
>>
>> You can check DBSight for this purpose. It can do remote index replication
>> across WAN.
>> But, if the NY index is synchronized before NY database does, this means
>> the NY index could return results not found in NY database, correct?
>>
>> --
>> Chris Lu
>> -------------------------
>> Instant Scalable Full-Text Search On Any Database/Application
>> site: http://www.dbsight.net
>> demo: http://search.dbsight.com
>> Lucene Database Search in 3 minutes:
>> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
>>
>>
>>
>> On 3/22/2011 1:30 AM, sol myr wrote:
>>
>>> Hi,
>>> What are my options for distributing an application that uses Lucene?
>>>
>>> Our current application works against a database of INVENTORY. We schedule
>>> hourly checks for modified items (timestamp-based), and update a single
>>> Lucene index.
>>> Now we want to distribute out application, to a Grid, with failover, and a
>>> bit of data sharing:
>>> Say we have 2 branches - New York and Los Angeles.
>>>
>>> (1) Inventory of the NY branch is handled by 2 application servers, and 2
>>> database copies. They are exact replicates, for failover/load balance.
>>> Similarly, the LA branch gets 2 application servers and 2 databases.
>>>
>>> (2) 90% of the time, each branch "minds its own business" and isn't
>>> interested in the other branch's inventory.
>>> However on rare occasions, an LA administrator needs to search the NY
>>> inventory (we can compromise on data freshness, e.g. show data 10 hours
>>> old).
>>>
>>> Does Lucene have built-in support for any of this?
>>> If I'm to do this "from scratch" I'll probably just let each application
>>> server maintain its own copy of Lucene index (with data only from its own
>>> city, and hourly updates as before).
>>> And for the requirement of "LA admin searching the NY inventory" I'd
>>> schedule a task to copy the NY index into the LA server, every 10 hours.
>>>
>>> Is this a reasonable approach? Or are there Lucene-management tools that
>>> would handle it better?
>>> Thanks :)
>>>
>>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Distributing a Lucene application?

Posted by sol myr <so...@gmail.com>.
Thanks :)
Thankfully we don't delete from the database - just mark items as "inactive"
(actual delete occurs only in a yearly cleanup process).
We can live with inaccurate results, including deleted/inactive items.

Have you used DBSight? Would you mind sharing your opinion - did you like it
better than, say, SOLR?
Thanks :)


On Tue, Mar 22, 2011 at 7:06 PM, Chris Lu <ch...@gmail.com> wrote:

> Each database having its own index should be fine.
> However, just checking modified timestamp may not be enough, since there
> could be items deleted.
>
> You can check DBSight for this purpose. It can do remote index replication
> across WAN.
> But, if the NY index is synchronized before NY database does, this means
> the NY index could return results not found in NY database, correct?
>
> --
> Chris Lu
> -------------------------
> Instant Scalable Full-Text Search On Any Database/Application
> site: http://www.dbsight.net
> demo: http://search.dbsight.com
> Lucene Database Search in 3 minutes:
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
>
>
>
> On 3/22/2011 1:30 AM, sol myr wrote:
>
>> Hi,
>> What are my options for distributing an application that uses Lucene?
>>
>> Our current application works against a database of INVENTORY. We schedule
>> hourly checks for modified items (timestamp-based), and update a single
>> Lucene index.
>> Now we want to distribute out application, to a Grid, with failover, and a
>> bit of data sharing:
>> Say we have 2 branches - New York and Los Angeles.
>>
>> (1) Inventory of the NY branch is handled by 2 application servers, and 2
>> database copies. They are exact replicates, for failover/load balance.
>> Similarly, the LA branch gets 2 application servers and 2 databases.
>>
>> (2) 90% of the time, each branch "minds its own business" and isn't
>> interested in the other branch's inventory.
>> However on rare occasions, an LA administrator needs to search the NY
>> inventory (we can compromise on data freshness, e.g. show data 10 hours
>> old).
>>
>> Does Lucene have built-in support for any of this?
>> If I'm to do this "from scratch" I'll probably just let each application
>> server maintain its own copy of Lucene index (with data only from its own
>> city, and hourly updates as before).
>> And for the requirement of "LA admin searching the NY inventory" I'd
>> schedule a task to copy the NY index into the LA server, every 10 hours.
>>
>> Is this a reasonable approach? Or are there Lucene-management tools that
>> would handle it better?
>> Thanks :)
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: Distributing a Lucene application?

Posted by Chris Lu <ch...@gmail.com>.
Each database having its own index should be fine.
However, just checking modified timestamp may not be enough, since there 
could be items deleted.

You can check DBSight for this purpose. It can do remote index 
replication across WAN.
But, if the NY index is synchronized before NY database does, this means 
the NY index could return results not found in NY database, correct?

--
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes: 
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes


On 3/22/2011 1:30 AM, sol myr wrote:
> Hi,
> What are my options for distributing an application that uses Lucene?
>
> Our current application works against a database of INVENTORY. We schedule
> hourly checks for modified items (timestamp-based), and update a single
> Lucene index.
> Now we want to distribute out application, to a Grid, with failover, and a
> bit of data sharing:
> Say we have 2 branches - New York and Los Angeles.
>
> (1) Inventory of the NY branch is handled by 2 application servers, and 2
> database copies. They are exact replicates, for failover/load balance.
> Similarly, the LA branch gets 2 application servers and 2 databases.
>
> (2) 90% of the time, each branch "minds its own business" and isn't
> interested in the other branch's inventory.
> However on rare occasions, an LA administrator needs to search the NY
> inventory (we can compromise on data freshness, e.g. show data 10 hours
> old).
>
> Does Lucene have built-in support for any of this?
> If I'm to do this "from scratch" I'll probably just let each application
> server maintain its own copy of Lucene index (with data only from its own
> city, and hourly updates as before).
> And for the requirement of "LA admin searching the NY inventory" I'd
> schedule a task to copy the NY index into the LA server, every 10 hours.
>
> Is this a reasonable approach? Or are there Lucene-management tools that
> would handle it better?
> Thanks :)
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Distributing a Lucene application?

Posted by Johannes Zillmann <jz...@googlemail.com>.
Can have a look at http://katta.sourceforge.net/ or http://wiki.apache.org/solr/SolrCollectionDistributionScripts

HTH
Johannes

On Mar 22, 2011, at 9:30 AM, sol myr wrote:

> Hi,
> What are my options for distributing an application that uses Lucene?
> 
> Our current application works against a database of INVENTORY. We schedule
> hourly checks for modified items (timestamp-based), and update a single
> Lucene index.
> Now we want to distribute out application, to a Grid, with failover, and a
> bit of data sharing:
> Say we have 2 branches - New York and Los Angeles.
> 
> (1) Inventory of the NY branch is handled by 2 application servers, and 2
> database copies. They are exact replicates, for failover/load balance.
> Similarly, the LA branch gets 2 application servers and 2 databases.
> 
> (2) 90% of the time, each branch "minds its own business" and isn't
> interested in the other branch's inventory.
> However on rare occasions, an LA administrator needs to search the NY
> inventory (we can compromise on data freshness, e.g. show data 10 hours
> old).
> 
> Does Lucene have built-in support for any of this?
> If I'm to do this "from scratch" I'll probably just let each application
> server maintain its own copy of Lucene index (with data only from its own
> city, and hourly updates as before).
> And for the requirement of "LA admin searching the NY inventory" I'd
> schedule a task to copy the NY index into the LA server, every 10 hours.
> 
> Is this a reasonable approach? Or are there Lucene-management tools that
> would handle it better?
> Thanks :)


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org