Posted to user@lucenenet.apache.org by K a r n a v <ka...@gmail.com> on 2011/02/03 13:15:58 UTC

partial update over lucene index

I want to do a partial update of a Lucene index, rather than re-indexing
everything each time.
How can I do that?
Please give me an idea, or at least some supporting links to guide me.

-- 
Thanks & Regards,
Karunaker Reddy V

Re: partial update over lucene index

Posted by Troy Howard <th...@gmail.com>.
Wyatt (and all): 1 Lakh = 100,000

If the suggestions already provided do not help to solve your problem
(I think they would), another option is to use multiple indexes.

Suppose you start with an index that holds 1000 records. 15 minutes
later, you want to update the index with 500 new records, modify 100
records from the previous set, and delete 10 records from the previous
set.

You can make a new index, then add the new 500 records to it. Also add
the 100 modified records to it (with the latest information). In the
old index, delete both the 100 modified records and the 10 deleted
records. You can then use a MultiSearcher to search against two
indexes at the same time.

Every time you make these changes, you can add another index, and
delete from the old indexes, until you get to a time of low-traffic
where you can merge indexes and start over.

This is essentially what Lucene already does inside a single index, but
you may find that doing it across multiple separate indexes has a lower
impact on your site and less latency.
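
The bookkeeping described above can be sketched with a toy in-memory
model. This is not Lucene.NET code: ToyIndex, apply_batch, search_all and
merge are hypothetical names made up for illustration, and search_all
only plays the role that a MultiSearcher would play over real indexes.

```python
from typing import Dict, List


class ToyIndex:
    """In-memory stand-in for one index: maps a record key to its text."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}

    def add(self, key: str, text: str) -> None:
        self.docs[key] = text

    def delete(self, key: str) -> None:
        self.docs.pop(key, None)

    def search(self, word: str) -> List[str]:
        return [k for k, text in self.docs.items() if word in text.split()]


def apply_batch(indexes, added, modified, deleted):
    """Put new and modified records in a fresh index; purge stale keys from older ones."""
    fresh = ToyIndex()
    for key, text in {**added, **modified}.items():
        fresh.add(key, text)
    for old in indexes:
        for key in list(modified) + list(deleted):
            old.delete(key)
    indexes.append(fresh)


def search_all(indexes, word):
    """Search every index and combine the hits (the MultiSearcher role)."""
    hits = []
    for index in indexes:
        hits.extend(index.search(word))
    return sorted(hits)


def merge(indexes):
    """Low-traffic step: fold all indexes back into a single one."""
    merged = ToyIndex()
    for index in indexes:
        merged.docs.update(index.docs)
    return [merged]


indexes = [ToyIndex()]
indexes[0].add("1", "red apple")
indexes[0].add("2", "green pear")
indexes[0].add("3", "red plum")

apply_batch(indexes,
            added={"4": "red cherry"},
            modified={"2": "red pear"},
            deleted=["3"])
result = search_all(indexes, "red")
indexes = merge(indexes)
```

Because every modified or deleted key is removed from the older indexes
before the new one is searched, a key never matches in two places at
once, which is what makes the later merge a plain union.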

Thanks,
Troy



Re: partial update over lucene index

Posted by Wyatt Barnett <wy...@gmail.com>.
I have no idea what 1lkh is, but let me take a stab at answering your
question: the trick here is to make sure you have some key in the
record; then, when the update happens, you:

1) look up the record in the Lucene index by key
2) remove the old record
3) add the new record

The index should take the change into account pretty much immediately,
and all is right with the world.
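
The three steps above can be sketched as follows. This is a toy model,
not Lucene.NET code: the index is just a Python list of field
dictionaries, and the "id" field name plus the delete_by_key and update
helpers are assumptions made up for illustration.

```python
from typing import Dict, List

Doc = Dict[str, str]


def delete_by_key(index: List[Doc], key: str) -> int:
    """Steps 1 and 2: find every document with this key and remove it."""
    before = len(index)
    index[:] = [doc for doc in index if doc["id"] != key]
    return before - len(index)


def update(index: List[Doc], doc: Doc) -> None:
    """Delete any old copy first, then add the new record (step 3)."""
    delete_by_key(index, doc["id"])
    index.append(doc)


index: List[Doc] = []
update(index, {"id": "42", "title": "first draft"})
update(index, {"id": "7", "title": "other"})
update(index, {"id": "42", "title": "final version"})

titles = [doc["title"] for doc in index if doc["id"] == "42"]
```

Skipping the delete and only appending would leave two documents for key
42, so searches would return both the stale and the current record. A
brand-new record takes the same path; the delete simply removes nothing.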


RE: partial update over lucene index

Posted by Digy <di...@gmail.com>.
I think I still don't understand your problem, since it seems as easy as
"selecting recently updated rows from the DB, then invoking IndexWriter's
UpdateDocument in a loop".
If you don't have a unique key, you may use the hash value of some of the
DB columns.
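
A minimal sketch of that loop, under stated assumptions: the updated_at
column, the name/city/body columns, and the row_key and changed_rows
helpers are all hypothetical, SQLite stands in for the real database, and
a plain dictionary stands in for the index (the assignment into it plays
the role of UpdateDocument).

```python
import hashlib
import sqlite3


def row_key(*columns: str) -> str:
    """Derive a stable key from column values when the table has no unique id."""
    joined = "\x1f".join(columns)  # separator keeps ("ab", "c") distinct from ("a", "bc")
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


def changed_rows(conn, since):
    """Fetch only the rows touched after the previous indexing run."""
    cur = conn.execute(
        "SELECT name, city, body, updated_at FROM T1 "
        "WHERE updated_at > ? ORDER BY updated_at",
        (since,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (name TEXT, city TEXT, body TEXT, updated_at INTEGER)")
conn.executemany("INSERT INTO T1 VALUES (?, ?, ?, ?)", [
    ("ann", "pune", "old text", 100),
    ("bob", "delhi", "new text", 205),
    ("cy", "goa", "newer text", 210),
])

index = {}      # toy index: key -> body
last_run = 200  # timestamp remembered from the previous run
for name, city, body, updated_at in changed_rows(conn, last_run):
    index[row_key(name, city)] = body  # stands in for UpdateDocument
    last_run = max(last_run, updated_at)
```

Only the two rows changed since the last run are read, so the query cost
follows the size of the change rather than the size of the table. The
hashed columns have to be ones that identify the row and never change;
otherwise an edited row gets a new key and the old document is left
behind.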

DIGY



Re: partial update over lucene index

Posted by K a r n a v <ka...@gmail.com>.
OK, let me give an example of partial updating.

Let's say I have 1 lakh records in my database table T1, and I'm trying
to index these records using Lucene. The query which retrieves the 1 lakh
records itself takes more than 10 minutes, and once I have retrieved
them, the Lucene logic does the indexing part; assume that takes 5
minutes, so 15 minutes in total. The T1 records are growing very fast
(about 50k records are added daily), so the database query takes longer
day by day. In the meantime, some of the first 1 lakh records will have
been updated, so I need to fetch all the records (the prior 1 lakh
records plus the new ones) and index them again. I need to do this
repeatedly.

The whole process is taking more than 40 minutes, and newly added records
are not showing up on the site because the index is only updated every 40
minutes to 1 hour. So this is not good.

What I want is to update the index only at the time a record is added or
updated, so that both my querying time and my indexing time are reduced.
So I'm asking: how can I update, add, or delete documents in the existing
index? For an existing index, how can I add new records?

Please give me an example for this: example code, an idea, or at least
website links to guide me.

Thanks in advance.




-- 
Thanks & Regards,
Karunaker Reddy V

http://www.flickr.com/photos/karnav/

Ooh!!, and one more thing: no matter who you are, you were built to be
brilliant and designed to make a difference in this world. PLEASE DO IT!

Re: partial update over lucene index

Posted by digy digy <di...@gmail.com>.
What do you mean by "partially updating" the index? This seems to be an
XY problem (http://www.perlmonks.org/index.pl?node_id=542341)

DIGY


RE: partial update over lucene index

Posted by Josh Handel <Jo...@catapultsystems.com>.
How I handle this is that I look up the document that I want to update (yeah, that means bloating my index with some kind of PK that is meaningful to me), then I delete that document, then add the new one. It's not a super fast approach, but my index is only fed new and updated articles on a schedule. If the document is new, I just add it.

To keep my readers in step, I actually wrap my Lucene indexes in a WCF service I lovingly call Rikene, so my readers know when to refresh based on index writes. (My WCF service also does nightly merges, so the deleted data does get pruned out rather than bloating the index files unnecessarily.)

I know that's pretty high level, but I hope that gets you pointed in the right direction.
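
The "readers know when to refresh" part can be sketched with a version counter. This is only a toy model of the idea, not the Rikene service or any Lucene.NET API: ToyWriter, ToyReader and the version field are hypothetical names made up for illustration.

```python
class ToyWriter:
    """Holds the documents and bumps a version number on every write."""

    def __init__(self):
        self.docs = {}
        self.version = 0

    def update(self, key, text):
        self.docs[key] = text
        self.version += 1


class ToyReader:
    """Point-in-time snapshot that reopens only after the writer has changed."""

    def __init__(self, writer):
        self.writer = writer
        self.refresh_count = 0
        self._reopen()

    def _reopen(self):
        self.snapshot = dict(self.writer.docs)
        self.version = self.writer.version
        self.refresh_count += 1

    def get(self, key):
        if self.version != self.writer.version:
            self._reopen()  # a write happened since the last snapshot
        return self.snapshot.get(key)


writer = ToyWriter()
reader = ToyReader(writer)
writer.update("a", "v1")
first = reader.get("a")
second = reader.get("a")   # no write in between, so no reopen
writer.update("a", "v2")
third = reader.get("a")
```

Comparing a version number is cheap, so the reader can check it on every request and pay the cost of reopening only when something was actually written.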

Josh

