Posted to java-user@lucene.apache.org by Eugeny N Dzhurinsky <eu...@jdevelop.com> on 2005/10/05 10:01:00 UTC

partial reindex

Is it possible somehow to change only some of the fields in indexed documents
without reindexing all documents?

The thing is, we have a set of "searchable" documents and a set of access
privileges for these documents, which form a tree-like structure (i.e. access
privileges can be inherited from a parent node). I intended to add some
"keyword" field when indexing documents, holding the "flattened" rights, i.e.
the privileges merged from parent nodes (where required), in the same way as
described in the Lucene in Action book appendix (SecurityFilterTest).
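The "flattened rights" idea can be sketched in plain Java. This is a minimal sketch under assumptions of my own: the node/grant model, the `inherits` flag and the class name `AclFlattener` are hypothetical, and Lucene itself is left out. Each flattened right would presumably be indexed as a separate untokenized value of one field, which a term-based filter could then match.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory model of the ACL tree; not a Lucene API.
final class AclFlattener {
    // A node grants some groups directly and may inherit its parent's grants.
    record Node(String id, String parentId, Set<String> grants, boolean inherits) {}

    private final Map<String, Node> nodes = new HashMap<>();

    void add(Node node) {
        nodes.put(node.id(), node);
    }

    // Walks from the node towards the root, merging grants for as long as
    // inheritance applies. Assumes the structure is a real tree (no cycles).
    // The result is what would be stored with each document attached to nodeId.
    Set<String> flatten(String nodeId) {
        Set<String> rights = new TreeSet<>();
        Node node = nodes.get(nodeId);
        while (node != null) {
            rights.addAll(node.grants());
            if (!node.inherits() || node.parentId() == null) {
                break;
            }
            node = nodes.get(node.parentId());
        }
        return rights;
    }
}
```

For a node `eu` under `sales` under `root`, where `eu` and `sales` both inherit, `flatten("eu")` returns the union of the grants of all three nodes.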

But if any of the parent nodes changes, I need to reindex the documents beneath
it; in the worst case, if the root node changes, the whole set of documents
needs to be reindexed, which could take a lot of time.
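The cost is easy to see if the documents to reindex are computed explicitly: every document attached to the changed node or to any node below it. A rough sketch with hypothetical names (`AffectedDocuments`, `addNode`, `attach`); it is deliberately conservative and ignores whether a descendant actually inherits from the changed node.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: which documents would need reindexing when one ACL node
// changes, if flattened rights are stored in the index.
final class AffectedDocuments {
    private final Map<String, List<String>> children = new HashMap<>();
    private final Map<String, List<String>> docsByNode = new HashMap<>();

    void addNode(String id, String parentId) {
        if (parentId != null) {
            children.computeIfAbsent(parentId, k -> new ArrayList<>()).add(id);
        }
    }

    void attach(String docId, String nodeId) {
        docsByNode.computeIfAbsent(nodeId, k -> new ArrayList<>()).add(docId);
    }

    // Breadth-first walk over the subtree rooted at the changed node. Conservative:
    // every document below the node is returned, whether or not it inherits.
    Set<String> affectedBy(String changedNode) {
        Set<String> docs = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(changedNode);
        while (!pending.isEmpty()) {
            String node = pending.poll();
            docs.addAll(docsByNode.getOrDefault(node, List.of()));
            pending.addAll(children.getOrDefault(node, List.of()));
        }
        return docs;
    }
}
```

Calling `affectedBy` with the root node returns every document, which is exactly the worst case described above.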

So it would be great if it were possible to reindex only a specified field in
the document index, and best of all if I could reindex a specified field in a
specified set of documents.

-- 
Eugene N Dzhurinsky

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: partial reindex

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Oct 5, 2005, at 8:56 AM, Eugeny N Dzhurinsky wrote:

> On Wed, Oct 05, 2005 at 08:38:21AM -0400, Erik Hatcher wrote:
>
>>> But could Lucene mix up 2 indexes in single query?
>>>
>> Using ParallelReader - yes.  Read the javadocs to learn more.
>>
>
> Maybe MultiReader? I didn't find ParallelReader in my API docs for Lucene
> 1.4.3.

*arg* :)  As I said in my original message... ParallelReader is in the
Subversion trunk.  It does not exist in a released version of Lucene
yet.  It will be part of 1.9.  The trunk of Subversion is quite
stable - feel free to build Lucene from there and use it.

> Let's say we have an index of documents, and each document has its own
> unique ID, and we have another index of ACLs for each document. Is it
> possible to use some kind of "join" between the document index and the ACL
> index, and create a search query which will instantly validate the ACL for a
> document, using the UID key in both the document index and the ACL index?

This is basically what ParallelReader does - joins two parallel  
indexes that line up _exactly_ in document insertion order.
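To illustrate what "line up exactly in document insertion order" means, here is a toy model in plain Java, not Lucene code: each "index" is a list of field maps in insertion order, and document N of one is merged with document N of the other. The shared `uid` field and the alignment check are assumptions of this sketch, added only to show what goes wrong when the two indexes are not built in the same order; the join itself is purely positional, with no key lookup.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Toy model of a positional join between two "indexes" (not Lucene code).
// Each index is a list of documents (field name -> value) in insertion order.
final class ParallelJoin {
    static List<Map<String, String>> join(List<Map<String, String>> content,
                                          List<Map<String, String>> acl) {
        if (content.size() != acl.size()) {
            throw new IllegalArgumentException("indexes must hold the same number of documents");
        }
        List<Map<String, String>> joined = new ArrayList<>(content.size());
        for (int doc = 0; doc < content.size(); doc++) {
            Map<String, String> left = content.get(doc);
            Map<String, String> right = acl.get(doc);
            // Assumed "uid" field, used here only to detect misalignment: the join
            // never looks documents up by key, it just pairs the same position.
            if (!Objects.equals(left.get("uid"), right.get("uid"))) {
                throw new IllegalStateException("indexes are misaligned at document " + doc);
            }
            Map<String, String> merged = new LinkedHashMap<>(left);
            merged.putAll(right);
            joined.add(merged);
        }
        return joined;
    }
}
```

The design consequence is the one Erik points out: the ACL index has to be rebuilt (or at least written) in exactly the same document order as the content index, otherwise document numbers pair the wrong records.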

> I'm just trying to avoid post-filtering entirely.

One strong recommendation I make regarding Lucene (and coding in
general): don't presuppose things about performance.  Implement the
solution in the cleanest, most logical way, try it, and if there are
issues, then tune.

> P.S. Really useful book, I like it.

Thank you!

     Erik




Re: partial reindex

Posted by Yonik Seeley <ys...@gmail.com>.
> Maybe MultiReader? I didn't find ParallelReader in my API docs for Lucene
> 1.4.3.

It's not in 1.4.3... you need to check out the latest 1.9 development
version from Subversion (the source code repository used now).


-Yonik
Now hiring -- http://tinyurl.com/7m67g



Re: partial reindex

Posted by Eugeny N Dzhurinsky <eu...@jdevelop.com>.
On Wed, Oct 05, 2005 at 08:38:21AM -0400, Erik Hatcher wrote:
> > But could Lucene mix up 2 indexes in single query?
> Using ParallelReader - yes.  Read the javadocs to learn more.

Maybe MultiReader? I didn't find ParallelReader in my API docs for Lucene
1.4.3.

I'm trying to think about it this way:

Let's say we have an index of documents, and each document has its own unique
ID, and we have another index of ACLs for each document. Is it possible to use
some kind of "join" between the document index and the ACL index, and create a
search query which will instantly validate the ACL for a document, using the
UID key in both the document index and the ACL index?

I'm just trying to avoid post-filtering entirely.

P.S. Really useful book, I like it.

-- 
Eugene N Dzhurinsky



Re: partial reindex

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Oct 5, 2005, at 7:38 AM, Eugeny N Dzhurinsky wrote:
> On Wed, Oct 05, 2005 at 07:03:45AM -0400, Erik Hatcher wrote:
>
>> On Oct 5, 2005, at 4:01 AM, Eugeny N Dzhurinsky wrote:
>>
>>> Is it possible somehow to change some partial fields in indexed
>>> documents without reindexing all documents?
>>>
>> No, not with Lucene 1.4.3.  But the Subversion trunk has a feature
>> that can facilitate this sort of thing by building two indexes, one
>> with the data and one with the security information.  Look at
>> ParallelReader and its javadocs.
>>
>
> Ok, thanks, I will review the things you mentioned. But could Lucene mix up
> 2 indexes in a single query?

Using ParallelReader - yes.  Read the javadocs to learn more.

>> However, for data like permissions, ACLs, groups, etc., it may be
>> better to keep the information where it originally resides and have a
>> Filter that accesses the external data.  It would likely be easier
>> and quicker to re-instantiate Filters than to rebuild a security
>> index, and it would involve less duplication.
>>
>
> But what about the case where only 1 result is allowed to be displayed by the
> application, but there are thousands of hits which need to be filtered for
> ACLs?

Filters may be expensive to create, but once created and cached they
are fast.  Each user, for example, may have their own associated
filter.  Or each group, or something like that.  You'd only need to
rebuild the filters when permissions changed, but that may be better
than rebuilding an index.
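The "create once, cache per group, rebuild when permissions change" pattern can be sketched without Lucene. In this sketch a filter is modelled as a `java.util.BitSet` over document numbers, and the loader that computes a group's allowed documents is a hypothetical stand-in supplied by the caller. One caveat it ignores: Lucene document numbers can change when the index changes (for example after deletions are merged away), so a real cache would also need to be tied to the IndexReader it was built from.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of per-group filter caching. The loader is hypothetical: it stands in
// for whatever computes the set of documents a group may see.
final class GroupFilterCache {
    private final Map<String, BitSet> cache = new HashMap<>();
    private final Function<String, BitSet> loader;
    private int builds;

    GroupFilterCache(Function<String, BitSet> loader) {
        this.loader = loader;
    }

    // Returns the cached filter, building it on first use. Callers should treat
    // the returned BitSet as read-only because it is shared between searches.
    synchronized BitSet filterFor(String group) {
        BitSet bits = cache.get(group);
        if (bits == null) {
            bits = loader.apply(group);
            builds++;
            cache.put(group, bits);
        }
        return bits;
    }

    // Call when one group's permissions change; the next search rebuilds it.
    synchronized void invalidate(String group) {
        cache.remove(group);
    }

    // Call when a change (e.g. at the root of the ACL tree) may affect every group.
    synchronized void invalidateAll() {
        cache.clear();
    }

    // Number of times a filter has been (re)built, useful for measuring.
    synchronized int builds() {
        return builds;
    }
}
```

A change near the root of the ACL tree then costs one `invalidateAll()` and a rebuild of each filter on next use, rather than a reindex of every document.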

> As far as I understand, this can be a pain, because Lucene keeps the
> results in memory, correct?

Results?!  As in Hits?  No.   Filters, yes, perhaps.  Architecturally  
that is your decision though.
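To give a rough sense of scale for the memory that cached filters take: if a filter holds one bit per document, the way a `java.util.BitSet` does (an assumption about how the filters are stored), each one costs about maxDoc / 8 bytes. The sketch below ignores JVM object overhead.

```java
// Back-of-the-envelope memory estimate for cached bit-set filters.
// Assumes one bit per document, stored the way java.util.BitSet stores bits
// (an array of 64-bit words); JVM object headers are ignored.
final class FilterMemory {
    static long bytesPerFilter(int maxDoc) {
        long words = (maxDoc + 63L) / 64;  // round up to whole 64-bit words
        return words * Long.BYTES;
    }

    static long totalBytes(int maxDoc, int filterCount) {
        return bytesPerFilter(maxDoc) * filterCount;
    }
}
```

For 1,000,000 documents this gives 125,000 bytes per filter, so caching one filter for each of 100 groups needs about 12.5 MB, independent of how many hits a query returns.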

> So it is better to find a solution for pre-filtering rather
> than post-filtering, since the latter could be resource-expensive.

I recommend trying it out and seeing if it works for you.

> Maybe you
> could suggest some other index/search engines?

I'm a little too biased for that :)

     Erik





Re: partial reindex

Posted by Eugeny N Dzhurinsky <eu...@jdevelop.com>.
On Wed, Oct 05, 2005 at 07:03:45AM -0400, Erik Hatcher wrote:
> On Oct 5, 2005, at 4:01 AM, Eugeny N Dzhurinsky wrote:
> >Is it possible somehow to change only some of the fields in indexed
> >documents without reindexing all documents?
> No, not with Lucene 1.4.3.  But the Subversion trunk has a feature
> that can facilitate this sort of thing by building two indexes, one
> with the data and one with the security information.  Look at
> ParallelReader and its javadocs.

Ok, thanks, I will review the things you mentioned. But could Lucene mix up 2
indexes in a single query?

> >The thing is, we have a set of "searchable" documents and a set of access
> >privileges for these documents, which form a tree-like structure (i.e.
> >access privileges can be inherited from a parent node). I intended to add
> >some "keyword" field when indexing documents, holding the "flattened"
> >rights, i.e. the privileges merged from parent nodes (where required), in
> >the same way as described in the Lucene in Action book appendix
> >(SecurityFilterTest).
> 
> However, for data like permissions, ACLs, groups, etc., it may be
> better to keep the information where it originally resides and have a
> Filter that accesses the external data.  It would likely be easier
> and quicker to re-instantiate Filters than to rebuild a security
> index, and it would involve less duplication.

But what about the case where only 1 result is allowed to be displayed by the
application, but there are thousands of hits which need to be filtered for
ACLs?

As far as I understand, this can be a pain, because Lucene keeps the results
in memory, correct? So it is better to find a solution for pre-filtering rather
than post-filtering, since the latter could be resource-expensive. Maybe you
could suggest some other index/search engines?

-- 
Eugene N Dzhurinsky



RE: partial reindex

Posted by houyang <hu...@oracle.com>.
Hi Erik
In practice, accessing external security data through a filter is a big performance issue, and it could take longer than the end user is willing to wait.
It is true that rebuilding the security index is an extra cost, but there seem to be no better options.

Regards,
hui


Re: partial reindex

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Oct 5, 2005, at 4:01 AM, Eugeny N Dzhurinsky wrote:
> Is it possible somehow to change only some of the fields in indexed
> documents without reindexing all documents?

No, not with Lucene 1.4.3.  But the Subversion trunk has a feature  
that can facilitate this sort of thing by building two indexes, one  
with the data and one with the security information.  Look at  
ParallelReader and its javadocs.

> The thing is, we have a set of "searchable" documents and a set of access
> privileges for these documents, which form a tree-like structure (i.e.
> access privileges can be inherited from a parent node). I intended to add
> some "keyword" field when indexing documents, holding the "flattened"
> rights, i.e. the privileges merged from parent nodes (where required), in
> the same way as described in the Lucene in Action book appendix
> (SecurityFilterTest).

However, for data like permissions, ACLs, groups, etc., it may be
better to keep the information where it originally resides and have a
Filter that accesses the external data.  It would likely be easier
and quicker to re-instantiate Filters than to rebuild a security
index, and it would involve less duplication.
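A Filter backed by external data essentially computes one bit per document from information that lives elsewhere. The sketch below shows only that step, in plain Java; the uid-per-document array, the user string and the `canRead` check (for example backed by a database) are assumptions, and the Lucene Filter subclass that would wrap it is left out.

```java
import java.util.BitSet;
import java.util.function.BiPredicate;

// Sketch of a filter whose permission data stays outside the index.
// uidByDoc[n] is the unique id of document number n (how it is read from the
// index is left out); canRead is a hypothetical check against the external store.
final class ExternalAclFilter {
    static BitSet allowedDocs(String[] uidByDoc, String user,
                              BiPredicate<String, String> canRead) {
        BitSet allowed = new BitSet(uidByDoc.length);
        for (int doc = 0; doc < uidByDoc.length; doc++) {
            String uid = uidByDoc[doc];
            // One external lookup per document: this is the cost that makes
            // caching the resulting BitSet worthwhile.
            if (uid != null && canRead.test(user, uid)) {
                allowed.set(doc);
            }
        }
        return allowed;
    }
}
```

Because every document triggers a lookup, building the bits is the expensive part; once built, checking whether a hit is allowed is a single `BitSet.get`, which is why caching the result per user or group matters.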

     Erik

