Posted to java-user@lucene.apache.org by Arnon Mazza <ar...@yahoo.com> on 2012/02/01 15:05:11 UTC
Join between indexes
Assume we have a Lucene index over which several types of analyses are performed.
Assume that the conclusions of some analysis require that new tokens be added to existing documents in the index.
For example, a repeating pattern p (a sequence of words) that appears in a large fraction of the documents should be tagged at its exact position in every document where it occurs.
We then need to run proximity queries that mix standard terms with the pattern p (e.g. find all documents in which the word "hello" is adjacent to the pattern p).
Is there a way to achieve this without re-indexing all the documents in which the pattern p was found?
In other words, is it possible to maintain a separate index that keeps only patterns->docIds/positions, and then join the two indexes?
If not, is there a plan to support this in the future?
Thanks,
Arnon.
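To make the idea in the question concrete, here is a minimal sketch in plain Java of a positional join between a main index and a side index that holds only pattern occurrences. It deliberately uses no Lucene API; the class name SideIndexJoin, the map layout (docId -> sorted positions), and the sample data are all hypothetical. It also assumes that both structures share the same stable docIds, which a real Lucene index does not guarantee across segment merges.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch in plain Java, not Lucene code.
class SideIndexJoin {

    // Convenience helper for building a sorted set of token positions.
    static TreeSet<Integer> positions(Integer... values) {
        return new TreeSet<>(Arrays.asList(values));
    }

    /**
     * termPositions: docId -> positions of one term in the main index.
     * patternStarts: docId -> start positions of pattern p in the side index.
     * patternLength: number of tokens the pattern spans (assumed fixed here).
     * Returns the sorted docIds where the term sits directly before or after p.
     */
    static List<Integer> adjacentDocs(Map<Integer, TreeSet<Integer>> termPositions,
                                      Map<Integer, TreeSet<Integer>> patternStarts,
                                      int patternLength) {
        TreeSet<Integer> hits = new TreeSet<>();
        for (Map.Entry<Integer, TreeSet<Integer>> entry : patternStarts.entrySet()) {
            TreeSet<Integer> term = termPositions.get(entry.getKey());
            if (term == null) continue; // term does not occur in this document
            for (int start : entry.getValue()) {
                // Term immediately before the pattern, or immediately after its last token.
                if (term.contains(start - 1) || term.contains(start + patternLength)) {
                    hits.add(entry.getKey());
                    break;
                }
            }
        }
        return new ArrayList<>(hits);
    }

    public static void main(String[] args) {
        Map<Integer, TreeSet<Integer>> hello = new HashMap<>();
        hello.put(1, positions(4)); // "hello" right before p
        hello.put(2, positions(0)); // "hello" far from p
        hello.put(3, positions(9)); // "hello" right after p

        Map<Integer, TreeSet<Integer>> p = new HashMap<>();
        p.put(1, positions(5));
        p.put(2, positions(5));
        p.put(3, positions(7));

        System.out.println(adjacentDocs(hello, p, 2)); // prints [1, 3]
    }
}
```

The point of the sketch is only that the join key is the pair (docId, position): the side index can be rebuilt whenever a new pattern is discovered, while the main postings stay untouched.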
Re: Join between indexes
Posted by Arnon Mazza <ar...@yahoo.com>.
Thanks, that's a very nice feature.
Would it also enable joining at the docId level, meaning that part of a document is kept in one index and another part of the same document is kept in another index?
Using the articles and comments example from that link, this could look like:
articles index:
- docId=1: "(1) this (2) paper (3) is (4) about (5) lucene". (numbers are positions in the doc).
comments index:
- docId=1: "(3) very (4) recommended".
That way one could tell that the comment "very recommended" was written next to the word "paper".
(Conceptually the query could be: articles.paper NEAR comments."very recommended".)
Is this also part of the feature?
Thanks,
Arnon.
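The conceptual query above can be spelled out as a small plain-Java sketch. This is not Lucene code and makes no claim about what the linked feature supports; the class CrossIndexNear, its methods, and the maxGap parameter are hypothetical names introduced for illustration, and it assumes both indexes use the same docId and the same position numbering for a given document.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch in plain Java, not Lucene code: a tiny in-memory positional index.
class CrossIndexNear {

    // docId -> term -> positions of that term in the document
    private final Map<Integer, Map<String, List<Integer>>> postings = new HashMap<>();

    void add(int docId, String term, int position) {
        postings.computeIfAbsent(docId, d -> new HashMap<>())
                .computeIfAbsent(term, t -> new ArrayList<>())
                .add(position);
    }

    List<Integer> positions(int docId, String term) {
        Map<String, List<Integer>> doc = postings.get(docId);
        if (doc == null) return Collections.emptyList();
        List<Integer> found = doc.get(term);
        return found == null ? Collections.<Integer>emptyList() : found;
    }

    // Start positions at which the words occur consecutively in the given document.
    List<Integer> phraseStarts(int docId, String... words) {
        List<Integer> starts = new ArrayList<>();
        for (int start : positions(docId, words[0])) {
            boolean match = true;
            for (int i = 1; i < words.length && match; i++) {
                match = positions(docId, words[i]).contains(start + i);
            }
            if (match) starts.add(start);
        }
        return starts;
    }

    // True if `term` (from one index) lies within maxGap positions of `phrase` (from another).
    static boolean near(CrossIndexNear termIndex, String term,
                        CrossIndexNear phraseIndex, String[] phrase,
                        int docId, int maxGap) {
        for (int start : phraseIndex.phraseStarts(docId, phrase)) {
            int end = start + phrase.length - 1;
            for (int pos : termIndex.positions(docId, term)) {
                int gap = pos < start ? start - pos : Math.max(0, pos - end);
                if (gap <= maxGap) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        CrossIndexNear articles = new CrossIndexNear();
        String[] words = {"this", "paper", "is", "about", "lucene"};
        for (int i = 0; i < words.length; i++) articles.add(1, words[i], i + 1);

        CrossIndexNear comments = new CrossIndexNear();
        comments.add(1, "very", 3);
        comments.add(1, "recommended", 4);

        String[] phrase = {"very", "recommended"};
        System.out.println(near(articles, "paper", comments, phrase, 1, 1)); // prints true
        System.out.println(near(articles, "this", comments, phrase, 1, 1));  // prints false
    }
}
```

With the example data, "paper" is at position 2 and the phrase starts at position 3, so the gap is 1 and the documents join; "this" at position 1 is two positions away and does not match with maxGap = 1.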
From: Francisco A. Lozano <fl...@gmail.com>
To: java-user@lucene.apache.org
Sent: Wednesday, February 1, 2012 7:56 PM
Subject: Re: Join between indexes
Wow, thanks for pointing this out, I didn't know such a feature was in progress.
I see a mention that there is some chance this will be released in
3.6... crossing my fingers :)
Francisco A. Lozano
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Join between indexes
Posted by "Francisco A. Lozano" <fl...@gmail.com>.
Wow, thanks for pointing this out, I didn't know such a feature was in progress.
I see a mention that there is some chance this will be released in
3.6... crossing my fingers :)
Francisco A. Lozano
On Wed, Feb 1, 2012 at 17:09, Simon Willnauer
<si...@googlemail.com> wrote:
> maybe this link will help: http://bit.ly/AhwIw6
>
> simon
Re: Join between indexes
Posted by Simon Willnauer <si...@googlemail.com>.
maybe this link will help: http://bit.ly/AhwIw6
simon