Posted to java-user@lucene.apache.org by Arnon Mazza <ar...@yahoo.com> on 2012/02/01 15:05:11 UTC

Join between indexes

Assume we have a Lucene index over which several types of analysis are performed.

Assume that the conclusions of some analysis require new tokens to be added to existing documents in the index.
For example, a repeating pattern p (a sequence of words) that appears in a large share of the documents should be tagged at its exact position in every document where it occurs.

We then need to run proximity queries that involve both standard terms and the pattern p (e.g. find all documents in which the word "hello" is adjacent to the pattern p).

Is there a way of achieving this without re-indexing all the documents in which the pattern p was found?
In other words, is it possible to maintain a separate index that keeps only patterns->docIds/positions, and then join the two indexes?

If not, is there a plan to support this in the future?
 
Thanks,
Arnon.
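
The separate patterns->docIds/positions index described in the question can be modelled outside Lucene as a small positional join. The sketch below is illustrative only: `PatternSideIndex` and all of its methods are hypothetical names, the in-memory maps stand in for real postings, and it assumes the side index uses the same docIds and token positions as the main index. It is not Lucene API and says nothing about how the linked feature is implemented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical in-memory model (not Lucene API); it only illustrates the join logic.
class PatternSideIndex {
    // Stand-in for the main index: term -> docId -> positions of the term.
    final Map<String, Map<Integer, List<Integer>>> termPostings = new HashMap<>();
    // Side index: pattern -> docId -> spans {start, endExclusive} of the pattern.
    final Map<String, Map<Integer, List<int[]>>> patternSpans = new HashMap<>();

    void addTerm(String term, int docId, int position) {
        termPostings.computeIfAbsent(term, k -> new HashMap<>())
                    .computeIfAbsent(docId, k -> new ArrayList<>())
                    .add(position);
    }

    void addPattern(String pattern, int docId, int start, int endExclusive) {
        patternSpans.computeIfAbsent(pattern, k -> new HashMap<>())
                    .computeIfAbsent(docId, k -> new ArrayList<>())
                    .add(new int[] {start, endExclusive});
    }

    // Sorted docIds in which `term` sits directly before or directly after `pattern`.
    TreeSet<Integer> docsWithTermAdjacentToPattern(String term, String pattern) {
        TreeSet<Integer> hits = new TreeSet<>();
        Map<Integer, List<Integer>> termDocs =
                termPostings.getOrDefault(term, Collections.emptyMap());
        Map<Integer, List<int[]>> patternDocs =
                patternSpans.getOrDefault(pattern, Collections.emptyMap());
        // Join on docId: only documents present on both sides can match.
        for (Map.Entry<Integer, List<int[]>> e : patternDocs.entrySet()) {
            List<Integer> positions = termDocs.get(e.getKey());
            if (positions == null) continue;
            for (int[] span : e.getValue()) {
                // Adjacent means one position before the start or exactly at the end.
                if (positions.contains(span[0] - 1) || positions.contains(span[1])) {
                    hits.add(e.getKey());
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        PatternSideIndex idx = new PatternSideIndex();
        idx.addTerm("hello", 1, 4);      // doc 1: "hello" directly before p
        idx.addPattern("p", 1, 5, 8);
        idx.addTerm("hello", 2, 1);      // doc 2: "hello" far from p
        idx.addPattern("p", 2, 5, 8);
        idx.addTerm("hello", 3, 8);      // doc 3: "hello" directly after p
        idx.addPattern("p", 3, 5, 8);
        System.out.println(idx.docsWithTermAdjacentToPattern("hello", "p")); // prints [1, 3]
    }
}
```

The join only works if the two structures stay aligned. If the main index renumbers its documents, the side index would have to be rebuilt or remapped, which is the hard part this sketch leaves out.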

Re: Join between indexes

Posted by Arnon Mazza <ar...@yahoo.com>.
Thanks, that's a very nice feature.

Would it also enable joining at the docId level, meaning that part of a document is kept in one index and another part of the same document is kept in another index?

Using the articles & comments example from the link, that could be, for instance:
articles index:
- docId=1: "(1) this (2) paper (3) is (4) about (5) lucene" (numbers are positions in the doc).
comments index:
- docId=1: "(3) very (4) recommended".

That way one would be able to tell that the comment "very recommended" was written next to the word "paper".
(Conceptually the query could be: articles.paper NEAR comments."very recommended").

Is this also part of the feature?
 
Thanks,
Arnon.
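
The conceptual query above, articles.paper NEAR comments."very recommended", can be sketched in plain Java to make the intended semantics concrete. Everything in this block is an assumption made for illustration: `CrossIndexNear`, its methods, and the `slop` parameter are hypothetical names, each "index" is just a map from docId to position->token, and both indexes are assumed to share docIds and a single position space. It is not Lucene API and does not describe the feature behind the link.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for two indexes sharing docIds and positions (not Lucene API).
class CrossIndexNear {
    // One "index": docId -> (position -> token).
    static Map<Integer, Map<Integer, String>> newIndex() {
        return new HashMap<>();
    }

    // Adds tokens at consecutive positions, beginning at startPos.
    static void addDoc(Map<Integer, Map<Integer, String>> idx, int docId,
                       int startPos, String... tokens) {
        Map<Integer, String> doc = idx.computeIfAbsent(docId, k -> new HashMap<>());
        for (int i = 0; i < tokens.length; i++) {
            doc.put(startPos + i, tokens[i]);
        }
    }

    // Start positions at which all words of `phrase` occur consecutively.
    static List<Integer> phraseStarts(Map<Integer, String> doc, String... phrase) {
        List<Integer> starts = new ArrayList<>();
        if (phrase.length == 0) return starts;
        for (int start : doc.keySet()) {
            boolean match = true;
            for (int i = 0; i < phrase.length && match; i++) {
                match = phrase[i].equals(doc.get(start + i));
            }
            if (match) starts.add(start);
        }
        return starts;
    }

    // True if `term` in the left index is within `slop` positions of either end
    // of `phrase` in the right index, for the same docId.
    static boolean near(Map<Integer, Map<Integer, String>> left,
                        Map<Integer, Map<Integer, String>> right,
                        int docId, String term, int slop, String... phrase) {
        Map<Integer, String> leftDoc = left.get(docId);
        Map<Integer, String> rightDoc = right.get(docId);
        if (leftDoc == null || rightDoc == null) return false;
        for (int s : phraseStarts(rightDoc, phrase)) {
            int end = s + phrase.length - 1;
            for (Map.Entry<Integer, String> e : leftDoc.entrySet()) {
                if (!term.equals(e.getValue())) continue;
                int t = e.getKey();
                // Gap is 0 when the term falls inside the phrase span.
                int gap = t < s ? s - t : (t > end ? t - end : 0);
                if (gap <= slop) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<Integer, Map<Integer, String>> articles = newIndex();
        Map<Integer, Map<Integer, String>> comments = newIndex();
        addDoc(articles, 1, 1, "this", "paper", "is", "about", "lucene");
        addDoc(comments, 1, 3, "very", "recommended");
        // "paper" is at position 2 and the phrase starts at position 3.
        System.out.println(near(articles, comments, 1, "paper", 1, "very", "recommended")); // prints true
    }
}
```

With the positions from the example, "paper" (position 2) is one position away from the phrase (positions 3 to 4), so a slop of 1 matches, while "this" (position 1) does not.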

From: Francisco A. Lozano <fl...@gmail.com>
To: java-user@lucene.apache.org 
Sent: Wednesday, February 1, 2012 7:56 PM
Subject: Re: Join between indexes

Wow, thanks for pointing this out, didn't know such a feature was in progress.

I see a mention that there is some chance this will be released in
3.6... crossing my fingers :)

Francisco A. Lozano



On Wed, Feb 1, 2012 at 17:09, Simon Willnauer
<si...@googlemail.com> wrote:
> maybe this link will help: http://bit.ly/AhwIw6
>
> simon

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Join between indexes

Posted by "Francisco A. Lozano" <fl...@gmail.com>.
Wow, thanks for pointing this out, didn't know such a feature was in progress.

I see a mention that there is some chance this will be released in
3.6... crossing my fingers :)

Francisco A. Lozano



On Wed, Feb 1, 2012 at 17:09, Simon Willnauer
<si...@googlemail.com> wrote:
> maybe this link will help: http://bit.ly/AhwIw6
>
> simon


Re: Join between indexes

Posted by Simon Willnauer <si...@googlemail.com>.
maybe this link will help: http://bit.ly/AhwIw6

simon
