Posted to oak-dev@jackrabbit.apache.org by Chetan Mehrotra <ch...@gmail.com> on 2014/04/14 06:54:33 UTC

Using Lucene indexes for property queries

Hi,

In JR2 I believe Lucene was used for all types of queries, not only
for full-text searches. In Oak we have our own PropertyIndexes for
handling queries involving constraints on properties. I believe this
provides more accurate results, as it's built on top of the MVCC support,
so the results obtained are consistent with the session state/revision.
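The consistency difference can be illustrated with a toy model. This is a minimal sketch under simplifying assumptions, not Oak's implementation; `IndexLagDemo` and its methods are hypothetical. The synchronous index is updated in the same commit as the content, while the asynchronous one only catches up when a background pass runs, so a query against it can miss recent writes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: names are hypothetical and do not correspond to Oak classes.
class IndexLagDemo {

    // Content at head: path -> value of a single property "prop".
    private final Map<String, String> content = new HashMap<>();
    // Synchronous index: maintained inside the same commit as the content.
    private final Map<String, List<String>> syncIndex = new HashMap<>();
    // Asynchronous index: only refreshed by asyncIndexUpdate().
    private final Map<String, List<String>> asyncIndex = new HashMap<>();

    void commit(String path, String value) {
        String old = content.put(path, value);
        if (old != null) {
            syncIndex.get(old).remove(path); // drop the stale entry
        }
        syncIndex.computeIfAbsent(value, v -> new ArrayList<>()).add(path);
    }

    // Simulates a background indexing pass catching up with head.
    void asyncIndexUpdate() {
        asyncIndex.clear();
        content.forEach((path, value) ->
                asyncIndex.computeIfAbsent(value, v -> new ArrayList<>()).add(path));
    }

    List<String> querySync(String value) {
        return syncIndex.getOrDefault(value, List.of());
    }

    List<String> queryAsync(String value) {
        return asyncIndex.getOrDefault(value, List.of());
    }

    public static void main(String[] args) {
        IndexLagDemo repo = new IndexLagDemo();
        repo.commit("/a", "x");
        repo.asyncIndexUpdate();
        repo.commit("/b", "x"); // committed after the last async pass
        System.out.println("sync:  " + repo.querySync("x"));  // sees /a and /b
        System.out.println("async: " + repo.queryAsync("x")); // sees only /a until the next pass
    }
}
```

Running `main` lists `/a` and `/b` for the synchronous index but only `/a` for the asynchronous one, until `asyncIndexUpdate()` runs again.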

However, this involves creating an index for each property to be queried.
And the way property indexes are currently stored, they consume quite a bit
of space (at least in DocumentNodeStore). In comparison, Lucene stores
the index content in a quite compact form.

In quite a few cases (like a query builder driven by user choices) it might
not be known in advance which property the user would use. As we
already have all string properties indexed in Lucene, would it be
possible to use Lucene for performing such queries? Or allow the user
to choose which type of index to use depending on the
use case?
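For the query builder case, a minimal sketch of turning a user-chosen property and value into an XPath query might look like this. `AdHocQueryBuilder` is hypothetical, and the property-name whitelist is a deliberately conservative assumption (real JCR names allow more characters). String values are quoted by doubling embedded single quotes, as in XPath string literals.

```java
import java.util.regex.Pattern;

// Hypothetical helper: builds an XPath property query from user input.
class AdHocQueryBuilder {

    // Conservative whitelist for property names (an assumption; real JCR
    // names allow more, but this keeps user input from altering the query).
    private static final Pattern NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_:.-]*");

    // In a single-quoted XPath string literal, ' is escaped by doubling it.
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    static String propertyEquals(String property, String value) {
        if (!NAME.matcher(property).matches()) {
            throw new IllegalArgumentException("unsupported property name: " + property);
        }
        return "//*[@" + property + " = " + quote(value) + "]";
    }

    public static void main(String[] args) {
        System.out.println(propertyEquals("prop", "x"));
        System.out.println(propertyEquals("dc:title", "O'Reilly"));
    }
}
```

Since the property is only known at runtime, no dedicated property index can be assumed to exist for it, which is exactly the situation described above.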

Chetan Mehrotra

RE: Using Lucene indexes for property queries

Posted by "Amit.. Gupta." <am...@adobe.com>.
> In theory, the Lucene index could be used quite easily. As far as I 
> see, we would only need to change the cost function of the Lucene 
> index (return a reasonable cost even if there is no full-text constraint).

+1 for allowing the use of Lucene indexes for property constraints. There are advanced search use cases, e.g. supporting GQL-like search queries. Also, some applications allow customers to perform ad hoc searches based on custom properties.
In such cases, the searchable properties are not known in advance. A small lag should be acceptable in such cases.

Regards,
Amit


-----Original Message-----
From: Chetan Mehrotra [mailto:chetan.mehrotra@gmail.com] 
Sent: 14 April 2014 14:48
To: oak-dev@jackrabbit.apache.org
Subject: Re: Using Lucene indexes for property queries

> Should we let the user decide whether it's OK to use an asynchronous index for this case?

+1 for that. I think that was the case with JR2 (I may be wrong here). When a user is searching for, say, some asset via DAM in Adobe CQ, he would be OK if the result does not reflect the latest head. A small lag should be acceptable. This would enable scenarios where traversal would be too costly and Lucene can still provide the required results in much less time.
Chetan Mehrotra


On Mon, Apr 14, 2014 at 2:33 PM, Thomas Mueller <mu...@adobe.com> wrote:
> Hi,
>
> In theory, the Lucene index could be used quite easily. As far as I 
> see, we would only need to change the cost function of the Lucene 
> index (return a reasonable cost even if there is no full-text constraint).
>
> One problem might be: the Lucene index is asynchronous, and the user 
> might expect the result to be up-to-date. The user knows this already 
> for full-text constraints, but not for property constraints. Should we 
> let the user decide whether it's OK to use an asynchronous index for this case?
> For example by specifying an option in the query (for example similar 
> to the "order by", at the very end of the query, "option async")? So a 
> query that can use an asynchronous index would look like this:
>
>   //*[@prop = 'x'] option async
> or
>   //*[@prop = 'x'] order by @otherProperty option async
> or
>   select [jcr:path] from [nt:base] as a where [prop] > 1 option async
>
>
> Regards,
> Thomas

Re: Using Lucene indexes for property queries

Posted by Chetan Mehrotra <ch...@gmail.com>.
> Should we let the user decide whether it's OK to use an asynchronous
> index for this case?

+1 for that. I think that was the case with JR2 (I may be wrong here).
When a user is searching for, say, some asset via DAM in Adobe CQ, he
would be OK if the result does not reflect the latest head. A small lag
should be acceptable. This would enable scenarios where traversal would
be too costly and Lucene can still provide the required results in
much less time.
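The trade-off can be sketched as a planner that only considers an asynchronous index when the user has opted in, as discussed in this thread. Everything here is hypothetical: `PlanChooser`, the index names and the cost numbers are made up for illustration and are not Oak's query engine.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical planner sketch: index names and cost numbers are illustrative only.
class PlanChooser {

    static final class Plan {
        final String index;
        final double cost;
        final boolean async;

        Plan(String index, double cost, boolean async) {
            this.index = index;
            this.cost = cost;
            this.async = async;
        }
    }

    static Plan choose(String property, Set<String> propertyIndexes,
                       long nodeCount, boolean asyncAllowed) {
        List<Plan> candidates = new ArrayList<>();
        // Traversal always works but has to read every node.
        candidates.add(new Plan("traverse", nodeCount, false));
        // A synchronous property index exists only if someone created it.
        if (propertyIndexes.contains(property)) {
            candidates.add(new Plan("property:" + property, 10, false));
        }
        // Lucene already has the string properties, but it lags behind head.
        candidates.add(new Plan("lucene", 50, true));
        return candidates.stream()
                .filter(p -> asyncAllowed || !p.async) // async plans need opt-in
                .min(Comparator.comparingDouble(p -> p.cost))
                .orElseThrow(); // never empty: traversal is always a candidate
    }

    public static void main(String[] args) {
        long nodes = 1_000_000;
        System.out.println(choose("prop", Set.of(), nodes, false).index);
        System.out.println(choose("prop", Set.of(), nodes, true).index);
        System.out.println(choose("prop", Set.of("prop"), nodes, true).index);
    }
}
```

With a million nodes and no property index, the sketch picks traversal without the opt-in and Lucene with it; an existing property index still wins because it is both cheaper and up to date.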
Chetan Mehrotra


On Mon, Apr 14, 2014 at 2:33 PM, Thomas Mueller <mu...@adobe.com> wrote:
> Hi,
>
> In theory, the Lucene index could be used quite easily. As far as I see,
> we would only need to change the cost function of the Lucene index (return
> a reasonable cost even if there is no full-text constraint).
>
> One problem might be: the Lucene index is asynchronous, and the user might
> expect the result to be up-to-date. The user knows this already for
> full-text constraints, but not for property constraints. Should we let the
> user decide whether it's OK to use an asynchronous index for this case?
> For example by specifying an option in the query (for example similar to
> the "order by", at the very end of the query, "option async")? So a query
> that can use an asynchronous index would look like this:
>
>   //*[@prop = 'x'] option async
> or
>   //*[@prop = 'x'] order by @otherProperty option async
> or
>   select [jcr:path] from [nt:base] as a where [prop] > 1 option async
>
>
> Regards,
> Thomas

Re: Using Lucene indexes for property queries

Posted by Thomas Mueller <mu...@adobe.com>.
Hi,

In theory, the Lucene index could be used quite easily. As far as I see,
we would only need to change the cost function of the Lucene index (return
a reasonable cost even if there is no full-text constraint).
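A hedged sketch of what such a cost change could look like. The `Filter` class, the cost values and the estimate formula are stand-ins invented for illustration, not Oak's actual query index interface; the "before" method simply assumes that a query without a full-text constraint currently gets no usable (infinite) cost.

```java
import java.util.List;

// Hypothetical stand-ins: not Oak's index or filter types, and the
// cost numbers are illustrative only.
class LuceneCostSketch {

    static final class Filter {
        final String fullText;                   // null when there is no full-text constraint
        final List<String> propertyRestrictions; // names of restricted properties

        Filter(String fullText, List<String> propertyRestrictions) {
            this.fullText = fullText;
            this.propertyRestrictions = propertyRestrictions;
        }
    }

    // Assumed current behaviour: only full-text queries get a finite cost.
    static double costBefore(Filter f) {
        return f.fullText != null ? 2.0 : Double.POSITIVE_INFINITY;
    }

    // Proposed: a property restriction alone also yields a finite cost, so the
    // query engine can pick Lucene instead of traversing the repository.
    static double costAfter(Filter f, long indexedDocs) {
        if (f.fullText != null) {
            return 2.0;
        }
        if (f.propertyRestrictions.isEmpty()) {
            return Double.POSITIVE_INFINITY; // nothing Lucene can help with
        }
        // Crude estimate: assume each restriction narrows the result tenfold.
        return 10 + indexedDocs / Math.pow(10, f.propertyRestrictions.size());
    }

    public static void main(String[] args) {
        Filter propOnly = new Filter(null, List.of("prop"));
        System.out.println(costBefore(propOnly));        // Infinity
        System.out.println(costAfter(propOnly, 10_000)); // 1010.0
    }
}
```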

One problem might be: the Lucene index is asynchronous, and the user might
expect the result to be up-to-date. The user knows this already for
full-text constraints, but not for property constraints. Should we let the
user decide whether it's OK to use an asynchronous index for this case?
For example, by specifying an option at the very end of the query (similar
to "order by"), such as "option async"? A query that may use an
asynchronous index would then look like this:

  //*[@prop = 'x'] option async
or
  //*[@prop = 'x'] order by @otherProperty option async
or
  select [jcr:path] from [nt:base] as a where [prop] > 1 option async
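One way the proposed suffix could be recognised, as a minimal sketch: strip a trailing "option async" and record the opt-in so the planner may consider asynchronous indexes. `OptionAsyncParser` is hypothetical and works on the raw string; a real implementation would belong in the XPath and SQL-2 grammars.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of recognising the proposed trailing "option async";
// this is not how Oak's XPath or SQL-2 parsers are implemented.
class OptionAsyncParser {

    static final class ParsedQuery {
        final String statement;     // query without the option suffix
        final boolean asyncAllowed; // true if the user opted in to async indexes

        ParsedQuery(String statement, boolean asyncAllowed) {
            this.statement = statement;
            this.asyncAllowed = asyncAllowed;
        }
    }

    // Naive: only inspects the end of the string and does not tokenize,
    // so it ignores quoting rules that a real grammar would handle.
    private static final Pattern SUFFIX = Pattern.compile(
            "(.*?)\\s+option\\s+async\\s*", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static ParsedQuery parse(String query) {
        Matcher m = SUFFIX.matcher(query.trim());
        if (m.matches()) {
            return new ParsedQuery(m.group(1), true);
        }
        return new ParsedQuery(query.trim(), false);
    }

    public static void main(String[] args) {
        String[] queries = {
            "//*[@prop = 'x'] option async",
            "//*[@prop = 'x'] order by @otherProperty option async",
            "select [jcr:path] from [nt:base] as a where [prop] > 1 option async",
            "//*[@prop = 'x']"
        };
        for (String q : queries) {
            ParsedQuery p = parse(q);
            System.out.println(p.asyncAllowed + "  " + p.statement);
        }
    }
}
```

Keeping the option at the very end, as with "order by", means the rest of the statement can be handed to the existing parser unchanged.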


Regards,
Thomas





