Posted to dev@geode.apache.org by Diane Hardman <dh...@pivotal.io> on 2017/07/12 17:37:44 UTC

Proposal: Lucene indexing/searching for nested objects

The Geode 1.2.0 release includes Lucene text search fully integrated and
tested (no longer experimental). We are now proposing enhancements to
improve Lucene usability in Geode.

Some Geode users create data models that include nested and complex
objects. The current Geode Lucene integration supports indexing and
querying only the top-level fields in the data object. The objective of
this proposal is to support indexing and querying an arbitrary depth of
nested objects.


Please review the proposal in the following wiki page and give us your
feedback.

https://cwiki.apache.org/confluence/display/GEODE/Lucene+Text+Search+on+Nested+Object

Re: Proposal: Lucene indexing/searching for nested objects

Posted by Jacob Barrett <jb...@pivotal.io>.
Good point! Sounds good then.


> On Jul 20, 2017, at 11:15 AM, Dan Smith <ds...@pivotal.io> wrote:
> 
>> On Thu, Jul 20, 2017 at 10:57 AM, Jacob Barrett <jb...@pivotal.io> wrote:
>> 
>> I really feel like an annotation would make the most sense. How likely is
>> it that the object could not be annotated or the serializer for the object
>> is not tightly coupled with the object? Having to map objects to
>> serializers externally then becomes a lot more complicated to keep
>> consistent.
>> 
> 
> Well, with PDX serialization there may not even be a java class, or it may
> not be present on the server. So annotations don't really cover all of the
> use cases. With the proposed API, you could plug in an annotation based
> serializer, if you wanted to.
> 
> -Dan

Re: Proposal: Lucene indexing/searching for nested objects

Posted by Dan Smith <ds...@pivotal.io>.
On Thu, Jul 20, 2017 at 10:57 AM, Jacob Barrett <jb...@pivotal.io> wrote:

> I really feel like an annotation would make the most sense. How likely is
> it that the object could not be annotated or the serializer for the object
> is not tightly coupled with the object? Having to map objects to
> serializers externally then becomes a lot more complicated to keep
> consistent.
>

Well, with PDX serialization there may not even be a Java class, or it may
not be present on the server. So annotations don't really cover all of the
use cases. With the proposed API, you could plug in an annotation-based
serializer if you wanted to.

-Dan
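
To make the "plug in an annotation-based serializer" idea concrete, here is a
minimal sketch. Everything in it is an assumption for illustration: the
@Indexed annotation, the Customer class, and the use of a plain Map as a
stand-in for a Lucene document are all hypothetical and are not part of Geode
or Lucene. It also shows the limitation raised above: reflection only works
when the Java class is present, which is not guaranteed with PDX.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from Geode or Lucene.
class AnnotationSerializerSketch {

  /** Hypothetical marker for fields that should be indexed. */
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.FIELD)
  @interface Indexed {}

  /** Example domain class; only the annotated fields are indexed. */
  static class Customer {
    @Indexed String name;
    @Indexed String city;
    String internalNote; // not annotated, so it is skipped

    Customer(String name, String city, String internalNote) {
      this.name = name;
      this.city = city;
      this.internalNote = internalNote;
    }
  }

  /** Builds a stand-in "document" (field name to text) from annotated fields. */
  static Map<String, String> toDocument(Object value) {
    Map<String, String> doc = new LinkedHashMap<>();
    for (Field field : value.getClass().getDeclaredFields()) {
      if (!field.isAnnotationPresent(Indexed.class)) {
        continue;
      }
      field.setAccessible(true);
      try {
        Object fieldValue = field.get(value);
        if (fieldValue != null) {
          doc.put(field.getName(), fieldValue.toString());
        }
      } catch (IllegalAccessException e) {
        throw new IllegalStateException("Cannot read field " + field.getName(), e);
      }
    }
    return doc;
  }

  public static void main(String[] args) {
    System.out.println(toDocument(new Customer("Ada", "Portland", "vip")));
  }
}
```

A real implementation would wrap this logic in the proposed LuceneSerializer
and emit Lucene Document objects instead of maps.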

Re: Proposal: Lucene indexing/searching for nested objects

Posted by Jacob Barrett <jb...@pivotal.io>.
I really feel like an annotation would make the most sense. How likely is it that the object could not be annotated or the serializer for the object is not tightly coupled with the object? Having to map objects to serializers externally then becomes a lot more complicated to keep consistent.


> On Jul 20, 2017, at 10:38 AM, Dan Smith <ds...@pivotal.io> wrote:
> 
> This proposal doesn't really talk about XML or gfsh support.
> 
> The XML should probably just be a nested xml element, like this. It should
> have the same support for declarables that other callbacks in the xml do.
> 
> <lucene:index name="index1">
>  <lucene:serializer> com.mycompany.MySerializer </lucene:serializer>
> </lucene:index>
> 
> The gfsh command to create an index should also accept a serializer, like
> this
> 
> create lucene index --serializer=com.mycompany.MySerializer
> 
> If there are no objections I'll update the proposal.
> 
> -Dan
> 
>> On Tue, Jul 18, 2017 at 10:38 AM, Dan Smith <ds...@pivotal.io> wrote:
>> 
>> I think this LuceneSerializer API needs a slight tweak. In order to
>> implement the proposed FlatFormatSerializer, the serializer needs access to
>> the index configuration to see what fields the user wants to index. We
>> should also pass the LuceneIndex to the serializer.
>> 
>> public interface LuceneSerializer {
>>  Collection<Document> toDocuments(Object value, *LuceneIndex index*);
>> }
>> 
>>> On Thu, Jul 13, 2017 at 2:19 PM, Dan Smith <ds...@pivotal.io> wrote:
>>> 
>>> On Thu, Jul 13, 2017 at 11:26 AM, Jacob Barrett <jb...@pivotal.io>
>>> wrote:
>>> 
>>>> Collections are really tough in Lucene because you have to flatten the
>>>> document. I struggled against it for some time on a project a few years ago
>>>> and ultimately decided to index the relationships separately and then merge
>>>> the results.
>>>> 
>>> 
>>> Yeah, this is part of the motivation for providing the LuceneSerializer
>>> API. We can provide a built in serializer that just flattens all nested
>>> collections into a single field, but users could also write their own
>>> implementation that converts the nested objects into separate lucene
>>> documents and use some of query classes in org.apache.lucene.search.join if
>>> they really need to.
>>> 
>>> It's not part of the goal here, but I think this LuceneSerializer API
>>> could also make it easier to do spatial indexing, because users could
>>> create a serializer that converts their gemfire object into a Lucene
>>> document with GeoPointFields.
>>> 
>>> -Dan
>>> 
>>> 
>> 

Re: Proposal: Lucene indexing/searching for nested objects

Posted by Dan Smith <ds...@pivotal.io>.
This proposal doesn't really talk about XML or gfsh support.

The XML should probably just be a nested XML element, like this. It should
have the same support for declarables that other callbacks in the XML do.

<lucene:index name="index1">
  <lucene:serializer> com.mycompany.MySerializer </lucene:serializer>
</lucene:index>

The gfsh command to create an index should also accept a serializer, like
this:

create lucene index --serializer=com.mycompany.MySerializer

If there are no objections I'll update the proposal.

-Dan

On Tue, Jul 18, 2017 at 10:38 AM, Dan Smith <ds...@pivotal.io> wrote:

> I think this LuceneSerializer API needs a slight tweak. In order to
> implement the proposed FlatFormatSerializer, the serializer needs access to
> the index configuration to see what fields the user wants to index. We
> should also pass the LuceneIndex to the serializer.
>
> public interface LuceneSerializer {
>   Collection<Document> toDocuments(Object value, *LuceneIndex index*);
> }
>
> On Thu, Jul 13, 2017 at 2:19 PM, Dan Smith <ds...@pivotal.io> wrote:
>
>> On Thu, Jul 13, 2017 at 11:26 AM, Jacob Barrett <jb...@pivotal.io>
>> wrote:
>>
>>> Collections are really tough in Lucene because you have to flatten the
>>> document. I struggled against it for some time on a project a few years ago
>>> and ultimately decided to index the relationships separately and then merge
>>> the results.
>>>
>>
>> Yeah, this is part of the motivation for providing the LuceneSerializer
>> API. We can provide a built in serializer that just flattens all nested
>> collections into a single field, but users could also write their own
>> implementation that converts the nested objects into separate lucene
>> documents and use some of query classes in org.apache.lucene.search.join if
>> they really need to.
>>
>> It's not part of the goal here, but I think this LuceneSerializer API
>> could also make it easier to do spatial indexing, because users could
>> create a serializer that converts their gemfire object into a Lucene
>> document with GeoPointFields.
>>
>> -Dan
>>
>>
>

Re: Proposal: Lucene indexing/searching for nested objects

Posted by Dan Smith <ds...@pivotal.io>.
I think this LuceneSerializer API needs a slight tweak. In order to
implement the proposed FlatFormatSerializer, the serializer needs access to
the index configuration to see which fields the user wants to index. So we
should also pass the LuceneIndex to the serializer.

public interface LuceneSerializer {
  // The LuceneIndex parameter is the proposed addition.
  Collection<Document> toDocuments(Object value, LuceneIndex index);
}
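
As a rough illustration of why the serializer needs the index, here is a
self-contained sketch of a flat-format style serializer that reads the
configured field names and resolves each one against a nested value. The
types are stand-ins, not the real API: SketchIndex and SketchSerializer are
hypothetical, a Map plays the role of both the nested object and the Lucene
document, and the dotted path syntax ("contact.email") is an assumption.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch with stand-in types; not the Geode or Lucene API.
class FlatFormatSketch {

  /** Stand-in for the index configuration; only the field names matter here. */
  interface SketchIndex {
    String[] getFieldNames();
  }

  /** Stand-in for the proposed interface, with a Map in place of a Document. */
  interface SketchSerializer {
    Collection<Map<String, String>> toDocuments(Object value, SketchIndex index);
  }

  /** Copies each configured field, including nested ones, into one flat document. */
  static class FlatSerializer implements SketchSerializer {
    @Override
    public Collection<Map<String, String>> toDocuments(Object value, SketchIndex index) {
      Map<String, String> doc = new LinkedHashMap<>();
      for (String fieldName : index.getFieldNames()) {
        Object resolved = resolve(value, fieldName);
        if (resolved != null) {
          doc.put(fieldName, resolved.toString());
        }
      }
      return Collections.singletonList(doc);
    }

    /** Walks a dotted path through nested maps; returns null if any step is missing. */
    private static Object resolve(Object value, String path) {
      Object current = value;
      for (String part : path.split("\\.")) {
        if (!(current instanceof Map)) {
          return null;
        }
        current = ((Map<?, ?>) current).get(part);
      }
      return current;
    }
  }

  /** Sample nested value: a customer with a nested contact object. */
  static Map<String, Object> sampleCustomer() {
    Map<String, Object> contact = new LinkedHashMap<>();
    contact.put("email", "ada@example.com");
    contact.put("phone", "555-0100");
    Map<String, Object> customer = new LinkedHashMap<>();
    customer.put("name", "Ada");
    customer.put("contact", contact);
    return customer;
  }

  public static void main(String[] args) {
    SketchIndex index = () -> new String[] {"name", "contact.email"};
    // prints [{name=Ada, contact.email=ada@example.com}]
    System.out.println(new FlatSerializer().toDocuments(sampleCustomer(), index));
  }
}
```

Without the index argument the serializer would have no way to know that
contact.email should be indexed and contact.phone should not.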

On Thu, Jul 13, 2017 at 2:19 PM, Dan Smith <ds...@pivotal.io> wrote:

> On Thu, Jul 13, 2017 at 11:26 AM, Jacob Barrett <jb...@pivotal.io>
> wrote:
>
>> Collections are really tough in Lucene because you have to flatten the
>> document. I struggled against it for some time on a project a few years ago
>> and ultimately decided to index the relationships separately and then merge
>> the results.
>>
>
> Yeah, this is part of the motivation for providing the LuceneSerializer
> API. We can provide a built in serializer that just flattens all nested
> collections into a single field, but users could also write their own
> implementation that converts the nested objects into separate lucene
> documents and use some of query classes in org.apache.lucene.search.join if
> they really need to.
>
> It's not part of the goal here, but I think this LuceneSerializer API
> could also make it easier to do spatial indexing, because users could
> create a serializer that converts their gemfire object into a Lucene
> document with GeoPointFields.
>
> -Dan
>
>

Re: Proposal: Lucene indexing/searching for nested objects

Posted by Dan Smith <ds...@pivotal.io>.
On Thu, Jul 13, 2017 at 11:26 AM, Jacob Barrett <jb...@pivotal.io> wrote:

> Collections are really tough in Lucene because you have to flatten the
> document. I struggled against it for some time on a project a few years ago
> and ultimately decided to index the relationships separately and then merge
> the results.
>

Yeah, this is part of the motivation for providing the LuceneSerializer
API. We can provide a built-in serializer that just flattens all nested
collections into a single field, but users could also write their own
implementation that converts the nested objects into separate Lucene
documents and use some of the query classes in
org.apache.lucene.search.join if they really need to.

It's not part of the goal here, but I think this LuceneSerializer API could
also make it easier to do spatial indexing, because users could create a
serializer that converts their gemfire object into a Lucene document with
GeoPointFields.

-Dan
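
The two approaches to collections described above can be sketched side by
side. This is only an illustration under stated assumptions: the class and
method names are hypothetical, a Map stands in for a Lucene document, and the
"id" and "parentId" field names are made up. The second strategy puts the
parent last because Lucene's block-join queries expect the parent to be the
final document of its block.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; a Map stands in for a Lucene document.
class CollectionStrategiesSketch {

  /** Strategy 1: one document, with the whole collection joined into a single field. */
  static Map<String, String> flatten(String parentId, String field, List<String> values) {
    Map<String, String> doc = new LinkedHashMap<>();
    doc.put("id", parentId);
    doc.put(field, String.join(" ", values));
    return doc;
  }

  /** Strategy 2: one child document per element, followed by the parent document. */
  static List<Map<String, String>> split(String parentId, String field, List<String> values) {
    List<Map<String, String>> docs = new ArrayList<>();
    for (String value : values) {
      Map<String, String> child = new LinkedHashMap<>();
      child.put("parentId", parentId);
      child.put(field, value);
      docs.add(child);
    }
    Map<String, String> parent = new LinkedHashMap<>();
    parent.put("id", parentId);
    docs.add(parent); // parent goes last in the block
    return docs;
  }

  public static void main(String[] args) {
    List<String> phones = Arrays.asList("555-0100", "555-0101");
    System.out.println(flatten("c1", "phones", phones));
    System.out.println(split("c1", "phone", phones));
  }
}
```

Flattening is simpler but loses the boundary between elements, so a query
cannot tell which element matched. Separate documents keep that boundary at
the cost of needing a join at query time.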

Re: Proposal: Lucene indexing/searching for nested objects

Posted by Jacob Barrett <jb...@pivotal.io>.
Collections are really tough in Lucene because you have to flatten the document. I struggled against it for some time on a project a few years ago and ultimately decided to index the relationships separately and then merge the results.



> On Jul 13, 2017, at 11:13 AM, Dan Smith <ds...@pivotal.io> wrote:
> 
> +1 Looks good. I think we should consider adding support for collections as
> well, but that doesn't have to be in the first cut.
> 
> -Dan
> 
>> On Wed, Jul 12, 2017 at 10:37 AM, Diane Hardman <dh...@pivotal.io> wrote:
>> 
>> The Geode 1.2.0 release includes Lucene text search fully integrated and
>> tested (no longer experimental). We are now proposing enhancements to
>> improve Lucene usability in Geode.
>> 
>> Some Geode users create data models that include nested and complex
>> objects. The current Geode Lucene integration supports indexing and
>> querying only the top-level fields in the data object. The objective of
>> this proposal is to support indexing and querying an arbitrary depth of
>> nested objects.
>> 
>> 
>> Please review the proposal in the following wiki page and give us your
>> feedback.
>> 
>> https://cwiki.apache.org/confluence/display/GEODE/Lucene+Text+Search+on+Nested+Object
>> 

Re: Proposal: Lucene indexing/searching for nested objects

Posted by Dan Smith <ds...@pivotal.io>.
+1 Looks good. I think we should consider adding support for collections as
well, but that doesn't have to be in the first cut.

-Dan

On Wed, Jul 12, 2017 at 10:37 AM, Diane Hardman <dh...@pivotal.io> wrote:

> The Geode 1.2.0 release includes Lucene text search fully integrated and
> tested (no longer experimental). We are now proposing enhancements to
> improve Lucene usability in Geode.
>
> Some Geode users create data models that include nested and complex
> objects. The current Geode Lucene integration supports indexing and
> querying only the top-level fields in the data object. The objective of
> this proposal is to support indexing and querying an arbitrary depth of
> nested objects.
>
>
> Please review the proposal in the following wiki page and give us your
> feedback.
>
> https://cwiki.apache.org/confluence/display/GEODE/Lucene+Text+Search+on+Nested+Object
>