Posted to user@ignite.apache.org by Muthu <mu...@gmail.com> on 2017/06/15 23:45:35 UTC

How does Ignite Lucene based text indexing & querying work if a field has comma separated values

Folks,

If a field annotated with @QueryTextField contains comma-separated values,
would it be tokenized before being indexed by Lucene? How does it work?

Regards,
Muthu

Re: How does Ignite Lucene based text indexing & querying work if a field has comma separated values

Posted by Muthu <mu...@gmail.com>.
Thanks Andrey, I got that after looking through your earlier reply. What I
was curious about is the reason for a comma with a space instead of just a
space character. As I understand it, the reason is to get the entire words
in between as tokens.

@Manu,

Thanks for the additional info. Let me look at it.

Regards,
Muthu


Re: How does Ignite Lucene based text indexing & querying work if a field has comma separated values

Posted by Andrey Mashenkov <an...@gmail.com>.
Hi Muthu,

Using a comma alone as a separator is a bad idea in the general case. As you
can see from the Unicode standard, 3,456.789 wouldn't be broken into two words.
It would be better to use a space character, or a comma (or any other
separator you want) followed by a space.


-- 
Best regards,
Andrey V. Mashenkov

Re: How does Ignite Lucene based text indexing & querying work if a field has comma separated values

Posted by Manu <ma...@hotmail.com>.
Hi,

If you need advanced Lucene search, you could modify GridLuceneIndex to parse
KeyCacheObject and CacheObject in the store method and create additional
IndexableFields by applying transformations to non-string values.

We have just integrated the cassandra-lucene-index concept from Stratio's
implementation (https://github.com/Stratio/cassandra-lucene-index, documentation here:
https://github.com/Stratio/cassandra-lucene-index/blob/branch-3.0.13/doc/documentation.rst)
into GridLuceneIndex to support advanced Lucene searches such as spatial,
bitemporal, maps, lists... It is based on mappers: we modified @QueryTextField
to allow adding a mapper definition (i.e. how you want the fields to be indexed
in Lucene) and modified the annotation processor in CacheConfiguration. This
allows advanced Lucene search to be used in standard Ignite SqlQueries, not
only in TextQuery, which has very limited functionality. GridLuceneIndex is now
a GridH2Index, so we can make complex joins with other entities using complex
Lucene filters. The functionality and performance results are awesome!

We have also made some improvements to the indexing module, like
auto-registering new SQL fields and automatically rebuilding and creating new
indexes if entity definitions change.

When we have some free time, we will share the code with the community!

Bye!




--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/How-does-Ignite-Lucene-based-text-indexing-querying-work-if-a-field-has-comma-separated-values-tp13830p14064.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: How does Ignite Lucene based text indexing & querying work if a field has comma separated values

Posted by Muthu <mu...@gmail.com>.
Okay, thanks for the input. Just to understand better: the comma is needed
with a space to make sure the entire words in between are tokenized as tokens?

Regards,
Muthu


Re: How does Ignite Lucene based text indexing & querying work if a field has comma separated values

Posted by Andrey Mashenkov <an...@gmail.com>.
Not quite: a comma-separated String will be tokenized by the Lucene
StandardTokenizer according to the Unicode standard [1].

I'd recommend using ", " (a comma followed by a space character) as the separator.

[1] http://unicode.org/reports/tr29/
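A minimal sketch of the List<String> to String conversion discussed above, assuming the value class keeps its original list and exposes a derived String field for text indexing. The class TaggedItem and its fields are hypothetical; only the @QueryTextField annotation comes from the thread, and it is left as a comment so the sketch compiles without Ignite on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical cache value type. In an Ignite model the tagsText field would be
// annotated with org.apache.ignite.cache.query.annotations.QueryTextField; the
// annotation is commented out so this sketch compiles without Ignite.
final class TaggedItem {
    // Original multi-valued member; not indexed for text search on its own.
    private final List<String> tags;

    // @QueryTextField
    // Derived String field holding the same values, joined with ", ".
    private final String tagsText;

    TaggedItem(List<String> tags) {
        this.tags = Collections.unmodifiableList(new ArrayList<>(tags));
        // ", " forces a word break between values, even numeric ones like "3" and "456.789".
        this.tagsText = String.join(", ", this.tags);
    }

    List<String> getTags() {
        return tags;
    }

    String getTagsText() {
        return tagsText;
    }

    public static void main(String[] args) {
        TaggedItem item = new TaggedItem(Arrays.asList("3", "456.789", "red"));
        System.out.println(item.getTagsText()); // prints "3, 456.789, red"
    }
}
```

Keeping both fields is a deliberate choice: the joined text is only for searching, and if an individual value itself contains ", " the original list cannot be recovered from tagsText.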


-- 
Best regards,
Andrey V. Mashenkov

Re: How does Ignite Lucene based text indexing & querying work if a field has comma separated values

Posted by Muthu <mu...@gmail.com>.
The objects in my Ignite cache have a List<String> as a member, so I have to
change it to a comma-separated String if I want to be able to perform text
searches, correct?

Regards,
Muthu


Re: How does Ignite Lucene based text indexing & querying work if a field has comma separated values

Posted by Muthu <mu...@gmail.com>.
Okay, alright, thanks Andrey.

Regards,
Muthu


Re: How does Ignite Lucene based text indexing & querying work if a field has comma separated values

Posted by Andrey Mashenkov <an...@gmail.com>.
Hi,
I mean object fields that are of type String.


Re: How does Ignite Lucene based text indexing & querying work if a field has comma separated values

Posted by Muthu <mu...@gmail.com>.
Okay. By the way, what is an object String?

Regards,
Muthu


Re: How does Ignite Lucene based text indexing & querying work if a field has comma separated values

Posted by Andrey Mashenkov <an...@gmail.com>.
No, only String values and object fields of type String are supported.


Re: How does Ignite Lucene based text indexing & querying work if a field has comma separated values

Posted by Muthu <mu...@gmail.com>.
Great, thanks for the info! How about a list of strings (List<String>)?
Will it also be handled (an array value in the key-value pair)?

Regards,
Muthu


Re: How does Ignite Lucene based text indexing & querying work if a field has comma separated values

Posted by Andrey Mashenkov <an...@gmail.com>.
Hi Muthu,

Yes, the field value will be tokenized with the Lucene StandardAnalyzer [1].

[1] http://lucene.apache.org/core/3_5_0/api/core/org/apache/lucene/analysis/standard/StandardAnalyzer.html
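To make the tokenization concrete, here is a simplified sketch of the Unicode word-break rules (UAX #29) that StandardTokenizer follows, restricted to letters, digits, and the characters , ; . and '. It is an illustration under those assumptions, not Lucene's code: the real StandardAnalyzer handles many more character classes and, depending on the Lucene version, also removes English stop words, while this sketch only lowercases. The class name SimpleWordTokenizer is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified subset of UAX #29 word segmentation; NOT Lucene's StandardTokenizer.
final class SimpleWordTokenizer {

    // MidNum (',' ';') only joins a digit to a digit (rules WB11/WB12).
    private static boolean isMidNum(char c) {
        return c == ',' || c == ';';
    }

    // MidNumLet ('.' '\'') joins letter to letter or digit to digit.
    private static boolean isMidNumLet(char c) {
        return c == '.' || c == '\'';
    }

    private static boolean isWordChar(char c) {
        return Character.isLetter(c) || Character.isDigit(c);
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int n = text.length();
        int i = 0;
        while (i < n) {
            if (!isWordChar(text.charAt(i))) {
                i++; // spaces and stray punctuation never start a token
                continue;
            }
            int start = i++;
            while (i < n) {
                char c = text.charAt(i);
                if (isWordChar(c)) {
                    i++; // letters and digits stick together (WB5, WB8-WB10)
                    continue;
                }
                // A single mid character may join the word chars on both sides.
                if (i + 1 < n) {
                    char prev = text.charAt(i - 1);
                    char next = text.charAt(i + 1);
                    boolean digits = Character.isDigit(prev) && Character.isDigit(next);
                    boolean letters = Character.isLetter(prev) && Character.isLetter(next);
                    if ((isMidNum(c) && digits) || (isMidNumLet(c) && (digits || letters))) {
                        i += 2;
                        continue;
                    }
                }
                break; // word boundary
            }
            // StandardAnalyzer also lowercases its tokens.
            tokens.add(text.substring(start, i).toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Red,Green,Blue")); // [red, green, blue]
        System.out.println(tokenize("3,456.789"));      // [3,456.789]
        System.out.println(tokenize("3, 456.789"));     // [3, 456.789]
    }
}
```

Under these rules a comma only joins a digit to a digit, so "Red,Green,Blue" already splits into three tokens, while numeric values such as "1" and "2" joined as "1,2" would be indexed as a single token; a ", " separator avoids that case.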


-- 
Best regards,
Andrey V. Mashenkov