Posted to solr-user@lucene.apache.org by Zheng Lin Edwin Yeo <ed...@gmail.com> on 2018/01/01 03:00:20 UTC

Query fields with data of certain length

Hi,

I would like to check whether it is possible to query a field that has data
longer than a certain length.

For example, I want to query the subject field for values that are more than
255 bytes long. Is it possible?

I am currently using Solr 6.5.1.

Regards,
Edwin

Re: Query fields with data of certain length

Posted by Emir Arnautović <em...@sematext.com>.
Hi Edwin,
If it is a one-time thing, you can use a regex to filter out results that are not long enough. Something like: subject:/.{255,}.*/.
Of course, this only works if subject is not tokenized, because the regex is matched against individual indexed terms rather than the whole field value.

It would probably be best to index the subject length as a separate field and include it in the query as subject_length:[255 TO *].
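To see exactly what the regex filter accepts, here is a minimal sketch that uses Java's `java.util.regex` as a stand-in for Lucene's regular-expression engine. That substitution is an assumption: the two syntaxes differ in places, but both require the pattern to match the whole term and both support `{n,}`. The sketch shows that the trailing `.*` is redundant and that "more than 255" strictly means `.{256,}`.

```java
import java.util.Collections;
import java.util.regex.Pattern;

// Stand-in for a regex query on an untokenized field: matches() requires the
// whole term to match, as a Lucene regex query does.
class WholeTermRegex {
    // DOTALL so "." also matches line breaks, like "any character" in Lucene.
    static boolean termMatches(String regex, String term) {
        return Pattern.compile(regex, Pattern.DOTALL).matcher(term).matches();
    }

    public static void main(String[] args) {
        String len255 = String.join("", Collections.nCopies(255, "a"));
        String len256 = len255 + "a";

        // The trailing ".*" adds nothing: both patterns accept the same terms.
        System.out.println(termMatches(".{255,}.*", len255)); // true
        System.out.println(termMatches(".{255,}", len255));   // true

        // "More than 255" strictly means at least 256 characters.
        System.out.println(termMatches(".{256,}", len255));   // false
        System.out.println(termMatches(".{256,}", len256));   // true
    }
}
```

In Solr the stricter filter would presumably read subject:/.{256,}/; either way it counts characters, not bytes.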

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 1 Jan 2018, at 04:00, Zheng Lin Edwin Yeo <ed...@gmail.com> wrote:
> 
> Hi,
> 
> I would like to check whether it is possible to query a field that has data
> longer than a certain length.
> 
> For example, I want to query the subject field for values that are more than
> 255 bytes long. Is it possible?
> 
> I am currently using Solr 6.5.1.
> 
> Regards,
> Edwin


Re: Query fields with data of certain length

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
Hi,

Thanks for the reply.

So that means we have to write this custom QParser ourselves?

Regards,
Edwin


On 3 February 2018 at 03:28, Chris Hostetter <ho...@fucit.org>
wrote:

>
> : Have you managed to get the regex for this string in Chinese: 预支款管理及账务处理办法 ?
>         ...
> : > An example of the string in Chinese is 预支款管理及账务处理办法
> : >
> : > The number of characters is 12, but the expected length should be 36.
>         ...
> : >> > So this would likely be different from what the operating system counts, as
> : >> > the operating system may consider each Chinese character as 3 to 4 bytes.
> : >> > That is probably why I could not find any record with subject:/.{255,}.*/
>
> Java regexes operate on unicode strings, so "." matches any *character*.
> There is no regex syntax to match any single "byte", so a regex-based
> approach is never going to be viable.
>
> Your best bet is to check the byte count when indexing -- but even then
> you'd need some custom code, since things like
> FieldLengthUpdateProcessorFactory are well behaved and count the
> *characters* of the unicode strings.
>
> If you absolutely can't reindex, then you'd need a custom QParser that
> produced a custom Query object that iterated over the TermsEnum looking at
> the buffers and counting the bytes in each term -- matching each doc
> associated with those terms.
>
>
>
> -Hoss
> http://www.lucidworks.com/

Re: Query fields with data of certain length

Posted by Chris Hostetter <ho...@fucit.org>.
: Have you managed to get the regex for this string in Chinese: 预支款管理及账务处理办法 ?
	...
: > An example of the string in Chinese is 预支款管理及账务处理办法
: >
: > The number of characters is 12, but the expected length should be 36.
	...
: >> > So this would likely be different from what the operating system counts, as
: >> > the operating system may consider each Chinese character as 3 to 4 bytes.
: >> > That is probably why I could not find any record with subject:/.{255,}.*/

Java regexes operate on unicode strings, so "." matches any *character*.
There is no regex syntax to match any single "byte", so a regex-based
approach is never going to be viable.

Your best bet is to check the byte count when indexing -- but even then
you'd need some custom code, since things like
FieldLengthUpdateProcessorFactory are well behaved and count the
*characters* of the unicode strings.
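A minimal sketch of that custom index-time step, assuming a hypothetical `subject_bytes` field and modelling the document as a plain `Map` rather than Solr's `SolrInputDocument`. In a real deployment this logic would presumably sit in the `processAdd` method of a custom `UpdateRequestProcessor`; the class and field names here are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class ByteLengthAtIndexTime {
    // Hypothetical index-time step: store the UTF-8 byte length of "subject"
    // in a separate "subject_bytes" field, leaving the original value intact.
    static void addByteLength(Map<String, Object> doc) {
        Object subject = doc.get("subject");
        if (subject instanceof String) {
            int bytes = ((String) subject).getBytes(StandardCharsets.UTF_8).length;
            doc.put("subject_bytes", bytes);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("subject", "预支款管理及账务处理办法");
        addByteLength(doc);
        // A character count, as FieldLengthUpdateProcessorFactory records, would be 12 here.
        System.out.println(doc.get("subject_bytes")); // 36
    }
}
```

With `subject_bytes` declared as an integer field in the schema (an assumption about the setup), the original question becomes a plain range filter such as subject_bytes:[256 TO *].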

If you absolutely can't reindex, then you'd need a custom QParser that
produced a custom Query object that iterated over the TermsEnum looking at
the buffers and counting the bytes in each term -- matching each doc
associated with those terms.
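The idea behind that custom Query can be sketched without Lucene: walk every term in the field's dictionary, keep the terms whose UTF-8 encoding is long enough, and collect the documents that contain them. Below, a `TreeMap` from term to document IDs is a toy stand-in (an assumption) for the term dictionary of an untokenized subject field. In Lucene itself the loop would presumably iterate a TermsEnum, whose terms are UTF-8 encoded `BytesRef` values, so `BytesRef.length` already gives the byte count, and the matching documents would come from each term's postings.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

class ByteLengthTermScan {
    // Return the IDs of documents containing at least one term whose
    // UTF-8 encoding is longer than maxBytes.
    static SortedSet<Integer> docsWithLongTerms(Map<String, List<Integer>> termToDocs, int maxBytes) {
        SortedSet<Integer> hits = new TreeSet<>();
        for (Map.Entry<String, List<Integer>> entry : termToDocs.entrySet()) {
            int bytes = entry.getKey().getBytes(StandardCharsets.UTF_8).length;
            if (bytes > maxBytes) {
                hits.addAll(entry.getValue());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Toy term dictionary: each whole subject is one term.
        Map<String, List<Integer>> subjects = new TreeMap<>();
        subjects.put("预支款管理及账务处理办法", Arrays.asList(1, 4)); // 12 chars, 36 bytes
        subjects.put("expense report", Arrays.asList(2));          // 14 chars, 14 bytes
        subjects.put("monthly summary", Arrays.asList(3));         // 15 chars, 15 bytes

        // Only the Chinese subject exceeds 20 bytes, despite having the fewest characters.
        System.out.println(docsWithLongTerms(subjects, 20)); // [1, 4]
    }
}
```

The cost of this approach is a scan over every term in the field, which may be acceptable for a one-time query on existing data but not for routine use.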



-Hoss
http://www.lucidworks.com/

Re: Query fields with data of certain length

Posted by Emir Arnautović <em...@sematext.com>.
Hi Edwin,
Unfortunately, I was not able to find a regex that would work in your case.

Emir



> On 1 Feb 2018, at 05:42, Zheng Lin Edwin Yeo <ed...@gmail.com> wrote:
> 
> Hi,
> 
> Have you managed to get the regex for this string in Chinese: 预支款管理及账务处理办法 ?
> 
> Regards,
> Edwin


Re: Query fields with data of certain length

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
Hi,

Have you managed to get the regex for this string in Chinese: 预支款管理及账务处理办法 ?

Regards,
Edwin


On 4 January 2018 at 18:04, Zheng Lin Edwin Yeo <ed...@gmail.com>
wrote:

> Hi Emir,
>
> An example of the string in Chinese is 预支款管理及账务处理办法
>
> The number of characters is 12, but the expected length should be 36.
>
> Regards,
> Edwin

Re: Query fields with data of certain length

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
Hi Emir,

An example of the string in Chinese is 预支款管理及账务处理办法

The number of characters is 12, but the expected length should be 36.
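The two numbers measure different units: 12 is the count of Unicode characters, while 36 is the number of bytes when the string is encoded as UTF-8, where each of these CJK characters takes 3 bytes. Characters outside the Basic Multilingual Plane take 4 bytes, which accounts for the "3 to 4 bytes" range. A small check, assuming UTF-8 is the encoding the 36 refers to:

```java
import java.nio.charset.StandardCharsets;

class SubjectLengths {
    // Number of Unicode code points, i.e. what a regex "." steps over.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes in the UTF-8 encoding.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String subject = "预支款管理及账务处理办法";
        System.out.println(codePoints(subject)); // 12
        System.out.println(utf8Bytes(subject));  // 36 (3 bytes per character)

        // U+20000, a CJK ideograph outside the Basic Multilingual Plane:
        // one code point, two Java chars, four UTF-8 bytes.
        String rare = new String(Character.toChars(0x20000));
        System.out.println(codePoints(rare)); // 1
        System.out.println(rare.length());    // 2
        System.out.println(utf8Bytes(rare));  // 4
    }
}
```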

Regards,
Edwin


On 4 January 2018 at 16:21, Emir Arnautović <em...@sematext.com>
wrote:

> Hi Edwin,
> I don’t have enough knowledge of Eastern languages to know what number is
> expected when you ask for string length. Maybe you can try some of the
> regex Unicode settings and see if you’ll get what you need: try setting the
> Unicode flag with (?U), or try using regex groups and ranges. If you provide
> an example string and the expected length, maybe we could provide you a regex.
>
> Thanks,
> Emir

Re: Query fields with data of certain length

Posted by Emir Arnautović <em...@sematext.com>.
Hi Edwin,
I don’t have enough knowledge of Eastern languages to know what number is expected when you ask for string length. Maybe you can try some of the regex Unicode settings and see if you’ll get what you need: try setting the Unicode flag with (?U), or try using regex groups and ranges. If you provide an example string and the expected length, maybe we could provide you a regex.
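In `java.util.regex`, the `(?U)` flag (`UNICODE_CHARACTER_CLASS`) changes what classes such as `\w` match; it does not change how many characters `.` consumes, so it cannot turn a character count into a byte count. A quick check with the example string from this thread, using Java's engine as an assumption (Lucene's own regex syntax is separate and may not accept inline flags at all):

```java
import java.util.regex.Pattern;

class UnicodeFlagCheck {
    // Whole-input match, as with a regex query on an untokenized field.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String subject = "预支款管理及账务处理办法"; // 12 characters, 36 UTF-8 bytes
        System.out.println(matches(".{12}", subject));     // true
        System.out.println(matches("(?U).{12}", subject)); // true: the flag does not change counting
        System.out.println(matches("(?U).{36}", subject)); // false: no flag makes "." match a byte

        // What (?U) does change: whether \w accepts non-ASCII letters.
        System.out.println(matches("\\w+", subject));      // false
        System.out.println(matches("(?U)\\w+", subject));  // true
    }
}
```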

Thanks,
Emir



> On 4 Jan 2018, at 04:37, Zheng Lin Edwin Yeo <ed...@gmail.com> wrote:
> 
> Hi Emir,
> 
> So this would likely be different from what the operating system counts, as
> the operating system may consider each Chinese character as 3 to 4 bytes.
> That is probably why I could not find any record with subject:/.{255,}.*/
> 
> Are there other tools we can use to query the length of already-indexed data
> that is not in standard English? (e.g. Chinese, Japanese, etc.)
> 
> Regards,
> Edwin


Re: Query fields with data of certain length

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
Hi Emir,

So this would likely be different from what the operating system counts, as
the operating system may consider each Chinese character as 3 to 4 bytes.
That is probably why I could not find any record with subject:/.{255,}.*/

Are there other tools we can use to query the length of already-indexed data
that is not in standard English? (e.g. Chinese, Japanese, etc.)

Regards,
Edwin

On 3 January 2018 at 23:51, Emir Arnautović <em...@sematext.com>
wrote:

> Hi Edwin,
> I do not know, but my guess would be that each character is counted as 1
> in regex regardless of how many bytes it takes in the encoding used.
>
> Regards,
> Emir

Re: Query fields with data of certain length

Posted by Emir Arnautović <em...@sematext.com>.
Hi Edwin,
I do not know, but my guess would be that each character is counted as 1 in regex regardless of how many bytes it takes in the encoding used.
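One way to test this guess outside Solr is to compare a regex length check with the UTF-8 byte count of the same value. The sketch again uses `java.util.regex` as a stand-in for Lucene's regex engine (an assumption; the engines differ, but both treat `.` as one character), applied to a hypothetical subject made of 100 Chinese characters.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.regex.Pattern;

class RegexCountsCharacters {
    // True if the whole input consists of at least n characters.
    static boolean atLeast(int n, String term) {
        return Pattern.compile(".{" + n + ",}", Pattern.DOTALL).matcher(term).matches();
    }

    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Hypothetical subject: the character 预 repeated 100 times.
        String subject = String.join("", Collections.nCopies(100, "预"));
        System.out.println(utf8Bytes(subject));    // 300 bytes in UTF-8
        System.out.println(atLeast(255, subject)); // false: only 100 characters
        System.out.println(atLeast(100, subject)); // true
    }
}
```

A 300-byte value slipping past `.{255,}` is consistent with the missing Chinese records described in this thread.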

Regards,
Emir



> On 3 Jan 2018, at 16:43, Zheng Lin Edwin Yeo <ed...@gmail.com> wrote:
> 
> Thanks for the reply.
> 
> I am doing the search on existing data that has already been indexed, and
> it is likely to be a one-time thing.
> 
> This subject:/.{255,}.*/ works for English characters. However, there are
> Chinese characters in some of the records. Their length seems to be more than
> 255, but they do not show up in the results.
> 
> Do you know how the length of Chinese characters and other languages is
> being determined?
> 
> Regards,
> Edwin


Re: Query fields with data of certain length

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
Thanks for the reply.

I am doing the search on existing data that has already been indexed, and
it is likely to be a one-time thing.

This subject:/.{255,}.*/ works for English characters. However, there are
Chinese characters in some of the records. Their length seems to be more than
255, but they do not show up in the results.

Do you know how the length of Chinese characters and other languages is
being determined?

Regards,
Edwin


On 3 January 2018 at 23:01, Alexandre Rafalovitch <ar...@gmail.com>
wrote:

> Do that during indexing as Emir suggested. Specifically, use an
> UpdateRequestProcessor chain, probably with the Clone and FieldLength
> processors: http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/FieldLengthUpdateProcessorFactory.html
>
> Regards,
>    Alex.

Re: Query fields with data of certain length

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Do that during indexing as Emir suggested. Specifically, use an
UpdateRequestProcessor chain, probably with the Clone and FieldLength
processors: http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/FieldLengthUpdateProcessorFactory.html
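What such a chain does to each incoming document can be mimicked in a few lines: clone `subject` into a second field, then replace that copy's value with its length. The field name `subject_length` and the plain `Map` document are assumptions for illustration. The length is taken with `CharSequence.length()`, a character count; the thread notes that FieldLengthUpdateProcessorFactory likewise counts characters rather than bytes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CloneThenLength {
    // Mimics a Clone step followed by a FieldLength step on one document:
    // the value of source is copied to dest, and the copy is then replaced
    // by its length in characters. The source field is left untouched.
    static void cloneAndMeasure(Map<String, Object> doc, String source, String dest) {
        Object value = doc.get(source);
        if (value instanceof CharSequence) {
            doc.put(dest, ((CharSequence) value).length());
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("subject", "Quarterly budget review");
        cloneAndMeasure(doc, "subject", "subject_length");
        System.out.println(doc); // {subject=Quarterly budget review, subject_length=23}
    }
}
```

With `subject_length` indexed as an integer field, a range filter such as subject_length:[256 TO *] would then select subjects longer than 255 characters.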

Regards,
   Alex.

On 31 December 2017 at 22:00, Zheng Lin Edwin Yeo <ed...@gmail.com> wrote:
> Hi,
>
> I would like to check whether it is possible to query a field that has data
> longer than a certain length.
>
> For example, I want to query the subject field for values that are more than
> 255 bytes long. Is it possible?
>
> I am currently using Solr 6.5.1.
>
> Regards,
> Edwin