Posted to solr-user@lucene.apache.org by "Diego Ceccarelli (BLOOMBERG/ LONDON)" <dc...@bloomberg.net> on 2018/01/26 17:46:57 UTC

Re: ***UNCHECKED*** Limit Solr search to number of character/words (without changing index)

Hi Zahid, if you want to allow searching only when the query is shorter than a certain number of terms / characters, I would probably do that check before calling Solr. Otherwise, you could write a query parser plugin (a QParserPlugin, see [1]) and check that the query is sound before processing it.
See also: http://coding-art.blogspot.co.uk/2016/05/writing-custom-solr-query-parser-for.html
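
As a rough sketch of the plugin route: a custom parser is registered in
solrconfig.xml and then selected per request. The parser name and the class
below are hypothetical placeholders, not an existing plugin:

  <!-- solrconfig.xml: register a custom query parser.
       "lengthcheck" and com.example.LengthCheckQParserPlugin are
       hypothetical names used only for illustration. -->
  <queryParser name="lengthcheck" class="com.example.LengthCheckQParserPlugin"/>

A request could then select it with defType=lengthcheck, or with a local
params prefix such as q={!lengthcheck}some query.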

Cheers,
Diego

[1] https://wiki.apache.org/solr/SolrPlugins


From: solr-user@lucene.apache.org At: 01/26/18 13:24:36 To: solr-user@lucene.apache.org
Cc:  apache@elyograg.org
Subject: ***UNCHECKED*** Limit Solr search to number of character/words (without changing index)

Hi All,

Is there any way I can restrict a Solr search query to look at only a
specified number of characters/words of the indexed content (for searching
purposes only, not for highlighting)?

*For example:*

*Indexed content:*
*I am a man of my words I am a lazy man...........*

The search should consider only the portion below (words=7 or characters=16, not counting spaces):
*I am a man of my words*

If I search for *lazy*, no record should be found.
If I search for *a*, 1 record should be found.


Thanks
Zahid Iqbal



Re: ***UNCHECKED*** Limit Solr search to number of character/words (without changing index)

Posted by Muhammad Zahid Iqbal <za...@northbaysolutions.net>.
Hi Alessandro,

Thanks for making it clearer. As I mentioned (see the subject), I do not want
to change my index for the feature I requested.

"Other one is, search query will have to look for first 100 characters
indexed in same XYZ field."

How can I achieve this without changing the index? I want to do it on the
search side.


On Mon, Jan 29, 2018 at 4:13 PM, alessandro.benedetti <a....@sease.io>
wrote:

> This seems different from what you initially asked ( and Diego responded)
> "One is simple, search query will look for whole content indexed in XYZ
> field
> Other one is, search query will have to look for first 100 characters
> indexed in same XYZ field. "
>
> This is still doable at Indexing time using a copy field.
> You can have your "originalField" and your "truncatedField" with no problem
> at all.
> Just use a combination of copyFields[1] and what Erick suggested.
>
> Cheers
>
> [1] https://lucene.apache.org/solr/guide/6_6/copying-fields.html

Re: ***UNCHECKED*** Limit Solr search to number of character/words (without changing index)

Posted by "alessandro.benedetti" <a....@sease.io>.
This seems different from what you initially asked (and what Diego answered):
"One is simple, search query will look for whole content indexed in XYZ
field.
Other one is, search query will have to look for first 100 characters
indexed in same XYZ field."

This is still doable at index time using a copy field.
You can have both your "originalField" and your "truncatedField" with no
problem at all.
Just use a combination of copyFields [1] and what Erick suggested
(TruncateFieldUpdateProcessorFactory).
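
A minimal schema sketch of the two-field idea, assuming a text_general field
type like the one in the default configsets. The field names are just the
placeholders used above, and maxChars=100 is an example value:

  <!-- Full content, searched when the whole field should be considered -->
  <field name="originalField" type="text_general" indexed="true" stored="true"/>
  <!-- Receives only the beginning of the content -->
  <field name="truncatedField" type="text_general" indexed="true" stored="false"/>
  <!-- maxChars caps how many characters are copied into the destination -->
  <copyField source="originalField" dest="truncatedField" maxChars="100"/>

A query such as q=truncatedField:lazy would then look only at the first 100
characters, while q=originalField:lazy looks at everything. Existing
documents need a re-index before truncatedField is populated.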

Cheers

[1] https://lucene.apache.org/solr/guide/6_6/copying-fields.html



-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: ***UNCHECKED*** Limit Solr search to number of character/words (without changing index)

Posted by Emir Arnautović <em...@sematext.com>.
Hi Muhammad,
If the limit(s) are static, you can still do it at index time. Assuming you send a “content” field, you index it fully (and store it if needed), and you use a copy field to copy it to a content_limitted field, where a limit token count filter indexes only the first X tokens: https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-LimitTokenCountFilter
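
A sketch of what that could look like in the schema. The type name
text_first_tokens, the tokenizer/filter choice, the text_general type and
maxTokenCount=7 are assumptions for illustration:

  <!-- Index only the first 7 tokens; analyze queries normally -->
  <fieldType name="text_first_tokens" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="7" consumeAllTokens="false"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="content" type="text_general" indexed="true" stored="true"/>
  <field name="content_limitted" type="text_first_tokens" indexed="true" stored="false"/>
  <copyField source="content" dest="content_limitted"/>

With the example from the original question ("I am a man of my words I am a
lazy man"), content_limitted:lazy should not match, because "lazy" is beyond
the seventh token, while content_limitted:a should.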

You can also use CloneFieldUpdateProcessorFactory in combination with TruncateFieldUpdateProcessorFactory to do a similar thing in an update request processor chain.
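
A sketch of such a chain, assuming a source field "content" and a
destination field "content_truncated" that exists in the schema. The chain
name and maxLength=100 are illustrative:

  <updateRequestProcessorChain name="clone-and-truncate">
    <!-- Copy the incoming value into a second field -->
    <processor class="solr.CloneFieldUpdateProcessorFactory">
      <str name="source">content</str>
      <str name="dest">content_truncated</str>
    </processor>
    <!-- Keep only the first 100 characters of the copy -->
    <processor class="solr.TruncateFieldUpdateProcessorFactory">
      <str name="fieldName">content_truncated</str>
      <int name="maxLength">100</int>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <!-- Must be last, otherwise nothing is written to the index -->
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

The chain is selected on update requests with update.chain=clone-and-truncate.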

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 29 Jan 2018, at 11:51, Muhammad Zahid Iqbal <za...@northbaysolutions.net> wrote:
> 
> Thanks Erick.
> 
> This is fine but I do not want to update my indexes as this configuration
> will get applied to indexing as well. I have a requirement where one field
> (XYZ) of type (text) requires two types of searches.
> 
> One is simple, search query will look for whole content indexed in XYZ field
> Other one is, search query will have to look for first 100 characters
> indexed in same XYZ field.
> 
> So I just want to do this at query time only.
> 
> Any idea? Would be much appreciated!


Re: ***UNCHECKED*** Limit Solr search to number of character/words (without changing index)

Posted by Muhammad Zahid Iqbal <za...@northbaysolutions.net>.
Thanks Erick.

This is fine, but I do not want to update my index, as this configuration
would be applied at indexing time as well. I have a requirement where one
field (XYZ) of type text requires two types of searches.

One is simple, search query will look for whole content indexed in XYZ field
Other one is, search query will have to look for first 100 characters
indexed in same XYZ field.

So I just want to do this at query time only.

Any idea? Would be much appreciated!


On Sat, Jan 27, 2018 at 10:27 PM, Erick Erickson <er...@gmail.com>
wrote:

> Sure, use TruncateFieldUpdateProcessorFactory in your update chain,
> here's the base definition:
>
>   <updateRequestProcessorChain name="truncate">
>     <processor class="solr.TruncateFieldUpdateProcessorFactory">
>       <str name="fieldName">trunc</str>
>       <int name="maxLength">5</int>
>     </processor>
>   </updateRequestProcessorChain>
>
> This _can_ be configured to operate on "all StrField", or "all
> TextFields" as well, see the Javadocs.
>
> This is static, that is the field is truncated at index time so you
> can't change the values per-request.
>
> Best,
> Erick

Re: ***UNCHECKED*** Limit Solr search to number of character/words (without changing index)

Posted by Erick Erickson <er...@gmail.com>.
Sure, use TruncateFieldUpdateProcessorFactory in your update chain,
here's the base definition:

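  <updateRequestProcessorChain name="truncate">
    <!-- Truncate values of the "trunc" field to at most 5 characters -->
    <processor class="solr.TruncateFieldUpdateProcessorFactory">
      <str name="fieldName">trunc</str>
      <int name="maxLength">5</int>
    </processor>
    <!-- Needed as the last step; without it documents are not indexed -->
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>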

This _can_ be configured to operate on "all StrField", or "all
TextFields" as well, see the Javadocs.
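
As a sketch of that variant, the field selector can name a field type class
instead of a single field. typeClass is, as far as I know, one of the
selectors shared by the field-mutating processors, but check the Javadocs
for your version:

  <!-- Truncate every field whose type is a StrField -->
  <processor class="solr.TruncateFieldUpdateProcessorFactory">
    <str name="typeClass">solr.StrField</str>
    <int name="maxLength">5</int>
  </processor>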

This is static; that is, the field is truncated at index time, so you
can't change the limit per request.

Best,
Erick



On Sat, Jan 27, 2018 at 6:32 AM, Muhammad Zahid Iqbal
<za...@northbaysolutions.net> wrote:
> Thanks.
>
> I do not want to search if the query is shorter than a certain number of
> terms/characters.
>
> For example, I have a 10MB document indexed in Solr what I want is to
> search query in first 1MB content of that indexed document.
>
> Any workaround e.g .can I send query to Solr to look for only 1MB from
> start of document.?

Re: ***UNCHECKED*** Limit Solr search to number of character/words (without changing index)

Posted by Muhammad Zahid Iqbal <za...@northbaysolutions.net>.
Thanks.

I am not trying to restrict searches based on whether the query is shorter
than a certain number of terms/characters.

For example, I have a 10MB document indexed in Solr, and what I want is for
the query to search only the first 1MB of that document's content.

Is there any workaround, e.g. can I send a query to Solr that looks at only
the first 1MB from the start of the document?



On Fri, Jan 26, 2018 at 10:46 PM, Diego Ceccarelli (BLOOMBERG/ LONDON) <
dceccarelli4@bloomberg.net> wrote:

> Hi Zahid, if you want to allow searching only if the query is shorter than
> a certain number of terms / characters, I would do it before calling solr
> probably, otherwise you could write a QueryParserPlugin (see [1]) and check
> that the query is sound before processing it.
> See also: http://coding-art.blogspot.co.uk/2016/05/writing-custom-solr-query-parser-for.html
>
> Cheers,
> Diego
>
> [1] https://wiki.apache.org/solr/SolrPlugins

Re: ***UNCHECKED*** Limit Solr search to number of character/words (without changing index)

Posted by "alessandro.benedetti" <a....@sease.io>.
Taking a look at the Lucene code, this seems to be the closest query to your
requirement:

org.apache.lucene.search.spans.SpanPositionRangeQuery

But as far as I know, it is not used in Solr out of the box.
You could potentially develop a query parser that uses it to reach your goal.

Given that, I think the index-time strategy will be much easier: it will
just require a re-index and a few small changes to the query-time
configuration.
Another possibility may be to use payloads and the related query parser, but
in that case you would also need to re-index, so it is unlikely that this
option would be your favorite.
I appreciate that you cannot re-index, so in that case you will need to
follow the other approach (developing custom components).
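
One thing that may be worth trying before writing a component: Solr's XML
query parser builds Lucene queries from XML, and its SpanFirst element
should map to SpanFirstQuery, which is a subclass of SpanPositionRangeQuery.
The sketch below assumes that the parser is available in your Solr version,
that the field is called XYZ, is single-valued and indexed with positions,
and that the term is written exactly as it is indexed (for example
lowercased), since SpanTerm does no analysis. The element and attribute
names are from memory, so please verify them before relying on this:

  <!-- Sent as the q parameter, prefixed with {!xmlparser}.
       Matches "lazy" only within the first 7 token positions of XYZ. -->
  <SpanFirst fieldName="XYZ" end="7">
    <SpanTerm>lazy</SpanTerm>
  </SpanFirst>

Note that this limits by token position (words), not by characters.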

Regards





-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html