
Posted to solr-user@lucene.apache.org by Steve Rowe <sa...@gmail.com> on 2017/04/04 21:32:14 UTC

Re: Solr Shingle is not working properly in solr 6.5.0

Hi Aman,

I’ve created <https://issues.apache.org/jira/browse/SOLR-10423> for this problem.

--
Steve
www.lucidworks.com

> On Mar 31, 2017, at 7:34 AM, Aman Deep Singh <am...@gmail.com> wrote:
> 
> Hi Rick,
> The query tokens are created correctly. The only problem is the Boolean +
> operator applied while building the Lucene query, which requires all tokens
> to match in the document (equivalent to mm=100%) even though I use mm=1.
> For example:
> Normal query: one plus one abc
> Lucene query:
> +(((+nameShingle:one plus +nameShingle:plus one +nameShingle:one abc))
> ((+nameShingle:one plus +nameShingle:plus one abc)) ((+nameShingle:one plus
> one +nameShingle:one abc)) (nameShingle:one plus one abc))
> 
> My doc contains only "one plus one", so its shingles are
> "one plus", "plus one", and "one plus one";
> because of the Boolean +, it does not match.
> Thanks,
> Aman Deep Singh
> 
> On Fri, Mar 31, 2017 at 4:41 PM Rick Leir <rl...@leirtech.com> wrote:
> 
>> Hi Aman
>> Did you try the Admin Analysis tool? It will show you which filters are
>> effective at index and query time. It will help you understand why you are
>> not getting a match.
>> Cheers -- Rick
>> 
>> On March 31, 2017 2:36:33 AM EDT, Aman Deep Singh <
>> amandeep.cool99@gmail.com> wrote:
>>> Hi,
>>> I was trying to use the shingle filter, but it was not creating the
>>> query I expected.
>>> 
>>> my schema is
>>> <fieldType name="cust_shingle" class="solr.TextField" positionIncrementGap="100">
>>>   <analyzer>
>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>>     <filter class="solr.ShingleFilterFactory" outputUnigrams="false" maxShingleSize="4"/>
>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>   </analyzer>
>>> </fieldType>
>>> <field name="nameShingle" type="cust_shingle" indexed="true" stored="true"/>
>>> 
>>> my solr query is
>>> 
>>> http://localhost:8983/solr/productCollection/select?defType=edismax&debugQuery=true&q=one%20plus%20one%20four&qf=nameShingle&sow=false&wt=xml
>>> 
>>> and it was creating the parsed query as
>>> <str name="parsedquery">
>>> (+(DisjunctionMaxQuery(((+nameShingle:one plus +nameShingle:plus one +nameShingle:one four)))
>>> DisjunctionMaxQuery(((+nameShingle:one plus +nameShingle:plus one four)))
>>> DisjunctionMaxQuery(((+nameShingle:one plus one +nameShingle:one four)))
>>> DisjunctionMaxQuery((nameShingle:one plus one four)))~1)/no_coord
>>> </str>
>>> <str name="parsedquery_toString">
>>> +((((+nameShingle:one plus +nameShingle:plus one +nameShingle:one four))
>>> ((+nameShingle:one plus +nameShingle:plus one four))
>>> ((+nameShingle:one plus one +nameShingle:one four))
>>> (nameShingle:one plus one four))~1)
>>> </str>
>>> 
>>> 
>>> The tokens are created correctly, but the query uses the Boolean +
>>> operator, which causes the problem. If I have a document named
>>> "one plus one", it should match according to the shingles, since its
>>> tokens are ("one plus", "one plus one", "plus one").
>>> I have tried q.op and played around with mm as well, but nothing
>>> gives me the correct response.
>>> Is there any way to fetch that document even when it is missing some
>>> of the query's tokens?
>>>
>>> I expect to get the document "one plus one" even when the user query
>>> has an additional term, such as "one plus one two".
>>> 
>>> 
>>> Thanks,
>>> Aman Deep Singh

Re: Solr Shingle is not working properly in solr 6.5.0

Posted by Steve Rowe <sa...@gmail.com>.

Aman,

In the forthcoming Solr 6.5.1, this problem will be addressed by setting a new <fieldtype> option named "enableGraphQueries" to "false".

Your fieldtype will look like this:

-----
<fieldType name="cust_shingle" class="solr.TextField" positionIncrementGap="100" enableGraphQueries="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" outputUnigrams="false" maxShingleSize="4"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  </analyzer>
</fieldType>
-----
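
For anyone following along, here is a rough Python sketch (not Solr code) of why the document "one plus one" fails to match the parsed query reported in this thread. The helpers `shingles` and `graph_paths` are hypothetical names made up for illustration. They only imitate what the debug output suggests: word shingles of 2 to 4 tokens (2 being the filter's default minimum size, 4 the configured maxShingleSize), and one group of mandatory (+) clauses per chain of overlapping shingles that spans the query.

```python
def shingles(tokens, min_size=2, max_size=4):
    """Return lowercased word n-grams (shingles) of min_size..max_size tokens."""
    out = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]).lower())
    return out


def graph_paths(tokens, min_size=2, max_size=4):
    """Enumerate chains of overlapping shingles that span the whole query.

    Assumption for illustration: a shingle of n tokens starting at position p
    is followed by a shingle starting at position p + n - 1, which reproduces
    the four clause groups seen in the parsed query.
    """
    last = len(tokens) - 1
    paths = []

    def walk(pos, path):
        if pos == last:
            paths.append(path)
            return
        for size in range(min_size, max_size + 1):
            end = pos + size - 1
            if end <= last:
                shingle = " ".join(tokens[pos:pos + size]).lower()
                walk(end, path + [shingle])

    walk(0, [])
    return paths


# Shingles indexed for the document, and clause groups built for the query.
doc_shingles = set(shingles("one plus one".split()))
query_paths = graph_paths("one plus one four".split())

for path in query_paths:
    # Every shingle in a path is mandatory, so one missing shingle rules it out.
    missing = [s for s in path if s not in doc_shingles]
    print(path, "-> missing from document:", missing)
```

In this model every chain contains at least one shingle that includes "four", which the document never indexed, so no group of mandatory clauses can be satisfied regardless of mm.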

--
Steve
www.lucidworks.com
