Posted to solr-user@lucene.apache.org by vit <bu...@yahoo.com> on 2014/11/25 14:20:18 UTC

Help on matching a shingle in a query to a shingle in the document

Example of what I need:
Query:
Hi likes *this kind of winter* weather
Document shingle field:
They like *this kind of winter* with many sunny days

So I need to match *this kind of winter*.

Which tokenizers and filters, and perhaps other components, should be used for
this kind of match?

I tried this field type, for example, but it matches the entire query to a shingle:
<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="5"
            outputUnigrams="false" outputUnigramsIfNoShingles="true" tokenSeparator=" "/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="5"
            outputUnigrams="false" outputUnigramsIfNoShingles="true" tokenSeparator=" "/>
  </analyzer>
</fieldType>




--
View this message in context: http://lucene.472066.n3.nabble.com/Help-on-matching-a-shingle-in-a-query-to-a-shingle-in-the-document-tp4170852.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Help on matching a shingle in a query to a shingle in the document

Posted by vit <bu...@yahoo.com>.
Erick,
What you are saying of course makes perfect sense.
But in our particular situation there is a high probability that an
essential part of the query will match a meaningful phrase or a business name
in a short description indexed as shingles.
It is also better than just a broad match.
Besides, I am at the research stage and will run some analysis on queries
and results.

So, from what you are saying, to reach my goal I need to shingle the query
myself in a preprocessing stage and match the resulting shingles against my
shingled field using OR. Is that correct? Or is there a more elegant way to
handle it?
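
The preprocessing step described above could be sketched roughly as follows. This is only an illustration, not a confirmed recipe from the thread: the field name `description_shingles` is hypothetical, splitting on whitespace is a simplification of what StandardTokenizer does, and the sketch assumes the query-time analyzer of the target field keeps each quoted phrase as a single token (for example a keyword tokenizer plus a lowercase filter) so that it can match an indexed shingle term.

```python
def shingles(text, min_size=2, max_size=5):
    """Return lowercased word n-grams of min_size..max_size words, in order."""
    words = text.lower().split()  # simplification: whitespace tokenization only
    out = []
    for start in range(len(words)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(words):
                out.append(" ".join(words[start:start + size]))
    return out


def build_or_query(field, text, min_size=2, max_size=5):
    """OR together one quoted clause per shingle against a single field."""
    clauses = []
    for s in shingles(text, min_size, max_size):
        # Escape backslashes and double quotes inside the quoted phrase.
        escaped = s.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append('{}:"{}"'.format(field, escaped))
    return " OR ".join(clauses)


if __name__ == "__main__":
    # "description_shingles" is a hypothetical field name.
    print(build_or_query("description_shingles",
                         "Hi likes this kind of winter weather"))
```

A document whose shingle field contains "this kind of winter" would then match on that clause (and on its shorter sub-shingles), while clauses such as "hi likes" simply find nothing. Longer shared shingles match more clauses, so they tend to score higher without any extra logic.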




Re: Help on matching a shingle in a query to a shingle in the document

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Sounds like an attempt to identify stable multi-word units, a technique
sometimes used in natural language processing.

In that case, a shingle filter factory plus using the field as a facet
might do the trick.

The shingle filter will generate a "token" that is "this kind of winter",
and the facet will give back a count for it. The query then does not matter,
or will be on a different field.
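
A facet request along these lines might look like the sketch below. The field name `description_shingles` is hypothetical; `facet`, `facet.field`, `facet.limit`, `facet.mincount` and `rows` are standard Solr request parameters, and the chosen values are only examples.

```python
from urllib.parse import urlencode


def facet_params(shingle_field, query="*:*", limit=20, min_count=2):
    """Build a query string that asks Solr for facet counts on a shingle field."""
    params = [
        ("q", query),                    # any query, possibly on a different field
        ("rows", 0),                     # only the facet counts are of interest
        ("facet", "true"),
        ("facet.field", shingle_field),  # each indexed shingle becomes a facet value
        ("facet.limit", limit),
        ("facet.mincount", min_count),   # drop shingles seen in fewer documents
        ("wt", "json"),
    ]
    return urlencode(params)


if __name__ == "__main__":
    # "description_shingles" is a hypothetical field name.
    print(facet_params("description_shingles"))
```

The resulting string would be appended to the collection's select URL. Each facet count is the number of matching documents that contain the shingle, so frequently repeated phrases surface at the top of the list.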

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 25 November 2014 at 10:28, Erick Erickson <er...@gmail.com> wrote:
> Tokenizers, filters and the like have no real way to
> figure out that some words in the query are to be
> ignored. In your example, how would one algorithmically
> determine that "this kind of winter" is important and that
> "Hi", "likes" and "weather" aren't? What's different
> about like/likes that indicates that the stemmed version
> of "like" shouldn't be important? Both the query
> and text could match "likes this kind of winter".
>
> This feels like an XY problem, what use-case are you
> trying to solve?
>
> Best,
> Erick

Re: Help on matching a shingle in a query to a shingle in the document

Posted by Erick Erickson <er...@gmail.com>.
Tokenizers, filters and the like have no real way to
figure out that some words in the query are to be
ignored. In your example, how would one algorithmically
determine that "this kind of winter" is important and that
"Hi", "likes" and "weather" aren't? What's different
about like/likes that indicates that the stemmed version
of "like" shouldn't be important? Both the query
and text could match "likes this kind of winter".

This feels like an XY problem, what use-case are you
trying to solve?

Best,
Erick


