Posted to solr-user@lucene.apache.org by Tirthankar Chatterjee <tc...@commvault.com> on 2011/06/13 21:07:57 UTC

Using Edismax

We are using edismax, and the query we fire is (url:_2010):

http://redcarpet2.dm2.commvault.com:27000/solr/select/?q=url:_2010&version=2.2&start=0&rows=10&indent=on&defType=edismax

The url field is of type text_rev.

The results Solr returns include one extra item that we don't want. How do we exclude it?

Results:

SPC265_SharePoint_2010.pptx
OpenTRs2010.xlsx (we don't want this to be returned)


Thanks in advance!!!

Tirthankar


******************Legal Disclaimer***************************
"This communication may contain confidential and privileged
material for the sole use of the intended recipient. Any
unauthorized review, use or distribution by others is strictly
prohibited. If you have received the message in error, please
advise the sender by reply email and delete the message. Thank
you."
*********************************************************

Re: Using Edismax

Posted by Jan Høydahl <ja...@cominvent.com>.

Hi,

Let's assume you're using Solr version 3.1.0 and an unmodified FieldType "text_rev". It looks like this:

    <fieldType name="text_rev" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
           maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
      </analyzer>
      ...

Also let's assume that you have two docs in your index with these URLs:
A:"http://my.host/SPC265_SharePoint_2010.pptx"
B:"http://my.host/OpenTRs2010.xlsx"

Now you want to match only A and not B, and you attempt that using q=url:_2010

What happens here can easily be simulated by http://localhost:8983/solr/admin/analysis.jsp:

Re: Using Edismax

Posted by Ahmet Arslan <io...@yahoo.com>.

> Thanks for the reply. But what can I do to avoid matching on
> a bare 2010? I wanted a phrase query that includes the
> underscore, so it would return only results containing _2010.

For example, you can remove WordDelimiterFilterFactory from your field type definition.

Depending on your needs, you can use another fieldType for your url field.
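
A minimal sketch of what such an alternative field type might look like, assuming Solr 3.1 and the stock factory classes. The name url_nosplit is hypothetical and does not come from this thread; it is simply text_rev with the stop filter and WordDelimiterFilterFactory left out. Note the trade-off: the whitespace tokenizer then keeps each URL as a single token, so the underscore survives, but a bare url:_2010 would no longer match anything. You would probably need a wildcard query such as url:*_2010* instead, and it is worth checking on your version that edismax passes that wildcard through as expected.

```xml
<!-- Hypothetical field type: the URL stays one lowercased token, so "_" is preserved. -->
<fieldType name="url_nosplit" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Kept from text_rev; whether it helps a query with wildcards on both ends is untested here. -->
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
       maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Assumed field declaration; adjust the attributes to match your schema. -->
<field name="url" type="url_nosplit" indexed="true" stored="true"/>
```

With the two example documents, url:*_2010* should then match SPC265_SharePoint_2010.pptx but not OpenTRs2010.xlsx, since only the first URL contains the literal string _2010. You need to re-index after changing the field type, and it is safest to type wildcard terms in lowercase, because they may not be run through the query analyzer.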

Re: Using Edismax

Posted by Tirthankar Chatterjee <tc...@commvault.com>.

Erick,
Thanks for the reply. But what can I do to avoid matching on a bare 2010? I wanted a phrase query that includes the underscore, so it would return only results containing _2010.

On Jun 13, 2011, at 3:47 PM, "Erick Erickson" <er...@gmail.com> wrote:

> You haven't supplied the information that's really
> needed to help here, please review:
> 
> http://wiki.apache.org/solr/UsingMailingLists
> 
> But at a guess your analysis chain contains
> WordDelimiterFilterFactory, which is splitting
> the input stream into tokens on letter/number
> changes, and capitalization changes. So you're
> getting "2010" indexed as a separate token and
> you're also searching on it...
> 
> Best
> Erick

Re: Using Edismax

Posted by Erick Erickson <er...@gmail.com>.

You haven't supplied the information that's really
needed to help here, please review:

http://wiki.apache.org/solr/UsingMailingLists

But at a guess your analysis chain contains
WordDelimiterFilterFactory, which is splitting
the input stream into tokens on letter/number
changes, and capitalization changes. So you're
getting "2010" indexed as a separate token and
you're also searching on it...

Best
Erick


RE: Using Edismax

Posted by Tirthankar Chatterjee <tc...@commvault.com>.

There is an underscore (_) character before 2010 in the query.
