
Posted to solr-user@lucene.apache.org by "anurag.jain" <an...@gmail.com> on 2013/04/12 17:32:44 UTC

Which tokenizer or analyzer should I use, and which field type

My schema file is:

<copyField source="title" dest="keyword"/>
<copyField source="body" dest="keyword"/>
<copyField source="company_name" dest="keyword"/>
<copyField source="company_profile" dest="keyword"/>

<field name="title" type="text_general" indexed="true" stored="true"/>
<field name="body" type="text_general" indexed="true" stored="true"/>
<field name="company_name" type="text_general" indexed="true" stored="true"/>
<field name="company_profile" type="text_general" indexed="true" stored="true"/>

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>





The values look like this:

title: "Assistant Coach/ Junior Assistant"
body: "<p> <http://i.imgur.com/buPga.jpg> <br /><br />Oil India Ltd. invites
applications for the post of <strong>Sr Medical Officer (Paediatrics)
</strong><br /> www.freshersworld.com<br /> <strong>Qualification</strong> :
MD (Paediatrics) <br /><br /> <strong>No of Post</strong> : 1UR<br /> <br
/><strong> Pay Scale</strong> : Rs 32900 -58000 <br /> <br /> <strong>Age as
on 11.04.2013</strong> : 32 yrs<br /> </p><p><strong>Selection Procedure :
</strong>Selection for the above post will be based on Written Test, Group
Discussion (GD), Viva-Voce and Medical Examination.<br /> </p>"

company_profile: "<p>The story of <strong>Oil India Limited (OIL)</strong>
traces and symbolises the development and growth of the Indian petroleum
industry. From the discovery of crude oil in the far east of India at
Digboi, Assam in 1889 to its present status as a fully integrated upstream
petroleum company, OIL has come far, crossing many milestones.</p>",

company_name: "Oil India Limited",



Please suggest which field type I should use.

keyword is the copyField destination I am using for search. I do not want to
search on the HTML markup.

How will the search work?

If I give these words to search:

project assistant,manager

it should only return documents whose keyword field has "project assistant"
or "manager".

Right now it returns results that have project or assistant or manager,
which is wrong for my case.

Please give me a solution for it. I have to complete this task today, which
is why I am not able to research it myself.

I need field type definitions for each field. And how should I write the
search query?

Thanks in advance.






--
View this message in context: http://lucene.472066.n3.nabble.com/Which-tokenizer-or-analizer-should-use-and-field-type-tp4055591.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Which tokenizer or analyzer should I use, and which field type

Posted by Erick Erickson <er...@gmail.com>.

Try executing these with &debug=all and examine the resulting parsed query;
that will show you exactly how the query is parsed.

Also, the query language is not strictly boolean; see:
http://searchhub.org/2011/12/28/why-not-and-or-and-not/

The first thing I would try would be to parenthesize explicitly as

keyword:((assistant AND coach) OR (iit AND kanpur))
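
A minimal Python sketch of both suggestions: it builds the explicitly parenthesized query and a request URL that carries the debug parameter. The host, port, core name (collection1), and the helper names grouped_query and debug_url are assumptions made for illustration, not something given in this thread.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port and core name are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def grouped_query(field, groups):
    """Build field:((a AND b) OR (c AND d)) from a list of term groups."""
    clauses = ["(" + " AND ".join(terms) + ")" for terms in groups]
    return f"{field}:(" + " OR ".join(clauses) + ")"


def debug_url(query):
    """Return a select URL that asks Solr to include debug output."""
    params = {"q": query, "debug": "all", "wt": "json"}
    return SOLR_SELECT + "?" + urlencode(params)


q = grouped_query("keyword", [["assistant", "coach"], ["iit", "kanpur"]])
print(q)
print(debug_url(q))
```

The parsed query appears in the debug section of the response; comparing it with the intended grouping shows where the parser interpreted the operators differently.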

Best
Erick

On Sat, Apr 13, 2013 at 7:06 PM, anurag.jain <an...@gmail.com> wrote:
> Hi, if you can help me with this, it will solve my problem.
>
> keyword:(*assistant AND coach*) gives me 1 result.
>
> keyword:(*iit AND kanpur*) gives me 2 results.
>
> But the query
>
> keyword:(*assistant AND coach* OR (*iit AND kanpur*)) gives me only 1
> result.
>
> I also tried keyword:(*assistant AND coach* OR (*:* *iit AND kanpur*)),
> which gives me only 1 result. I don't know why.
>
> What should the query look like? Please help me find a solution.
>
> Thanks in advance.

Re: Which tokenizer or analyzer should I use, and which field type

Posted by "anurag.jain" <an...@gmail.com>.

Hi, if you can help me with this, it will solve my problem.

keyword:(*assistant AND coach*) gives me 1 result.

keyword:(*iit AND kanpur*) gives me 2 results.

But the query

keyword:(*assistant AND coach* OR (*iit AND kanpur*)) gives me only 1
result.

I also tried keyword:(*assistant AND coach* OR (*:* *iit AND kanpur*)),
which gives me only 1 result. I don't know why.

What should the query look like? Please help me find a solution.

Thanks in advance.






Re: Which tokenizer or analyzer should I use, and which field type

Posted by "anurag.jain" <an...@gmail.com>.

I tried both ways:

(project AND assistant) OR manager

"project assistant"~5 OR manager

Both work properly, but I have a problem.

If I give the query projec assistant, it does not find anything.

And what is the meaning of ~5?

If I write *projec assistant* then it does find results, but it returns
documents with project or assistant.

My objective is to search like the MySQL LIKE operator, %search word%.

How do I write a query that behaves exactly like the MySQL LIKE operator?

Thanks.

I need help as soon as possible.







Re: Which tokenizer or analyzer should I use, and which field type

Posted by Jack Krupansky <ja...@basetechnology.com>.

Unfortunately, Solr doesn't have a query parser that would give the meaning 
you want to:

project assistant,manager

For now, you would need to write that query as:

(project AND assistant) OR manager
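
Since the rewrite has to happen on the client side, here is a minimal Python sketch of it. It assumes the interpretation used in this thread (a comma separates alternatives, words inside one alternative must all match). The function name to_boolean_query is hypothetical, and the sketch does not escape Solr's special query characters.

```python
def to_boolean_query(user_input):
    """Turn 'project assistant,manager' into an explicit boolean query."""
    clauses = []
    for part in user_input.split(","):
        words = part.split()
        if not words:
            continue  # skip empty alternatives such as a trailing comma
        if len(words) == 1:
            clauses.append(words[0])
        else:
            # All words of one alternative are required.
            clauses.append("(" + " AND ".join(words) + ")")
    return " OR ".join(clauses)


print(to_boolean_query("project assistant,manager"))
```

The resulting string can be sent as the q parameter, optionally prefixed with a field name such as keyword:( ... ).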

Or maybe as:

"project assistant"~5 OR manager

That would require project and assistant to occur within a few words of each
other.
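
To make the ~5 in the proximity form above more concrete, here is a simplified Python model of a two-term proximity match. It is only an approximation for illustration: the function name within_slop and the position arithmetic are assumptions, and Lucene's actual sloppy phrase matching handles more cases (more than two terms, repeated terms, scoring).

```python
def within_slop(tokens, first, second, slop):
    """Simplified model of a two-term proximity match: "first second"~slop.

    A pair matches when moving `second` so that it sits right after `first`
    takes at most `slop` position moves.
    """
    first_positions = [i for i, t in enumerate(tokens) if t == first]
    second_positions = [i for i, t in enumerate(tokens) if t == second]
    return any(
        abs((p2 - 1) - p1) <= slop
        for p1 in first_positions
        for p2 in second_positions
    )


# Hypothetical title, already lowercased and split into tokens.
title = "project research lab assistant".split()
print(within_slop(title, "project", "assistant", 5))
print(within_slop(title, "project", "assistant", 1))
```

In this model two words sit between project and assistant, so a slop of 5 accepts the title and a slop of 1 rejects it.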

Or, if you have q.op defaulted to "OR":

"project assistant"~5 manager

Add the HTML strip char filter to your text field type:

<charFilter class="solr.HTMLStripCharFilterFactory" />
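
To illustrate what stripping the markup means for the indexed text, here is a rough Python approximation using the standard library. It is not Solr's implementation and may differ from it in details such as whitespace handling; the names TextOnly and strip_html are hypothetical.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect only the text between tags, discarding the markup."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html(markup):
    parser = TextOnly()
    parser.feed(markup)
    parser.close()
    # Join with spaces so words on either side of a tag do not merge,
    # then collapse the resulting runs of whitespace.
    return " ".join(" ".join(parser.chunks).split())


sample = "<p>The story of <strong>Oil India Limited (OIL)</strong> traces</p>"
print(strip_html(sample))
```

With the markup removed before tokenization, tag names such as strong or br never become searchable terms.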

text_general is a semi-decent place to start.

-- Jack Krupansky
