Posted to solr-user@lucene.apache.org by Alok Bhandari <al...@gmail.com> on 2012/06/19 07:24:45 UTC
StandardTokenizerFactory behaviour
Hello,
I have been working with Solr for the last few months and am stuck somewhere.
Analyzer in the field definition:
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
</analyzer>
In: "Please, email john.doe@foo.com by 03-09, re: m37-xq."
Expected Out: "Please", "email", "john.doe@foo.com", "by", "03-09", "re",
"m37-xq"
but I am not getting this. Is something wrong with my understanding of
StandardTokenizer? I am using Solr 3.6.
Please let me know what is wrong. Thanks
--
View this message in context: http://lucene.472066.n3.nabble.com/StandardTokenizerFactory-behaviour-tp3990215.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: AW: StandardTokenizerFactory behaviour
Posted by Alok Bhandari <al...@gmail.com>.
Thanks for the reply.
--
View this message in context: http://lucene.472066.n3.nabble.com/StandardTokenizerFactory-behaviour-tp3990215p3990280.html
AW: StandardTokenizerFactory behaviour
Posted by Markus Klose <mk...@shi-gmbh.com>.
The behaviour of the StandardTokenizerFactory changed with Solr 3.1.
The actual output is now: "Please", "email", "john.doe", "foo.com", "by", "03", "09", "re", "m37", "xq"
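If the pre-3.1 output is what you need, a minimal sketch (assuming a Solr 3.6 schema.xml) is to switch the tokenizer to solr.ClassicTokenizerFactory, which keeps the old StandardTokenizer grammar: it should leave e-mail addresses such as john.doe@foo.com and hyphenated tokens containing digits such as 03-09 and m37-xq as single tokens. The fieldType name text_classic is hypothetical. solr.UAX29URLEmailTokenizerFactory is another option if only e-mail addresses and URLs need to stay whole, but it still splits "03-09" and "m37-xq" at the hyphen.

```xml
<!-- Hypothetical field type name; adjust to your schema.xml -->
<fieldType name="text_classic" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Pre-3.1 StandardTokenizer grammar: keeps e-mail addresses, host names
         and hyphenated tokens that contain digits as single tokens -->
    <tokenizer class="solr.ClassicTokenizerFactory"/>
  </analyzer>
</fieldType>
```

After changing the tokenizer, the documents have to be reindexed, because tokens already in the index are not re-analyzed.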
Best regards from Augsburg
Markus Klose
SHI Elektronische Medien GmbH
-----Original Message-----
From: Alok Bhandari [mailto:alokomprakashbhandari@gmail.com]
Sent: Tuesday, 19 June 2012 07:33
To: solr-user@lucene.apache.org
Subject: Re: StandardTokenizerFactory behaviour
Just to make sure there is no ambiguity: In: "Please, email john.doe@foo.com by 03-09, re: m37-xq." is the input given to this field for indexing, and Expected Out: "Please", "email", "john.doe@foo.com", "by", "03-09", "re", "m37-xq" are the expected output tokens.
Re: StandardTokenizerFactory behaviour
Posted by Alok Bhandari <al...@gmail.com>.
Just to make sure there is no ambiguity: In: "Please, email
john.doe@foo.com by 03-09, re: m37-xq." is the input given to this field for
indexing, and Expected Out: "Please", "email", "john.doe@foo.com", "by",
"03-09", "re", "m37-xq" are the expected output tokens.
--
View this message in context: http://lucene.472066.n3.nabble.com/StandardTokenizerFactory-behaviour-tp3990215p3990216.html