Posted to solr-user@lucene.apache.org by Alok Bhandari <al...@gmail.com> on 2012/06/19 07:24:45 UTC

StandardTokenizerFactory behaviour

Hello,

I have been working with Solr for the last few months and am stuck on the following.

Analyzer in the field definition:

<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
</analyzer>

In: "Please, email john.doe@foo.com by 03-09, re: m37-xq."

Expected Out: "Please", "email", "john.doe@foo.com", "by", "03-09", "re",
"m37-xq"

but I am not getting this. Is something wrong with my understanding of
StandardTokenizer? I am using Solr 3.6.
Please let me know what is wrong here. Thanks.


--
View this message in context: http://lucene.472066.n3.nabble.com/StandardTokenizerFactory-behaviour-tp3990215.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: AW: StandardTokenizerFactory behaviour

Posted by Alok Bhandari <al...@gmail.com>.
thanks for the reply.


AW: StandardTokenizerFactory behaviour

Posted by Markus Klose <mk...@shi-gmbh.com>.
The behaviour of the StandardTokenizerFactory changed with Solr 3.1: it now follows the Unicode word-break rules (UAX#29), so "@" and "-" act as token separators.
The actual output is now: "Please", "email", "john.doe", "foo.com", "by", "03", "09", "re", "m37", "xq"
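
If the pre-3.1 tokens are what is needed, one option may be solr.ClassicTokenizerFactory, which was added in 3.1 to keep the old StandardTokenizer behaviour. The following is only a minimal sketch, assuming a plain TextField; the fieldType name "text_classic" is hypothetical and not taken from the original schema:

```xml
<!-- Hypothetical field type; only the tokenizer class differs from the original analyzer -->
<fieldType name="text_classic" class="solr.TextField">
  <analyzer>
    <!-- ClassicTokenizer keeps e-mail addresses and number-containing hyphenated terms whole -->
    <tokenizer class="solr.ClassicTokenizerFactory"/>
  </analyzer>
</fieldType>
```

With this tokenizer the sample input should come out as "Please", "email", "john.doe@foo.com", "by", "03-09", "re", "m37-xq". It is worth confirming on the Analysis page of the admin UI before reindexing.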

Best regards from Augsburg

Markus Klose
SHI Elektronische Medien GmbH

-----Original Message-----
From: Alok Bhandari [mailto:alokomprakashbhandari@gmail.com]
Sent: Tuesday, 19 June 2012 07:33
To: solr-user@lucene.apache.org
Subject: Re: StandardTokenizerFactory behaviour


Just to make sure there is no ambiguity: the In: "Please, email john.doe@foo.com by 03-09, re: m37-xq." is the input given to this field for indexing, and the Expected Out: "Please", "email", "john.doe@foo.com", "by", "03-09", "re", "m37-xq" is the list of output tokens I expect.

--
View this message in context: http://lucene.472066.n3.nabble.com/StandardTokenizerFactory-behaviour-tp3990215p3990216.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: StandardTokenizerFactory behaviour

Posted by Alok Bhandari <al...@gmail.com>.
Just to make sure there is no ambiguity: the In: "Please, email
john.doe@foo.com by 03-09, re: m37-xq." is the input given to this field for
indexing, and the Expected Out: "Please", "email", "john.doe@foo.com", "by",
"03-09", "re", "m37-xq" is the list of output tokens I expect.
