Posted to java-user@lucene.apache.org by Daan Hoogland <da...@asml.com> on 2004/10/04 13:27:26 UTC

different analyzers all produce the same index?

Hi all,

I am trying to create different indices using different Analyzer classes. I
tried the standard, German, Russian, and CJK analyzers. They all produce exactly
the same index file (identical MD5 checksum). There are over 280 pages, so I
expected at least some differences.
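For anyone who wants to repeat the MD5 comparison described above, here is a minimal sketch that hashes every file in two index directories and reports whether they match. It assumes each analyzer's index was written to its own directory, and it uses only the JDK (java.security.MessageDigest and java.nio.file, so Java 7 or later, much newer than what this 2004 thread would have used). The class name CompareIndexDirs is hypothetical and not part of Lucene.

```java
import java.io.IOException;
import java.math.BigInteger;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of Lucene): compares two index directories file by file.
public class CompareIndexDirs {

    // MD5 of one file, as a 32-character lowercase hex string.
    static String md5(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest(Files.readAllBytes(file));
        return String.format("%032x", new BigInteger(1, digest));
    }

    // File name -> MD5 for every regular file directly inside dir (sorted by name).
    static Map<String, String> checksums(Path dir) throws IOException, NoSuchAlgorithmException {
        Map<String, String> sums = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) {
                    sums.put(f.getFileName().toString(), md5(f));
                }
            }
        }
        return sums;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> a = checksums(Paths.get(args[0]));
        Map<String, String> b = checksums(Paths.get(args[1]));
        // Identical only if both directories hold the same file names with the same contents.
        System.out.println(a.equals(b) ? "identical" : "different");
    }
}
```

Run it as `java CompareIndexDirs <indexDirA> <indexDirB>`. It reads each file fully into memory, which is fine for an index of a few hundred pages but not for very large ones.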

Any ideas anyone?




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: different analyzers all produce the same index?

Posted by Morus Walter <mo...@tanto.de>.
sergiu gordea writes:
> Daan Hoogland wrote:
> 
> >H all,
> >
> >I try to create different indices using different Analyzer-classes. I 
> >tried standard, german, russian, and cjk. They all produce exactly the 
> >same index file (md5-wise). There are over 280 pages so I expected at 
> >least some differences.
> >
> >  
> >
> Take a look in the lucene source code... Maybe you will find the answer ...
> I asume that all the pages you indexed were written in English, 
> therefore is normal that german, russian and cjk analyzers to
> create identic indexex, but htey should be different  than english one 
> (StandardAnalyzer)
> 
The German analyzer definitely won't leave English text as it is, since it
does algorithmic stemming.
E.g. your text becomes
tak a look in the luc sourc cod mayb you will find the answ i asum tha all the pag you indexed wer writt in english therefor is normal tha germa russia and cjk analyx to crea identic indexex but htey should be diff tha english one standardanalyx
while the standard analyzer does not stem at all and gives
take a look in the lucene source code maybe you will find the answer i asume that all the pages you indexed were written in english therefore is normal that german russian and cjk analyzers to create identic indexex but htey should be different than english one standardanalyzer

I'd rather suspect a problem in the indexing code.
So my advice is to check what the analyzer actually produces.
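A minimal, Lucene-free sketch of that check, under stated assumptions: with Lucene itself you would get the tokens from the analyzer's tokenStream(fieldName, reader) method and read them one by one. To keep the example runnable on the JDK alone, the two hypothetical methods plain and crudeStem below stand in for a non-stemming and a stemming analyzer. crudeStem is a toy suffix rule, not Lucene's GermanStemmer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AnalyzerCheck {

    // Hypothetical non-stemming analyzer: lower-case, then split on anything that is not a letter.
    static List<String> plain(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Hypothetical stemming analyzer: same tokens, then a crude rule that drops a
    // trailing "e" from words longer than three letters. NOT Lucene's GermanStemmer;
    // it only stands in for "some stemmer" in this sketch.
    static List<String> crudeStem(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : plain(text)) {
            tokens.add(t.length() > 3 && t.endsWith("e") ? t.substring(0, t.length() - 1) : t);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sample = "Take a look in the Lucene source code";
        List<String> a = plain(sample);
        List<String> b = crudeStem(sample);
        System.out.println("plain: " + a);
        System.out.println("stem:  " + b);
        // Different token lists should lead to different index contents.
        System.out.println(a.equals(b) ? "same tokens" : "different tokens");
    }
}
```

If the real analyzers produce different tokens for your pages but the index files still hash identically, that points at the indexing code (for example, the analyzer never reaching the IndexWriter) rather than at the analyzers themselves.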

Morus



Re: different analyzers all produce the same index?

Posted by sergiu gordea <gs...@ifit.uni-klu.ac.at>.
Daan Hoogland wrote:

>H all,
>
>I try to create different indices using different Analyzer-classes. I 
>tried standard, german, russian, and cjk. They all produce exactly the 
>same index file (md5-wise). There are over 280 pages so I expected at 
>least some differences.
>
>  
>
Take a look at the Lucene source code... maybe you will find the answer there.
I assume that all the pages you indexed were written in English,
so it is normal that the German, Russian and CJK analyzers
create identical indexes, but they should be different from the English one
(StandardAnalyzer).


All the best,

 Sergiu

>Any ideas anyone?
>
>
>  
>

