Posted to java-user@lucene.apache.org by Pe...@taylorandfrancis.com on 2006/01/31 22:07:57 UTC

maximum string length in index field

I have some really long chemical names that I am storing in an index and
it looks like they are being split into two terms.  Is there a way to
increase the max term length?

Here is an example:

DTryptophanmethylLleucineethylLhprolinamidedeglycinamideluteinizing&nbsp
;hormonereleasing&nbsp;factor&nbsp;pig679010NN!#6-<h3>D<h0>-Tryptophan-7
-(<h1>N<h0>-methyl-<h3>L<h0>-leucine)-9-(<h1>N<h0>-ethyl-<h3>L<h0>-proli
namide)-10-deglycinamide-luteinizing&nbsp;hormone-releasing&nbsp;factor&
nbsp;(pig)
length of name: 298
Number of docs in index: 1

sort_name:DTryptophanmethylLleucineethylLhprolinamidedeglycinamidelutein
izing&nbsp;hormonereleasing&nbsp;factor&nbsp;pig679010NN!#6-<h3>D<h0>-Tr
yptophan-7-(<h1>N<h0>-methyl-<h3>L<h0>-leucine)-9-(<h1>N<h0>-ethyl-<h3>L
<h0>-prolinamide)-10-deglycinamide-luteinizing&nb   freq: 1

sort_name:sp;hormone-releasing&nbsp;factor&nbsp;(pig)   freq: 1

Total terms: 2   Total Occurrences: 2

I put only one name in the index, using WhitespaceAnalyzer and making
sure the name contains no whitespace.  However, there are two terms in
the index.

Thanks,
Peter

RE: maximum string length in index field

Posted by Koji Sekiguchi <ko...@m4.dion.ne.jp>.
Peter,

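CharTokenizer may be the cause of the problem.
It is the parent Tokenizer of WhitespaceTokenizer,
which is used by WhitespaceAnalyzer, and it
has a 255-character buffer, so a longer token is cut at that point.
That matches your output: the 298-character name is split into
terms of 255 and 43 characters.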

How about using KeywordAnalyzer instead of WhitespaceAnalyzer?
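
To illustrate the effect, here is a minimal sketch in plain Java. It is
only a simulation and not Lucene's code: the class and method names are
hypothetical, and the 255-character limit is taken from the figure
mentioned above. It compares a whitespace tokenizer that emits a token
whenever its fixed-size buffer fills up with keyword-style handling that
keeps the whole input as one token.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative simulation only; this is not Lucene's source code.
class TermLengthSketch {

    // Assumed per-token character limit (the 255 figure from this thread).
    static final int MAX_WORD_LEN = 255;

    // Splits on whitespace, and also emits a token as soon as the
    // buffer holds maxLen characters, as a capped tokenizer would.
    static List<String> cappedWhitespaceTokens(String input, int maxLen) {
        List<String> tokens = new ArrayList<>();
        StringBuilder buffer = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                // Whitespace ends the current token, if there is one.
                if (buffer.length() > 0) {
                    tokens.add(buffer.toString());
                    buffer.setLength(0);
                }
            } else {
                buffer.append(c);
                // Buffer full: flush it even though no whitespace was seen.
                if (buffer.length() == maxLen) {
                    tokens.add(buffer.toString());
                    buffer.setLength(0);
                }
            }
        }
        if (buffer.length() > 0) {
            tokens.add(buffer.toString());
        }
        return tokens;
    }

    // Keyword-style handling: the whole input becomes a single token.
    static List<String> keywordTokens(String input) {
        List<String> tokens = new ArrayList<>();
        tokens.add(input);
        return tokens;
    }

    // Helper that builds a test string of n identical characters.
    static String repeat(char c, int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = repeat('x', 298); // same length as the name in question
        for (String token : cappedWhitespaceTokens(name, MAX_WORD_LEN)) {
            System.out.println("capped token length: " + token.length());
        }
        System.out.println("keyword token length: "
                + keywordTokens(name).get(0).length());
    }
}
```

With a 298-character input and no whitespace, the capped version yields
two tokens of 255 and 43 characters, while the keyword-style version
yields one token of 298 characters.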

Thanks,

Koji


