Posted to java-user@lucene.apache.org by Chris Bamford <cb...@mimecast.com> on 2017/03/20 12:02:11 UTC

Limiting terms / field

Hello,

We are using Lucene 4.10.3 and are interested in limiting the number of terms per field. In the past this was controlled by IndexWriter's maxFieldLength setting, with a default of 10,000; as I understand it, this is no longer the case, and in fact the length is now unlimited by default?

Anyway, what is the best way to do this? I have found some references to a class called LimitTokenCountFilter, but I believe it is only available in later versions.

Thanks

- Chris



Chris Bamford
Lead Software Engineer
c: +44 7860 405292
p: +44 207 847 8700
http://www.mimecast.com


====================================================================================================================================================================

Disclaimer

This email, sent at 12:02:15 on 2017-03-20 from cbamford@mimecast.com to java-user@lucene.apache.org, has been scanned for viruses and malware by Mimecast, an innovator in software as a service (SaaS) for business. Email continuity, security, archiving and compliance are managed by Mimecast's unified email management platform.
To find out more, email info@mimecast.co.za or request a demo.

Mimecast SA (Pty) Ltd is a registered company within the Republic of South Africa, company registration number: 2004/000965/07  VAT No. 4650210547



RE: Limiting terms / field

Posted by Uwe Schindler <uw...@thetaphi.de>.
It is also available in 4.10.3, as part of the analyzers-common module:

https://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/LimitTokenCountFilter.html

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
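
For illustration, here is a minimal sketch of how the filter can be wired in. It assumes lucene-core and lucene-analyzers-common 4.10.3 are on the classpath. LimitTokenCountAnalyzer, which lives in the same org.apache.lucene.analysis.miscellaneous package as the filter linked above, wraps an existing analyzer and applies LimitTokenCountFilter to each token stream it produces, so it can stand in for the old maxFieldLength setting. The class name LimitTokensExample, the field name "body" and the limit of 3 are hypothetical. The sketch also assumes that the Version-less WhitespaceAnalyzer() constructor is available, as it should be in the 4.10 line.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.analysis.miscellaneous.LimitTokenCountAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical example class; assumes Lucene 4.10.3 core + analyzers-common.
public class LimitTokensExample {

    // Wrap any analyzer so that each token stream it produces
    // emits at most maxTokens tokens (via LimitTokenCountFilter).
    static Analyzer limited(Analyzer delegate, int maxTokens) {
        return new LimitTokenCountAnalyzer(delegate, maxTokens);
    }

    // Run text through an analyzer and collect the terms it emits,
    // following the reset / incrementToken / end / close contract.
    static List<String> analyze(Analyzer analyzer, String field, String text)
            throws IOException {
        List<String> terms = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream(field, text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                terms.add(term.toString());
            }
            ts.end();
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        Analyzer analyzer = limited(new WhitespaceAnalyzer(), 3);
        // prints [one, two, three]
        System.out.println(analyze(analyzer, "body", "one two three four five"));
    }
}
```

The wrapped analyzer can then be passed to IndexWriterConfig in place of the original one. By default the filter stops pulling from the underlying stream once the limit is reached. The three-argument form, new LimitTokenCountAnalyzer(delegate, maxTokens, true), makes it consume the remaining tokens instead, which the Javadoc recommends when the wrapped stream needs to be fully exhausted to behave correctly.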