Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2007/07/02 15:44:04 UTC
[jira] Updated: (LUCENE-947) Some improvements to contrib/benchmark
[ https://issues.apache.org/jira/browse/LUCENE-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-947:
--------------------------------------
Attachment: LUCENE-947.patch
First cut patch.
> Some improvements to contrib/benchmark
> --------------------------------------
>
> Key: LUCENE-947
> URL: https://issues.apache.org/jira/browse/LUCENE-947
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/benchmark
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Attachments: LUCENE-947.patch
>
>
> I've made some small improvements to contrib/benchmark, mostly
> merging in the ad-hoc benchmarking code I've been using in LUCENE-843:
> - Fixed thread safety of DirDocMaker's usage of SimpleDateFormat
> - Print the props in sorted order
> - Added new config "autocommit=true|false" to CreateIndexTask
> - Added new config "ram.flush.mb=int" to AddDocTask
> - Added new configs "doc.term.vector.positions=true|false" and
> "doc.term.vector.offsets=true|false" to BasicDocMaker
> - Added WriteLineDocTask.java, so you can make an alg that uses this
> to build up a single file containing one document per line. EG
> this alg converts the reuters-out tree into a single file that
> has ~1000 bytes per body field, saved to
> work/reuters.1000.txt:
> docs.dir=reuters-out
> doc.maker=org.apache.lucene.benchmark.byTask.feeds.DirDocMaker
> line.file.out=work/reuters.1000.txt
> doc.maker.forever=false
> {WriteLineDoc(1000)}: *
> Each line has tab-separated TITLE, DATE, BODY fields.
> - Created feeds/LineDocMaker.java that creates documents read from
> the file created by WriteLineDocTask.java. EG this alg indexes
> all documents created above:
> analyzer=org.apache.lucene.analysis.SimpleAnalyzer
> directory=FSDirectory
> doc.add.log.step=500
> docs.file=work/reuters.1000.txt
> doc.maker=org.apache.lucene.benchmark.byTask.feeds.LineDocMaker
> doc.tokenized=true
> doc.maker.forever=false
> ResetSystemErase
> CreateIndex
> {AddDoc}: *
> CloseIndex
> RepSumByPref AddDoc
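The one-document-per-line layout described above (tab-separated TITLE, DATE, BODY) can be sketched roughly as follows. This is only an illustration of the file format, not the code in the patch: the class LineDocSketch, its LineDoc holder, and the choice to replace embedded tabs and newlines with spaces are all assumptions made for the example.

```java
public class LineDocSketch {

    /** Hypothetical holder for the three fields named in the description. */
    static final class LineDoc {
        final String title;
        final String date;
        final String body;

        LineDoc(String title, String date, String body) {
            this.title = title;
            this.date = date;
            this.body = body;
        }
    }

    /** Assumption: tabs and line breaks inside a field are replaced with spaces
     *  so that they cannot break the one-document-per-line layout. */
    static String clean(String field) {
        return field.replaceAll("[\\t\\r\\n]", " ");
    }

    /** Builds one line: TITLE, DATE, BODY separated by tabs. */
    static String format(String title, String date, String body) {
        return clean(title) + "\t" + clean(date) + "\t" + clean(body);
    }

    /** Splits one line back into its three fields. */
    static LineDoc parse(String line) {
        // Limit of 3 keeps an empty trailing BODY instead of dropping it.
        String[] parts = line.split("\t", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 3 tab-separated fields: " + line);
        }
        return new LineDoc(parts[0], parts[1], parts[2]);
    }

    public static void main(String[] args) {
        // Sample values are made up for the example.
        String line = format("Sample title", "2007-07-02", "Body text\twith a tab");
        LineDoc doc = parse(line);
        System.out.println(doc.title);
        System.out.println(doc.date);
        System.out.println(doc.body);
    }
}
```

Splitting with a limit of 3 means any stray tab would end up in the body rather than shifting the fields, which is why the sketch also cleans the fields when writing.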
> I'll attach an initial patch shortly.
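For context on the first item in the list (thread safety of SimpleDateFormat): java.text.SimpleDateFormat is not thread-safe, and one common way to share it across threads is to give each thread its own instance. The sketch below shows that general technique only. The patch is not reproduced in this message, so this is not necessarily how DirDocMaker was changed; the class name and the date pattern are assumptions made for the example.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class ThreadSafeDateParsing {

    // One SimpleDateFormat per thread, created lazily on first use.
    // The pattern is an assumption chosen for illustration.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
        ThreadLocal.withInitial(
            () -> new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.US));

    static Date parse(String text) throws ParseException {
        return FORMAT.get().parse(text);
    }

    static String format(Date date) {
        return FORMAT.get().format(date);
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                try {
                    // Each worker uses its own formatter instance via FORMAT.get().
                    Date d = parse("2007-07-02 15:44:04");
                    System.out.println(Thread.currentThread().getName() + " -> " + format(d));
                } catch (ParseException e) {
                    throw new RuntimeException(e);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
    }
}
```

Synchronizing on a single shared formatter would also work; the per-thread approach avoids lock contention when many threads parse dates at once.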