Posted to dev@lucene.apache.org by "Doron Cohen (JIRA)" <ji...@apache.org> on 2007/03/26 01:07:32 UTC
[jira] Created: (LUCENE-849) Configurable HTML Parser, external
classes to path, exhaustive doc maker
Configurable HTML Parser, external classes to path, exhaustive doc maker
------------------------------------------------------------------------
Key: LUCENE-849
URL: https://issues.apache.org/jira/browse/LUCENE-849
Project: Lucene - Java
Issue Type: Improvement
Components: contrib/benchmark
Reporter: Doron Cohen
Assigned To: Doron Cohen
Priority: Minor
"doc making" enhancements:
1. Allow a configurable HTML parser, selected with a new html.parser property.
Currently TrecDocMaker uses the Demo HTML parser. With this new property, that choice can be overridden.
2. Allow adding an external class path, so the benchmark can be used with modified makers/parsers without having to add code to Lucene.
Run the benchmark with, e.g., "ant run-task -Dbenchmark.ext.classpath=/myproj/myclasses"
3. Allow crawling a doc maker until all its files/docs have been exhausted once, without having to know in advance how many docs it can make.
This can be useful, for instance, if the input data is in zip files.
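To make items 1 and 3 concrete, here is a minimal Java sketch. It is not the code from the patch: HtmlParser, TagStrippingParser, OneShotDocSource, NoMoreDocsException, loadParser and exhaust are all hypothetical names invented for illustration, and the real benchmark classes may look quite different. Only the html.parser property name comes from the description above. The sketch assumes the configured class has a public no-argument constructor and is visible on the JVM class path (for instance because it was added through something like the benchmark.ext.classpath setting from item 2).

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Properties;

class DocMakingSketch {

    /** Hypothetical parser contract; the real benchmark interface may differ. */
    public interface HtmlParser {
        String parse(String html);
    }

    /** Hypothetical default parser: crudely strips tags and collapses whitespace. */
    public static class TagStrippingParser implements HtmlParser {
        @Override
        public String parse(String html) {
            return html.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
        }
    }

    /** Thrown (in this sketch) once a source has handed out all of its docs. */
    public static class NoMoreDocsException extends Exception {
        private static final long serialVersionUID = 1L;
    }

    /** Hypothetical doc source whose total size is not known up front. */
    public static class OneShotDocSource {
        private final Iterator<String> rawDocs;
        private final HtmlParser parser;

        public OneShotDocSource(List<String> rawDocs, HtmlParser parser) {
            this.rawDocs = rawDocs.iterator();
            this.parser = parser;
        }

        public String nextDoc() throws NoMoreDocsException {
            if (!rawDocs.hasNext()) {
                throw new NoMoreDocsException();
            }
            return parser.parse(rawDocs.next());
        }
    }

    /** Instantiates the class named by html.parser, falling back to the default. */
    public static HtmlParser loadParser(Properties props) throws ReflectiveOperationException {
        String className = props.getProperty("html.parser", TagStrippingParser.class.getName());
        Class<? extends HtmlParser> cls = Class.forName(className).asSubclass(HtmlParser.class);
        return cls.getDeclaredConstructor().newInstance();
    }

    /** Pulls docs until the source signals exhaustion; returns how many were made. */
    public static int exhaust(OneShotDocSource source) {
        int count = 0;
        while (true) {
            try {
                source.nextDoc();
                count++;
            } catch (NoMoreDocsException e) {
                return count;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties(); // html.parser left unset, so the default is used
        HtmlParser parser = loadParser(props);
        List<String> raw = Arrays.asList("<p>one</p>", "<b>two</b> docs", "<i>three</i>");
        System.out.println(parser.parse(raw.get(1)));
        System.out.println(exhaust(new OneShotDocSource(raw, parser)));
    }
}
```

Signalling exhaustion with an exception is just one possible design; returning null or exposing a hasNext-style check would serve the same purpose. The point is only that the caller loops until the source says it is done, rather than being told a document count in advance.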
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
[jira] Updated: (LUCENE-849) Configurable HTML Parser, external
classes to path, exhaustive doc maker
Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doron Cohen updated LUCENE-849:
-------------------------------
Description:
"doc making" enhancements:
1. Allow a configurable HTML parser, selected with a new html.parser property.
Currently TrecDocMaker uses the Demo HTML parser. With this new property, that choice can be overridden.
2. Allow adding an external class path, so the benchmark can be used with modified makers/parsers without having to add code to Lucene.
Run the benchmark with, e.g., "ant run-task -Dbenchmark.ext.classpath=/myproj/myclasses"
3. Allow crawling a doc maker until all its files/docs have been exhausted once, without having to know in advance how many docs it can make.
This can be useful, for instance, if the input data is in zip files.
was:
"doc making" enhancements:
1. Allow configurable html parser, with a new html.parser property.
Currently TrecDocMaker is using the Demo html parser. With this new property this can be overriden.
2. allow to add external class path, so the bechmark can be used with modified makers/parsers without having to add code to Lucene.
Run benchmark with e.g. "ant run-task -Dbenchmark.ext.classpath=/myproj/myclasses"
3. allow to crawl a doc maker until exhausting all its files/docs once, without having to know in advance how many docs it can make.
This can be useful for instance if the input data is in zip files.
Lucene Fields: [New, Patch Available] (was: [Patch Available, New])
Summary: Configurable HTML Parser, external classes to path, exhaustive doc maker (was: Configurable HTML Parser, external classes to path, exhasutive doc maker)
[jira] Updated: (LUCENE-849) Configurable HTML Parser, external
classes to path, exhaustive doc maker
Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doron Cohen updated LUCENE-849:
-------------------------------
Attachment: 849-bench-parse-exhaust.patch
[jira] Resolved: (LUCENE-849) contrib/benchmark: configurable HTML
Parser, external classes to path, exhaustive doc maker
Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doron Cohen resolved LUCENE-849.
--------------------------------
Resolution: Fixed
Lucene Fields: [Patch Available] (was: [Patch Available, New])
Committed.
[jira] Updated: (LUCENE-849) contrib/benchmark: configurable HTML
Parser, external classes to path, exhaustive doc maker
Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doron Cohen updated LUCENE-849:
-------------------------------
Lucene Fields: [New, Patch Available] (was: [Patch Available, New])
Summary: contrib/benchmark: configurable HTML Parser, external classes to path, exhaustive doc maker (was: Configurable HTML Parser, external classes to path, exhaustive doc maker)