Posted to dev@nutch.apache.org by "Alan Tanaman (JIRA)" <ji...@apache.org> on 2006/12/27 14:57:20 UTC
[jira] Created: (NUTCH-421) Allow predeterminate running order of index filters
Allow predeterminate running order of index filters
---------------------------------------------------
Key: NUTCH-421
URL: http://issues.apache.org/jira/browse/NUTCH-421
Project: Nutch
Issue Type: Improvement
Components: indexer
Affects Versions: 0.8.1
Environment: All
Reporter: Alan Tanaman
Priority: Minor
I've tested a patch for org.apache.nutch.indexer.IndexingFilters that lets the user specify the order in which the indexing filters run, using a new
indexingfilter.order property. This is needed when a filter relies on document fields generated by an earlier filter as input for generating further fields.
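To make the motivation concrete, here is a minimal sketch. It assumes a simplified filter interface in which a document is modelled as a `Map<String, String>` of fields. `IndexFilter`, `TITLE_FILTER` and `TITLE_LENGTH_FILTER` are hypothetical names for illustration only, not Nutch's `IndexingFilter` API. The second filter reads a field that the first one writes, so reversing the order changes the result.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OrderDependencyDemo {

    // Hypothetical, simplified stand-in for an indexing filter (not Nutch's IndexingFilter API).
    interface IndexFilter {
        Map<String, String> filter(Map<String, String> doc);
    }

    // Generates a "title" field from the first line of the raw "content" field.
    static final IndexFilter TITLE_FILTER = doc -> {
        String content = doc.getOrDefault("content", "");
        int end = content.indexOf('\n');
        doc.put("title", end >= 0 ? content.substring(0, end) : content);
        return doc;
    };

    // Relies on the "title" field generated by TITLE_FILTER; records 0 if it is absent.
    static final IndexFilter TITLE_LENGTH_FILTER = doc -> {
        doc.put("titleLength", String.valueOf(doc.getOrDefault("title", "").length()));
        return doc;
    };

    // Runs the filters in list order; each one sees the fields written by its predecessors.
    static Map<String, String> run(List<IndexFilter> chain, String content) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("content", content);
        for (IndexFilter f : chain) {
            doc = f.filter(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        String page = "Nutch\nsome body text";
        // Title first, then length: prints 5
        System.out.println(run(List.of(TITLE_FILTER, TITLE_LENGTH_FILTER), page).get("titleLength"));
        // Length first, then title: prints 0, because "title" does not exist yet
        System.out.println(run(List.of(TITLE_LENGTH_FILTER, TITLE_FILTER), page).get("titleLength"));
    }
}
```

With a system-defined (unpredictable) order, the same page could end up with either value, which is what a configurable order avoids.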
As suggested elsewhere, I based this on the urlfilter.order functionality:
<property>
  <name>indexingfilter.order</name>
  <value>org.apache.nutch.indexer.basic.BasicIndexingFilter org.apache.nutch.indexer.more.MoreIndexingFilter</value>
  <description>The order in which index filters are applied.
  If empty, all available index filters (as dictated by the properties
  plugin-includes and plugin-excludes above) are loaded and applied in
  system-defined order. If not empty, only the named filters are loaded
  and applied, in the given order. For example, if this property has the value:
  org.apache.nutch.indexer.basic.BasicIndexingFilter org.apache.nutch.indexer.more.MoreIndexingFilter
  then BasicIndexingFilter is applied first, and MoreIndexingFilter second.
  Filter ordering may affect the end result if one filter depends on the
  output of another filter, and it may also have performance implications,
  depending on the relative expensiveness of the filters.
  </description>
</property>
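The selection rule described above (empty value: every loaded filter in system-defined order; non-empty value: only the named filters, in the listed order) can be sketched as follows. This is not the code from nutch-421.patch. `FilterOrder.resolve` is a hypothetical helper, `available` stands in for the filters loaded via plugin-includes/plugin-excludes keyed by class name, and silently skipping names that were not loaded is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FilterOrder {

    /**
     * Returns the filters to run, in run order.
     *
     * @param available filters loaded from the active plugins, keyed by class name,
     *                  in system-defined (iteration) order
     * @param orderProp value of indexingfilter.order; may be null or blank
     */
    static <F> List<F> resolve(Map<String, F> available, String orderProp) {
        // Empty property: keep every loaded filter in system-defined order.
        if (orderProp == null || orderProp.trim().isEmpty()) {
            return new ArrayList<>(available.values());
        }
        // Non-empty property: only the named filters, in the order given
        // (class names are separated by whitespace, as in the example value).
        List<F> ordered = new ArrayList<>();
        for (String className : orderProp.trim().split("\\s+")) {
            F filter = available.get(className);
            if (filter != null) { // assumption: names that were not loaded are skipped
                ordered.add(filter);
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        Map<String, String> available = new LinkedHashMap<>();
        available.put("org.apache.nutch.indexer.more.MoreIndexingFilter", "more");
        available.put("org.apache.nutch.indexer.basic.BasicIndexingFilter", "basic");

        System.out.println(resolve(available, "")); // prints [more, basic]
        System.out.println(resolve(available,
            "org.apache.nutch.indexer.basic.BasicIndexingFilter org.apache.nutch.indexer.more.MoreIndexingFilter"));
        // prints [basic, more]
    }
}
```

Strings stand in for filter instances to keep the example small; the generic parameter `F` would be the filter type in practice. The committed implementation may differ, for example in how it treats names that match no loaded plugin.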
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (NUTCH-421) Allow predeterminate running order of index filters
Posted by "Alan Tanaman (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/NUTCH-421?page=all ]
Alan Tanaman updated NUTCH-421:
-------------------------------
Description:
I've tested a patch for org.apache.nutch.indexer.IndexingFilters that lets the user specify the order in which the indexing filters run, using a new
indexingfilter.order property. This is needed when a filter relies on document fields generated by an earlier filter as input for generating further fields.
As suggested elsewhere, I based this on the urlfilter.order functionality:
<property>
  <name>indexingfilter.order</name>
  <value>org.apache.nutch.indexer.basic.BasicIndexingFilter org.apache.nutch.indexer.more.MoreIndexingFilter</value>
  <description>The order in which index filters are applied.
  If empty, all available index filters (as dictated by the properties
  plugin-includes and plugin-excludes above) are loaded and applied in
  system-defined order. If not empty, only the named filters are loaded
  and applied, in the given order. For example, if this property has the value:
  org.apache.nutch.indexer.basic.BasicIndexingFilter org.apache.nutch.indexer.more.MoreIndexingFilter
  then BasicIndexingFilter is applied first, and MoreIndexingFilter second.
  Filter ordering may affect the end result if one filter depends on the
  output of another filter, and it may also have performance implications,
  depending on the relative expensiveness of the filters.
  </description>
</property>
was:
I've tested a patch for org.apache.nutch.indexer.IndexingFilters that lets the user specify the order in which the indexing filters run, using a new
indexingfilter.order property. This is needed when a filter relies on document fields generated by an earlier filter as input for generating further fields.
As suggested elsewhere, I based this on the urlfilter.order functionality:
<property>
  <name>indexingfilter.order</name>
  <value>org.apache.nutch.indexer.basic.BasicIndexingFilter org.apache.nutch.indexer.more.MoreIndexingFilter</value>
  <description>The order in which index filters are applied.
  If empty, all available index filters (as dictated by the properties
  plugin-includes and plugin-excludes above) are loaded and applied in
  system-defined order. If not empty, only the named filters are loaded
  and applied, in the given order. For example, if this property has the value:
  org.apache.nutch.indexer.basic.BasicIndexingFilter org.apache.nutch.indexer.more.MoreIndexingFilter
  then BasicIndexingFilter is applied first, and MoreIndexingFilter second.
  Filter ordering may affect the end result if one filter depends on the
  output of another filter, and it may also have performance implications,
  depending on the relative expensiveness of the filters.
  </description>
</property>
Patch will be attached to this issue by 29/12/06
> Allow predeterminate running order of index filters
> ---------------------------------------------------
>
> Key: NUTCH-421
> URL: http://issues.apache.org/jira/browse/NUTCH-421
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: 0.8.1
> Environment: All
> Reporter: Alan Tanaman
> Priority: Minor
> Attachments: nutch-421.patch
[jira] Updated: (NUTCH-421) Allow predeterminate running order of index filters
Posted by "Alan Tanaman (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/NUTCH-421?page=all ]
Alan Tanaman updated NUTCH-421:
-------------------------------
Attachment: nutch-421.patch
> Allow predeterminate running order of index filters
> ---------------------------------------------------
>
> Key: NUTCH-421
> URL: http://issues.apache.org/jira/browse/NUTCH-421
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: 0.8.1
> Environment: All
> Reporter: Alan Tanaman
> Priority: Minor
> Attachments: nutch-421.patch
[jira] Updated: (NUTCH-421) Allow predeterminate running order of index filters
Posted by "Alan Tanaman (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/NUTCH-421?page=all ]
Alan Tanaman updated NUTCH-421:
-------------------------------
Description:
I've tested a patch for org.apache.nutch.indexer.IndexingFilters that lets the user specify the order in which the indexing filters run, using a new
indexingfilter.order property. This is needed when a filter relies on document fields generated by an earlier filter as input for generating further fields.
As suggested elsewhere, I based this on the urlfilter.order functionality:
<property>
  <name>indexingfilter.order</name>
  <value>org.apache.nutch.indexer.basic.BasicIndexingFilter org.apache.nutch.indexer.more.MoreIndexingFilter</value>
  <description>The order in which index filters are applied.
  If empty, all available index filters (as dictated by the properties
  plugin-includes and plugin-excludes above) are loaded and applied in
  system-defined order. If not empty, only the named filters are loaded
  and applied, in the given order. For example, if this property has the value:
  org.apache.nutch.indexer.basic.BasicIndexingFilter org.apache.nutch.indexer.more.MoreIndexingFilter
  then BasicIndexingFilter is applied first, and MoreIndexingFilter second.
  Filter ordering may affect the end result if one filter depends on the
  output of another filter, and it may also have performance implications,
  depending on the relative expensiveness of the filters.
  </description>
</property>
Patch will be attached to this issue by 29/12/06
was:
I've tested a patch for org.apache.nutch.indexer.IndexingFilters that lets the user specify the order in which the indexing filters run, using a new
indexingfilter.order property. This is needed when a filter relies on document fields generated by an earlier filter as input for generating further fields.
As suggested elsewhere, I based this on the urlfilter.order functionality:
<property>
  <name>indexingfilter.order</name>
  <value>org.apache.nutch.indexer.basic.BasicIndexingFilter org.apache.nutch.indexer.more.MoreIndexingFilter</value>
  <description>The order in which index filters are applied.
  If empty, all available index filters (as dictated by the properties
  plugin-includes and plugin-excludes above) are loaded and applied in
  system-defined order. If not empty, only the named filters are loaded
  and applied, in the given order. For example, if this property has the value:
  org.apache.nutch.indexer.basic.BasicIndexingFilter org.apache.nutch.indexer.more.MoreIndexingFilter
  then BasicIndexingFilter is applied first, and MoreIndexingFilter second.
  Filter ordering may affect the end result if one filter depends on the
  output of another filter, and it may also have performance implications,
  depending on the relative expensiveness of the filters.
  </description>
</property>
> Allow predeterminate running order of index filters
> ---------------------------------------------------
>
> Key: NUTCH-421
> URL: http://issues.apache.org/jira/browse/NUTCH-421
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: 0.8.1
> Environment: All
> Reporter: Alan Tanaman
> Priority: Minor
[jira] Assigned: (NUTCH-421) Allow predeterminate running order of index filters
Posted by "Sami Siren (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sami Siren reassigned NUTCH-421:
--------------------------------
Assignee: Sami Siren
> Allow predeterminate running order of index filters
> ---------------------------------------------------
>
> Key: NUTCH-421
> URL: https://issues.apache.org/jira/browse/NUTCH-421
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: 0.8.1
> Environment: All
> Reporter: Alan Tanaman
> Assigned To: Sami Siren
> Priority: Minor
> Attachments: nutch-421.patch
[jira] Resolved: (NUTCH-421) Allow predeterminate running order of index filters
Posted by "Sami Siren (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sami Siren resolved NUTCH-421.
------------------------------
Resolution: Fixed
Fix Version/s: 0.9.0
Thanks Alan,
I just committed this with an additional JUnit test and a fix similar to NUTCH-325.
Indentation in IndexingFilters is still broken; I'll fix that on the next pass.
A next step regarding filters could be moving the features common to IndexingFilters, URLFilters and friends into a common superclass.
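One possible shape for such a common superclass, sketched under assumptions: `OrderedFilterChain` and `SimpleUrlFilters` are hypothetical names, not existing Nutch classes, and each filter is modelled as a `UnaryOperator`. The sketch factors out only the shared loop that threads a value through the configured filters in order. Treating a `null` result as "stop here" mirrors the way a URL filter rejects a URL by returning `null`; whether an indexing-filter subclass would want the same short-circuit is an assumption.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical common base class: holds the filters in their configured order
// and threads a value through them, stopping early if a filter returns null.
abstract class OrderedFilterChain<T> {

    private final List<UnaryOperator<T>> filters;

    protected OrderedFilterChain(List<UnaryOperator<T>> filters) {
        this.filters = List.copyOf(filters); // immutable copy of the configured order
    }

    /** Applies each filter to the previous filter's output; null short-circuits. */
    public T apply(T input) {
        T current = input;
        for (UnaryOperator<T> f : filters) {
            if (current == null) {
                return null;
            }
            current = f.apply(current);
        }
        return current;
    }
}

// Example subclass in the spirit of URLFilters: a null result means "rejected".
class SimpleUrlFilters extends OrderedFilterChain<String> {
    SimpleUrlFilters(List<UnaryOperator<String>> filters) {
        super(filters);
    }
}

public class FilterChainDemo {
    public static void main(String[] args) {
        SimpleUrlFilters chain = new SimpleUrlFilters(List.<UnaryOperator<String>>of(
            url -> url.startsWith("http://") ? url : null, // reject non-http URLs
            url -> url.toLowerCase(Locale.ROOT)            // normalize case
        ));
        System.out.println(chain.apply("http://Example.ORG/")); // prints http://example.org/
        System.out.println(chain.apply("ftp://example.org/"));  // prints null
    }
}
```

Keeping the ordering logic (such as the indexingfilter.order / urlfilter.order lookup) in one base class would let each subclass differ only in its filter type and in how it combines results.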
> Allow predeterminate running order of index filters
> ---------------------------------------------------
>
> Key: NUTCH-421
> URL: https://issues.apache.org/jira/browse/NUTCH-421
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: 0.8.1
> Environment: All
> Reporter: Alan Tanaman
> Assigned To: Sami Siren
> Priority: Minor
> Fix For: 0.9.0
>
> Attachments: nutch-421.patch