Posted to dev@nutch.apache.org by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2010/09/08 12:42:33 UTC
[jira] Created: (NUTCH-901) Make index-more plug-in configurable
Make index-more plug-in configurable
--------------------------------------
Key: NUTCH-901
URL: https://issues.apache.org/jira/browse/NUTCH-901
Project: Nutch
Issue Type: Improvement
Components: indexer
Reporter: Markus Jelsma
Fix For: 1.2
In my case, I don't want the index-more plug-in to split content-types on the slash. Tokenization is something a Solr instance should take care of. Instead of removing the code (which would break compatibility for users who rely on it), we need a way to configure the plug-in not to split the content-type.
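A minimal sketch of the requested behaviour, not the plug-in's actual code: with splitting enabled, the full content-type plus its primary and sub types are emitted; with splitting disabled, only the full value is emitted and any tokenization is left to Solr. The class and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; this is not the MoreIndexingFilter source.
class ContentTypeFields {

    // Returns the values to index for a content-type such as "text/html".
    static List<String> typeValues(String contentType, boolean splitOnSlash) {
        List<String> values = new ArrayList<>();
        values.add(contentType); // the full type is always indexed
        if (splitOnSlash) {
            int slash = contentType.indexOf('/');
            // Only split when there is text on both sides of the slash.
            if (slash > 0 && slash < contentType.length() - 1) {
                values.add(contentType.substring(0, slash));  // primary type
                values.add(contentType.substring(slash + 1)); // sub type
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(typeValues("text/html", true));  // [text/html, text, html]
        System.out.println(typeValues("text/html", false)); // [text/html]
    }
}
```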
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Assigned: (NUTCH-901) Make index-more plug-in configurable
Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris A. Mattmann reassigned NUTCH-901:
---------------------------------------
Assignee: Chris A. Mattmann
> Make index-more plug-in configurable
> ------------------------------------
>
> Key: NUTCH-901
> URL: https://issues.apache.org/jira/browse/NUTCH-901
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: 1.2, 2.0
> Reporter: Markus Jelsma
> Assignee: Chris A. Mattmann
> Fix For: 2.0
>
> Attachments: NUTCH-901-MarkusJelsma.998958.patch, NUTCH-901-trunk.998961.patch
>
>
> In my case, I don't want the index-more plug-in to split content-types on the slash. Tokenization is something a Solr instance should take care of. Instead of removing the code (which would break compatibility for users who rely on it), we need a way to configure the plug-in not to split the content-type.
[jira] Updated: (NUTCH-901) Make index-more plug-in configurable
Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris A. Mattmann updated NUTCH-901:
------------------------------------
Fix Version/s: 1.2
- fix for 1.2 as well (sigh, this means *another* RC). Oh well, for the greater good!
> Make index-more plug-in configurable
> ------------------------------------
>
> Key: NUTCH-901
> URL: https://issues.apache.org/jira/browse/NUTCH-901
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: 1.2, 2.0
> Reporter: Markus Jelsma
> Assignee: Chris A. Mattmann
> Fix For: 1.2, 2.0
>
> Attachments: NUTCH-901-MarkusJelsma.998958.patch, NUTCH-901-trunk.998961.patch
>
>
> In my case, I don't want the index-more plug-in to split content-types on the slash. Tokenization is something a Solr instance should take care of. Instead of removing the code (which would break compatibility for users who rely on it), we need a way to configure the plug-in not to split the content-type.
[jira] Work started: (NUTCH-901) Make index-more plug-in configurable
Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on NUTCH-901 started by Chris A. Mattmann.
> Make index-more plug-in configurable
> ------------------------------------
>
> Key: NUTCH-901
> URL: https://issues.apache.org/jira/browse/NUTCH-901
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: 1.2, 2.0
> Reporter: Markus Jelsma
> Assignee: Chris A. Mattmann
> Fix For: 2.0
>
> Attachments: NUTCH-901-MarkusJelsma.998958.patch, NUTCH-901-trunk.998961.patch
>
>
> In my case, I don't want the index-more plug-in to split content-types on the slash. Tokenization is something a Solr instance should take care of. Instead of removing the code (which would break compatibility for users who rely on it), we need a way to configure the plug-in not to split the content-type.
[jira] Updated: (NUTCH-901) Make index-more plug-in configurable
Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-901:
--------------------------------
Attachment: NUTCH-901-MarkusJelsma.998958.patch
Here's a patch for version 1.2. It includes a backward-compatible setting in nutch-default.xml and handles the setting in MoreIndexingFilter.java. It's tested and behaves as expected on my up-to-date 1.2 checkout.
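A minimal sketch of what a backward-compatible setting could look like, under stated assumptions: java.util.Properties stands in for the Nutch configuration, and the property name is made up for illustration because this thread does not state the real one. Defaulting to true keeps the existing splitting behaviour for anyone who never sets the property.

```java
import java.util.Properties;

// Hypothetical illustration; java.util.Properties stands in for the Nutch configuration.
class SplitSetting {

    // Assumed property name, for illustration only; not taken from the patch.
    static final String SPLIT_KEY = "example.indexmore.splitContentType";

    // Defaults to true so installs that never set the property keep splitting.
    static boolean splitEnabled(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(SPLIT_KEY, "true"));
    }

    public static void main(String[] args) {
        Properties defaults = new Properties(); // property not set
        Properties custom = new Properties();
        custom.setProperty(SPLIT_KEY, "false"); // opt out of splitting
        System.out.println(splitEnabled(defaults)); // true
        System.out.println(splitEnabled(custom));   // false
    }
}
```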
> Make index-more plug-in configurable
> ------------------------------------
>
> Key: NUTCH-901
> URL: https://issues.apache.org/jira/browse/NUTCH-901
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: 1.2, 2.0
> Reporter: Markus Jelsma
> Fix For: 2.0
>
> Attachments: NUTCH-901-MarkusJelsma.998958.patch
>
>
> In my case, I don't want the index-more plug-in to split content-types on the slash. Tokenization is something a Solr instance should take care of. Instead of removing the code (which would break compatibility for users who rely on it), we need a way to configure the plug-in not to split the content-type.
[jira] Resolved: (NUTCH-901) Make index-more plug-in configurable
Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris A. Mattmann resolved NUTCH-901.
-------------------------------------
Resolution: Fixed
- patch applied to trunk in r999181 and to branch-1.2 in r999200. Thanks so much Markus!
One nit: no unit tests. I've created one in the trunk (in r999203 and in r999204), and one in the branch-1.2 (in r999208).
I won't be applying *any more* patches to the Nutch 1.2 RC. Let's get this thing VOTEd into release-dom with RC #4.
> Make index-more plug-in configurable
> ------------------------------------
>
> Key: NUTCH-901
> URL: https://issues.apache.org/jira/browse/NUTCH-901
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: 1.2, 2.0
> Reporter: Markus Jelsma
> Assignee: Chris A. Mattmann
> Fix For: 1.2, 2.0
>
> Attachments: NUTCH-901-MarkusJelsma.998958.patch, NUTCH-901-trunk.998961.patch
>
>
> In my case, I don't want the index-more plug-in to split content-types on the slash. Tokenization is something a Solr instance should take care of. Instead of removing the code (which would break compatibility for users who rely on it), we need a way to configure the plug-in not to split the content-type.
[jira] Updated: (NUTCH-901) Make index-more plug-in configurable
Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-901:
--------------------------------
Attachment: NUTCH-901-trunk.998961.patch
Here's also a patch for 2.0 trunk. I could not test the code because I haven't managed to compile trunk yet.
> Make index-more plug-in configurable
> ------------------------------------
>
> Key: NUTCH-901
> URL: https://issues.apache.org/jira/browse/NUTCH-901
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: 1.2, 2.0
> Reporter: Markus Jelsma
> Fix For: 2.0
>
> Attachments: NUTCH-901-MarkusJelsma.998958.patch, NUTCH-901-trunk.998961.patch
>
>
> In my case, I don't want the index-more plug-in to split content-types on the slash. Tokenization is something a Solr instance should take care of. Instead of removing the code (which would break compatibility for users who rely on it), we need a way to configure the plug-in not to split the content-type.
[jira] Updated: (NUTCH-901) Make index-more plug-in configurable
Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julien Nioche updated NUTCH-901:
--------------------------------
Summary: Make index-more plug-in configurable (was: Make index-more plug-in configurable)
Fix Version/s: 2.0
Affects Version/s: 1.2, 2.0
Needs fixing in the trunk as well (v2.0)
> Make index-more plug-in configurable
> ------------------------------------
>
> Key: NUTCH-901
> URL: https://issues.apache.org/jira/browse/NUTCH-901
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: 1.2, 2.0
> Reporter: Markus Jelsma
> Fix For: 1.2, 2.0
>
>
> In my case, I don't want the index-more plug-in to split content-types on the slash. Tokenization is something a Solr instance should take care of. Instead of removing the code (which would break compatibility for users who rely on it), we need a way to configure the plug-in not to split the content-type.
[jira] Commented: (NUTCH-901) Make index-more plug-in configurable
Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925318#action_12925318 ]
Markus Jelsma commented on NUTCH-901:
-------------------------------------
Applied patch and added Mattmann's test to branch-1.3
> Make index-more plug-in configurable
> ------------------------------------
>
> Key: NUTCH-901
> URL: https://issues.apache.org/jira/browse/NUTCH-901
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: 1.2, 2.0
> Reporter: Markus Jelsma
> Assignee: Chris A. Mattmann
> Fix For: 1.2, 2.0
>
> Attachments: NUTCH-901-MarkusJelsma.998958.patch, NUTCH-901-trunk.998961.patch
>
>
> In my case, I don't want the index-more plug-in to split content-types on the slash. Tokenization is something a Solr instance should take care of. Instead of removing the code (which would break compatibility for users who rely on it), we need a way to configure the plug-in not to split the content-type.
[jira] Updated: (NUTCH-901) Make index-more plug-in configurable
Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris A. Mattmann updated NUTCH-901:
------------------------------------
Fix Version/s: (was: 1.2)
Hi Guys: I don't have time to put together a patch for this, and I haven't seen anything produced yet. Let's push this off to 2.0. If someone gets me a patch in the next day or so, I'll try and squeeze it in, but for now, I'm pushing to 2.0.
> Make index-more plug-in configurable
> ------------------------------------
>
> Key: NUTCH-901
> URL: https://issues.apache.org/jira/browse/NUTCH-901
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: 1.2, 2.0
> Reporter: Markus Jelsma
> Fix For: 2.0
>
>
> In my case, I don't want the index-more plug-in to split content-types on the slash. Tokenization is something a Solr instance should take care of. Instead of removing the code (which would break compatibility for users who rely on it), we need a way to configure the plug-in not to split the content-type.
[jira] Issue Comment Edited: (NUTCH-901) Make index-more plug-in configurable
Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912547#action_12912547 ]
Markus Jelsma edited comment on NUTCH-901 at 9/20/10 11:53 AM:
---------------------------------------------------------------
Here's a patch for version 1.2 (that's the NUTCH-901-MarkusJelsma.998958.patch file). It includes a backward-compatible setting in nutch-default.xml and handles the setting in MoreIndexingFilter.java. It's tested and behaves as expected on my up-to-date 1.2 checkout.
was (Author: markus17):
Here's a patch for version 1.2. It includes a backward-compatible setting in nutch-default.xml and handles the setting in MoreIndexingFilter.java. It's tested and behaves as expected on my up-to-date 1.2 checkout.
> Make index-more plug-in configurable
> ------------------------------------
>
> Key: NUTCH-901
> URL: https://issues.apache.org/jira/browse/NUTCH-901
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: 1.2, 2.0
> Reporter: Markus Jelsma
> Fix For: 2.0
>
> Attachments: NUTCH-901-MarkusJelsma.998958.patch, NUTCH-901-trunk.998961.patch
>
>
> In my case, I don't want the index-more plug-in to split content-types on the slash. Tokenization is something a Solr instance should take care of. Instead of removing the code (which would break compatibility for users who rely on it), we need a way to configure the plug-in not to split the content-type.