Posted to dev@nutch.apache.org by "Markus Jelsma (Created) (JIRA)" <ji...@apache.org> on 2012/03/01 16:07:59 UTC
[jira] [Created] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata
IndexingFiltersChecker to store detected content type in crawldatum metadata
----------------------------------------------------------------------------
Key: NUTCH-1293
URL: https://issues.apache.org/jira/browse/NUTCH-1293
Project: Nutch
Issue Type: Bug
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Priority: Minor
NUTCH-1259 is not implemented in the checker: IndexingFiltersChecker does not yet store the detected content type in the CrawlDatum metadata.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata
Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma resolved NUTCH-1293.
----------------------------------
Resolution: Fixed
[jira] [Updated] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata
Posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1293:
---------------------------------
Attachment: NUTCH-1293-1.5-1.patch
Wrong patch indeed :)
[jira] [Resolved] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata
Posted by "Markus Jelsma (Resolved) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma resolved NUTCH-1293.
----------------------------------
Resolution: Fixed
Committed for 1.5 in rev. 1295614.
[jira] [Updated] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata
Posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1293:
---------------------------------
Attachment: NUTCH-1293-1.5-1.patch
Patch for 1.5.
[jira] [Commented] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata
Posted by "Julien Nioche (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220095#comment-13220095 ]
Julien Nioche commented on NUTCH-1293:
--------------------------------------
+1
[jira] [Commented] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220665#comment-13220665 ]
Hudson commented on NUTCH-1293:
-------------------------------
Integrated in Nutch-trunk #1774 (See [https://builds.apache.org/job/Nutch-trunk/1774/])
NUTCH-1293 IndexingFiltersChecker to store detected content type in crawldatum metadata (Revision 1295614)
Result = SUCCESS
markus : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1295614
Files :
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/java/org/apache/nutch/indexer/IndexingFiltersChecker.java
[jira] [Commented] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata
Posted by "Julien Nioche (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220087#comment-13220087 ]
Julien Nioche commented on NUTCH-1293:
--------------------------------------
wrong patch?
[jira] [Commented] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220122#comment-13220122 ]
Hudson commented on NUTCH-1293:
-------------------------------
Integrated in nutch-trunk-maven #178 (See [https://builds.apache.org/job/nutch-trunk-maven/178/])
NUTCH-1293 IndexingFiltersChecker to store detected content type in crawldatum metadata (Revision 1295614)
Result = SUCCESS
markus :
Files :
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/java/org/apache/nutch/indexer/IndexingFiltersChecker.java
[jira] [Updated] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata
Posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1293:
---------------------------------
Attachment: (was: NUTCH-1293-1.5-1.patch)
[jira] [Commented] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata
Posted by "Sebastian Nagel (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263124#comment-13263124 ]
Sebastian Nagel commented on NUTCH-1293:
----------------------------------------
The content type should be added to the metadata only after the check for content == null; otherwise a failed fetch causes a NullPointerException:
{noformat}
% nutch indexchecker file:/xxxx
fetching: file:/xxxx
org.apache.nutch.protocol.file.FileError: File Error: 404
...
Exception in thread "main" java.lang.NullPointerException at org.apache.nutch.indexer.IndexingFiltersChecker.run(IndexingFiltersChecker.java:71)
{noformat}
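The reordering suggested above (check for null content first, then store the content type) can be sketched in plain Java. This is a minimal, self-contained illustration, not the actual IndexingFiltersChecker code: in the real checker the content comes from the protocol output and the metadata lives on the CrawlDatum, whereas here both are replaced by hypothetical stand-ins (the class `FetchedContent`, a `Map<String, String>` for the metadata, and the key `Content-Type`) so the sketch runs on its own.

```java
import java.util.HashMap;
import java.util.Map;

public class ContentTypeMetadataSketch {

  /** Hypothetical stand-in for fetched content; not Nutch's Content class. */
  static final class FetchedContent {
    final String contentType;

    FetchedContent(String contentType) {
      this.contentType = contentType;
    }
  }

  /** Hypothetical metadata key, chosen for illustration only. */
  static final String CONTENT_TYPE_KEY = "Content-Type";

  /**
   * Stores the detected content type in the (hypothetical) datum metadata,
   * but only after checking for null content, e.g. after a 404 from the
   * file protocol. Returns false when there is nothing to index.
   */
  static boolean storeContentType(FetchedContent content, Map<String, String> datumMetadata) {
    if (content == null) {
      // A failed fetch yields no content: bail out before dereferencing it,
      // which is what avoids the NullPointerException in the report above.
      return false;
    }
    String contentType = content.contentType;
    if (contentType != null) {
      datumMetadata.put(CONTENT_TYPE_KEY, contentType);
    }
    return true;
  }

  public static void main(String[] args) {
    Map<String, String> meta = new HashMap<>();
    // Failed fetch: nothing is stored and the caller can stop early.
    System.out.println(storeContentType(null, meta));
    // Successful fetch: the detected type ends up in the metadata.
    System.out.println(storeContentType(new FetchedContent("text/html"), meta));
    System.out.println(meta);
  }
}
```

Returning a flag instead of throwing mirrors how a command-line checker would typically report "no content" and exit, rather than crash with a stack trace.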
[jira] [Reopened] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata
Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma reopened NUTCH-1293:
----------------------------------
[jira] [Commented] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata
Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267324#comment-13267324 ]
Markus Jelsma commented on NUTCH-1293:
--------------------------------------
You're right. Please open a new issue as this is already part of 1.5.