Posted to dev@nutch.apache.org by "Markus Jelsma (Created) (JIRA)" <ji...@apache.org> on 2012/03/01 16:07:59 UTC

[jira] [Created] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata

IndexingFiltersChecker to store detected content type in crawldatum metadata
----------------------------------------------------------------------------

                 Key: NUTCH-1293
                 URL: https://issues.apache.org/jira/browse/NUTCH-1293
             Project: Nutch
          Issue Type: Bug
            Reporter: Markus Jelsma
            Assignee: Markus Jelsma
            Priority: Minor


NUTCH-1259 is not implemented in the checker.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma resolved NUTCH-1293.
----------------------------------

    Resolution: Fixed
    
> IndexingFiltersChecker to store detected content type in crawldatum metadata
> ----------------------------------------------------------------------------
>
>                 Key: NUTCH-1293
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1293
>             Project: Nutch
>          Issue Type: Bug
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Minor
>         Attachments: NUTCH-1293-1.5-1.patch
>
>
> NUTCH-1259 is not implemented in the checker.


[jira] [Updated] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata

Posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1293:
---------------------------------

    Attachment: NUTCH-1293-1.5-1.patch

Wrong patch indeed :)
                

[jira] [Resolved] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata

Posted by "Markus Jelsma (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma resolved NUTCH-1293.
----------------------------------

    Resolution: Fixed

Committed for 1.5 in rev. 1295614.
                

[jira] [Updated] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata

Posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1293:
---------------------------------

    Attachment: NUTCH-1293-1.5-1.patch

Patch for 1.5.

                

[jira] [Commented] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata

Posted by "Julien Nioche (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220095#comment-13220095 ] 

Julien Nioche commented on NUTCH-1293:
--------------------------------------

+1
                

[jira] [Commented] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220665#comment-13220665 ] 

Hudson commented on NUTCH-1293:
-------------------------------

Integrated in Nutch-trunk #1774 (See [https://builds.apache.org/job/Nutch-trunk/1774/])
    NUTCH-1293 IndexingFiltersChecker to store detected content type in crawldatum metadata (Revision 1295614)

     Result = SUCCESS
markus : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1295614
Files : 
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/java/org/apache/nutch/indexer/IndexingFiltersChecker.java

                

[jira] [Commented] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata

Posted by "Julien Nioche (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220087#comment-13220087 ] 

Julien Nioche commented on NUTCH-1293:
--------------------------------------

wrong patch?
                

[jira] [Commented] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220122#comment-13220122 ] 

Hudson commented on NUTCH-1293:
-------------------------------

Integrated in nutch-trunk-maven #178 (See [https://builds.apache.org/job/nutch-trunk-maven/178/])
    NUTCH-1293 IndexingFiltersChecker to store detected content type in crawldatum metadata (Revision 1295614)

     Result = SUCCESS
markus : 
Files : 
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/java/org/apache/nutch/indexer/IndexingFiltersChecker.java

                

[jira] [Updated] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata

Posted by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1293:
---------------------------------

    Attachment:     (was: NUTCH-1293-1.5-1.patch)
    

[jira] [Commented] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata

Posted by "Sebastian Nagel (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263124#comment-13263124 ] 

Sebastian Nagel commented on NUTCH-1293:
----------------------------------------

The content type should be added to metadata after the check for content == null.

{noformat}
% nutch indexchecker file:/xxxx
fetching: file:/xxxx
org.apache.nutch.protocol.file.FileError: File Error: 404
   ...
Exception in thread "main" java.lang.NullPointerException
        at org.apache.nutch.indexer.IndexingFiltersChecker.run(IndexingFiltersChecker.java:71)
{noformat}
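
A minimal sketch of the ordering suggested above, not the actual IndexingFiltersChecker code: the null check runs first, and the content type is copied into the metadata only when content was fetched. The names FetchedContent, storeContentType and CONTENT_TYPE_KEY are hypothetical stand-ins for illustration and are not Nutch APIs.

```java
import java.util.HashMap;
import java.util.Map;

public class ContentTypeCheckSketch {

    /** Hypothetical stand-in for fetched content; not the Nutch Content class. */
    static final class FetchedContent {
        private final String contentType;

        FetchedContent(String contentType) {
            this.contentType = contentType;
        }

        String getContentType() {
            return contentType;
        }
    }

    /** Hypothetical metadata key, chosen only for this sketch. */
    static final String CONTENT_TYPE_KEY = "Content-Type";

    /**
     * Stores the content type in the metadata map only if content exists.
     * Returns false, leaving the map untouched, when the fetch produced nothing
     * (for example after a 404), so getContentType() is never called on null.
     */
    static boolean storeContentType(FetchedContent content, Map<String, String> datumMetadata) {
        if (content == null) {
            return false; // check first, before touching the content
        }
        datumMetadata.put(CONTENT_TYPE_KEY, content.getContentType());
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();

        // Failed fetch: no content, nothing is stored, no NullPointerException.
        System.out.println(storeContentType(null, metadata));

        // Successful fetch: the content type is recorded.
        System.out.println(storeContentType(new FetchedContent("text/html"), metadata));
        System.out.println(metadata.get(CONTENT_TYPE_KEY));
    }
}
```

With the check placed before the metadata write, a failed fetch such as the file:/xxxx case above is reported as a miss rather than ending in a NullPointerException.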
                

[jira] [Reopened] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma reopened NUTCH-1293:
----------------------------------

    

[jira] [Commented] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267324#comment-13267324 ] 

Markus Jelsma commented on NUTCH-1293:
--------------------------------------

You're right. Please open a new issue as this is already part of 1.5.
                