Posted to dev@nutch.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/06/14 14:29:42 UTC

[jira] [Created] (NUTCH-1395) Show batchId when skipping within ParserJob

Lewis John McGibbney created NUTCH-1395:
-------------------------------------------

             Summary: Show batchId when skipping within ParserJob
                 Key: NUTCH-1395
                 URL: https://issues.apache.org/jira/browse/NUTCH-1395
             Project: Nutch
          Issue Type: Bug
          Components: crawldb, parser
    Affects Versions: nutchgora
            Reporter: Lewis John McGibbney
            Priority: Minor
             Fix For: 2.1


Although the ParserJob CLI has been smartened up, its logging still lets us down: when a URL is skipped, the message only teases us with 'different batch id' and never says which batch id it was.
{code}
Parsing http://www.trancearoundtheworld.com/tatw/399
Parsing http://www.trancearoundtheworld.com/index.php
Skipping http://www.aboveandbeyond.nu/music; different batch id
Parsing http://www.trancearoundtheworld.com/tatw/425
Parsing http://www.trancearoundtheworld.com/tatw/398
Parsing https://twitter.com/tatw
Parsing http://www.trancearoundtheworld.com/tatw/401
{code}

I would like to see the batch id included in the message:
{code}
Skipping http://www.aboveandbeyond.nu/music; different batch id ($batchId)
{code}
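
For illustration only, the change could look roughly like the sketch below. It is not the actual ParserJob code: the class SkipLogSketch, the methods skipMessage and shouldParse, the use of java.util.logging, and the "batch-A"/"batch-B" values are all hypothetical stand-ins, and it assumes that $batchId means the batch id stored on the skipped row rather than the one passed to the job.

```java
import java.util.logging.Logger;

// Hypothetical stand-in for the skip branch of a parse mapper.
// This is NOT the real ParserJob; names and logger are assumptions.
class SkipLogSketch {
  private static final Logger LOG = Logger.getLogger(SkipLogSketch.class.getName());

  /** Builds the skip message, appending the batch id found on the row. */
  static String skipMessage(String url, String rowBatchId) {
    return "Skipping " + url + "; different batch id (" + rowBatchId + ")";
  }

  /**
   * Returns true when the row belongs to the requested batch and should be
   * parsed; otherwise logs the skip message with the row's batch id.
   */
  static boolean shouldParse(String url, String rowBatchId, String requestedBatchId) {
    if (requestedBatchId.equals(rowBatchId)) {
      LOG.info("Parsing " + url);
      return true;
    }
    LOG.info(skipMessage(url, rowBatchId));
    return false;
  }

  public static void main(String[] args) {
    // Made-up batch ids, purely to exercise both branches.
    shouldParse("http://www.trancearoundtheworld.com/tatw/399", "batch-A", "batch-A");
    shouldParse("http://www.aboveandbeyond.nu/music", "batch-B", "batch-A");
  }
}
```

With the id in the message, a skipped URL can be matched to the batch it was generated in without querying the store.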


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (NUTCH-1395) Show batchId when skipping within ParserJob

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney resolved NUTCH-1395.
-----------------------------------------

    Resolution: Fixed
      Assignee: Lewis John McGibbney

Committed @revision 1379137 in 2.1-dev
                

[jira] [Closed] (NUTCH-1395) Show batchId when skipping within ParserJob

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney closed NUTCH-1395.
---------------------------------------

    

[jira] [Updated] (NUTCH-1395) Show batchId when skipping within ParserJob

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-1395:
----------------------------------------

    Attachment: NUTCH-1395.patch

Trivial patch. In all honesty, this should have been addressed in NUTCH-1349.
                

[jira] [Commented] (NUTCH-1395) Show batchId when skipping within ParserJob

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445632#comment-13445632 ] 

Hudson commented on NUTCH-1395:
-------------------------------

Integrated in Nutch-nutchgora #333 (See [https://builds.apache.org/job/Nutch-nutchgora/333/])
    NUTCH-1395 Show batchId when skipping within ParserJob (Revision 1379137)

     Result = SUCCESS
lewismc : 
Files : 
* /nutch/branches/2.x/CHANGES.txt
* /nutch/branches/2.x/src/java/org/apache/nutch/parse/ParserJob.java

                