Posted to dev@nutch.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/06/14 01:35:42 UTC
[jira] [Created] (NUTCH-1392) -force and -resume arguments being ignored in ParserJob
Lewis John McGibbney created NUTCH-1392:
-------------------------------------------
Summary: -force and -resume arguments being ignored in ParserJob
Key: NUTCH-1392
URL: https://issues.apache.org/jira/browse/NUTCH-1392
Project: Nutch
Issue Type: Bug
Components: parser
Affects Versions: nutchgora
Reporter: Lewis John McGibbney
Fix For: 2.1
From the log below, something is clearly wrong: both -resume and -force are passed on the command line, yet the log output reports both as false.
lewis@lewis:~/ASF/nutchgora/runtime/local$ ./bin/nutch parse
Usage: ParserJob (<batchId> | -all) [-crawlId <id>] [-resume] [-force]
    <batchId>     - symbolic batch ID created by Generator
    -crawlId <id> - the id to prefix the schemas to operate on,
                    (default: storage.crawl.id)
    -all          - consider pages from all crawl jobs
    -resume       - resume a previous incomplete job
    -force        - force re-parsing even if a page is already parsed
lewis@lewis:~/ASF/nutchgora/runtime/local$ ./bin/nutch parse -all -resume -force
ParserJob: starting
ParserJob: resuming: false
ParserJob: forced reparse: false
ParserJob: parsing all
Parsing http://www.trancearoundtheworld.com/
ParserJob: success
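
For illustration only, here is a minimal sketch of an argument loop that records -resume and -force wherever they appear on the command line, matching the usage string above. This is not the ParserJob source and not the attached patch; the class name ParseArgsSketch, the Options holder and the parse method are all hypothetical, and the thread itself does not state what caused the flags to be dropped.

```java
// Hypothetical sketch: NOT the actual ParserJob code or the NUTCH-1392 patch.
public class ParseArgsSketch {

    /** Holder for the recognised options (hypothetical type). */
    public static final class Options {
        public String batchId;   // a batch ID, or "-all"
        public String crawlId;   // optional value following -crawlId
        public boolean resume;   // set when -resume is present
        public boolean force;    // set when -force is present
    }

    /** Walks every argument so that flags are honoured in any position. */
    public static Options parse(String[] args) {
        Options opts = new Options();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if ("-resume".equals(arg)) {
                opts.resume = true;
            } else if ("-force".equals(arg)) {
                opts.force = true;
            } else if ("-all".equals(arg)) {
                opts.batchId = "-all";
            } else if ("-crawlId".equals(arg)) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("-crawlId requires a value");
                }
                opts.crawlId = args[++i]; // consume the value as well
            } else if (!arg.startsWith("-")) {
                opts.batchId = arg;       // positional batch ID
            } else {
                throw new IllegalArgumentException("Unrecognized argument: " + arg);
            }
        }
        if (opts.batchId == null) {
            throw new IllegalArgumentException("Expected <batchId> or -all");
        }
        return opts;
    }

    public static void main(String[] args) {
        // Same arguments as the failing invocation in the report.
        Options opts = parse(new String[] {"-all", "-resume", "-force"});
        System.out.println("resuming: " + opts.resume);
        System.out.println("forced reparse: " + opts.force);
    }
}
```

With the arguments from the report, this sketch prints "resuming: true" and "forced reparse: true", which is the behaviour the reporter expected from the real job.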
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (NUTCH-1392) -force and -resume arguments being ignored in ParserJob
Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney resolved NUTCH-1392.
-----------------------------------------
Resolution: Fixed
Fix Version/s: nutchgora (was: 2.1)
Committed @revision 1350213 in Nutchgora branch
[jira] [Commented] (NUTCH-1392) -force and -resume arguments being ignored in ParserJob
Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294730#comment-13294730 ]
Lewis John McGibbney commented on NUTCH-1392:
---------------------------------------------
Additionally, the fix for this issue should remove the -parse argument from FetcherJob. This is a trivial change.
[jira] [Updated] (NUTCH-1392) -force and -resume arguments being ignored in ParserJob
Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ferdy Galema updated NUTCH-1392:
--------------------------------
Attachment: NUTCH-1392.patch
[jira] [Commented] (NUTCH-1392) -force and -resume arguments being ignored in ParserJob
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295767#comment-13295767 ]
Hudson commented on NUTCH-1392:
-------------------------------
Integrated in Nutch-nutchgora #281 (See [https://builds.apache.org/job/Nutch-nutchgora/281/])
-force and -resume arguments being ignored in ParserJob NUTCH-1392 (Revision 1350213)
Result = SUCCESS
lewismc :
Files :
* /nutch/branches/nutchgora/CHANGES.txt
* /nutch/branches/nutchgora/src/java/org/apache/nutch/fetcher/FetcherJob.java
* /nutch/branches/nutchgora/src/java/org/apache/nutch/parse/ParserJob.java
[jira] [Commented] (NUTCH-1392) -force and -resume arguments being ignored in ParserJob
Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294994#comment-13294994 ]
Lewis John McGibbney commented on NUTCH-1392:
---------------------------------------------
Thanks, Ferdy, for the lightning turnaround on this one.