Posted to dev@nutch.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/05/09 16:11:49 UTC
[jira] [Created] (NUTCH-1363) Make parsing in FetcherJob actually work.
Lewis John McGibbney created NUTCH-1363:
-------------------------------------------
Summary: Make parsing in FetcherJob actually work.
Key: NUTCH-1363
URL: https://issues.apache.org/jira/browse/NUTCH-1363
Project: Nutch
Issue Type: Bug
Components: fetcher
Affects Versions: nutchgora
Reporter: Lewis John McGibbney
Fix For: nutchgora
We know that parsing during fetching is not recommended; however, for those who wish to dive into the abyss, the functionality should be available. This issue will address this.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (NUTCH-1363) Make parsing in FetcherJob actually work.
Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271445#comment-13271445 ]
Ferdy Galema commented on NUTCH-1363:
-------------------------------------
Hey Lewis,
This does work, with the -Dfetcher.parse=true option. Note that the -parse option is no longer supported.
[jira] [Commented] (NUTCH-1363) Make parsing in FetcherJob actually work.
Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272732#comment-13272732 ]
Markus Jelsma commented on NUTCH-1363:
--------------------------------------
Good work anyway :) I was confused by the same issue on trunk.
[jira] [Commented] (NUTCH-1363) Make parsing in FetcherJob actually work.
Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271510#comment-13271510 ]
Lewis John McGibbney commented on NUTCH-1363:
---------------------------------------------
Mmmm. OK. Maybe we should add a short comment to nutch-default.xml to say that this can only be overridden on the command line with the option you specify, or else in nutch-site.xml followed by rebuilding the job jar (if in distributed mode).
wdyt?
[jira] [Commented] (NUTCH-1363) Make parsing in FetcherJob actually work.
Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272633#comment-13272633 ]
Markus Jelsma commented on NUTCH-1363:
--------------------------------------
I'm fine with not having a -parse switch for FetcherJob. Not having it is not a big deal, and advanced users who dare to parse in the fetcher can find the -Dfetcher.parse=true option just as well.
[jira] [Resolved] (NUTCH-1363) Make parsing in FetcherJob actually work.
Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney resolved NUTCH-1363.
-----------------------------------------
Resolution: Not A Problem
Yeah, you guys win :0)
Closing as this is not an issue. I was mixed up and now I'm not.
Thanks
[jira] [Commented] (NUTCH-1363) Make parsing in FetcherJob actually work.
Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272150#comment-13272150 ]
Ferdy Galema commented on NUTCH-1363:
-------------------------------------
I'm not sure I follow. What makes this property different from all the other properties?
In general, properties defined in nutch-default can be overridden using nutch-site (in both distributed and local mode) and finally using generic Hadoop -Dkey=value command-line options. Additionally, tools are able to provide specific arguments. For example, -threads 10 with the fetcher sets fetcher.threads.fetch to 10 in the configuration.
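To make the override order concrete, here is a minimal sketch in Python. It is not Nutch or Hadoop code: the dictionaries stand in for nutch-default.xml and nutch-site.xml, their values are illustrative only, and the function names are hypothetical. It also assumes that a tool-specific argument such as -threads takes precedence over a -D option for the same key, which the comment above does not state.

```python
from collections import ChainMap

# Illustrative stand-ins for nutch-default.xml and nutch-site.xml
# (the values are assumptions, not copied from a real installation).
NUTCH_DEFAULT = {"fetcher.parse": "false", "fetcher.threads.fetch": "10"}
NUTCH_SITE = {"fetcher.threads.fetch": "20"}


def parse_generic_options(argv):
    """Split argv into -Dkey=value overrides and the remaining tool arguments."""
    overrides, rest = {}, []
    for arg in argv:
        if arg.startswith("-D") and "=" in arg:
            key, value = arg[2:].split("=", 1)
            overrides[key] = value
        else:
            rest.append(arg)
    return overrides, rest


def parse_tool_args(args):
    """Map the fetcher's -threads N argument onto fetcher.threads.fetch."""
    tool = {}
    it = iter(args)
    for arg in it:
        if arg == "-threads":
            # The value is the argument that follows the switch.
            tool["fetcher.threads.fetch"] = next(it)
    return tool


def effective_config(argv):
    """Layer the sources so that the more specific source wins."""
    overrides, rest = parse_generic_options(argv)
    tool = parse_tool_args(rest)
    # ChainMap searches its maps left to right, so earlier maps take precedence.
    # Placing tool arguments ahead of -D options is an assumption of this sketch.
    return ChainMap(tool, overrides, NUTCH_SITE, NUTCH_DEFAULT)
```

With these stand-in values, effective_config(["-Dfetcher.parse=true"]) resolves fetcher.parse from the command line and fetcher.threads.fetch from the site file, without touching either file.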
[jira] [Commented] (NUTCH-1363) Make parsing in FetcherJob actually work.
Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272622#comment-13272622 ]
Lewis John McGibbney commented on NUTCH-1363:
---------------------------------------------
So just to summarize here... we are happy that we don't have a -parse switch in the fetcher? If this was a consideration and subsequent decision made a while back then I am happy with this. I just needed to get a clear picture of what the consensus is. If, as you mention, this issue simply raises something that was removed from Nutchgora (to be consistent with trunk), then I will close the issue accordingly. Thanks
[jira] [Issue Comment Edited] (NUTCH-1363) Make parsing in FetcherJob actually work.
Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271445#comment-13271445 ]
Ferdy Galema edited comment on NUTCH-1363 at 5/9/12 2:27 PM:
-------------------------------------------------------------
Hey Lewis,
This does work, with the -Dfetcher.parse=true option. Note that the -parse option is not supported anymore. (But it did the same thing).
was (Author: ferdy.g):
Hey Lewis,
This does work, with the -Dfetcher.parse=true option. Note that the -parse is not supported anymore.
[jira] [Closed] (NUTCH-1363) Make parsing in FetcherJob actually work.
Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney closed NUTCH-1363.
---------------------------------------