Posted to dev@nutch.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/05/09 16:11:49 UTC

[jira] [Created] (NUTCH-1363) Make parsing in FetcherJob actually work.

Lewis John McGibbney created NUTCH-1363:
-------------------------------------------

             Summary: Make parsing in FetcherJob actually work.
                 Key: NUTCH-1363
                 URL: https://issues.apache.org/jira/browse/NUTCH-1363
             Project: Nutch
          Issue Type: Bug
          Components: fetcher
    Affects Versions: nutchgora
            Reporter: Lewis John McGibbney
             Fix For: nutchgora


We know that parsing during fetching is not recommended; however, for those who wish to dive into the abyss, the functionality should be available. This issue will address this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (NUTCH-1363) Make parsing in FetcherJob actually work.

Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271445#comment-13271445 ] 

Ferdy Galema commented on NUTCH-1363:
-------------------------------------

Hey Lewis,

This does work, with the -Dfetcher.parse=true option. Note that the -parse is not supported anymore.
                

        

[jira] [Commented] (NUTCH-1363) Make parsing in FetcherJob actually work.

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272732#comment-13272732 ] 

Markus Jelsma commented on NUTCH-1363:
--------------------------------------

Good work anyway :) I had the same confusing thoughts with the same issue on trunk.
                

        

[jira] [Commented] (NUTCH-1363) Make parsing in FetcherJob actually work.

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271510#comment-13271510 ] 

Lewis John McGibbney commented on NUTCH-1363:
---------------------------------------------

Mmmm. OK. Maybe we should add a little comment to nutch-default.xml to say that this can only be overridden at the CLI with the option you specify, or else within nutch-site.xml followed by rebuilding the job jar (if in distributed mode).

wdyt?
                

        

[jira] [Commented] (NUTCH-1363) Make parsing in FetcherJob actually work.

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272633#comment-13272633 ] 

Markus Jelsma commented on NUTCH-1363:
--------------------------------------

I'm fine with not having a -parse switch for FetcherJob. Not having it is not a big deal, and advanced users who dare to parse in the fetcher can find the -Dfetcher.parse=true option just as well.
                

        

[jira] [Resolved] (NUTCH-1363) Make parsing in FetcherJob actually work.

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney resolved NUTCH-1363.
-----------------------------------------

    Resolution: Not A Problem

Yeah, you guys win :0)
Closing as this is not an issue. I was mixed up and now I'm not.
Thanks 
                

        

[jira] [Commented] (NUTCH-1363) Make parsing in FetcherJob actually work.

Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272150#comment-13272150 ] 

Ferdy Galema commented on NUTCH-1363:
-------------------------------------

I'm not sure I follow. What makes this property different from all the other properties?

In general, properties defined in nutch-default can be overridden using nutch-site (in both distributed and local mode) and finally using generic Hadoop -Dkey=value command-line options. Additionally, tools are able to provide specific arguments. For example, -threads 10 with the fetcher sets fetcher.threads.fetch to 10 in the configuration.
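
To make the override order concrete, here is a minimal Python sketch of the layering described above. It is only a model, not Nutch or Hadoop code: the function names (parse_cli, tool_overrides, effective_config) are hypothetical, the property values are illustrative, and the assumption that a tool-specific argument such as -threads is applied after the generic -D options is mine, not something stated in this thread.

```python
from collections import ChainMap


def parse_cli(args):
    """Split arguments into generic -Dkey=value overrides and the rest."""
    overrides, rest = {}, []
    for arg in args:
        if arg.startswith("-D") and "=" in arg:
            key, value = arg[2:].split("=", 1)
            overrides[key] = value
        else:
            rest.append(arg)
    return overrides, rest


def tool_overrides(rest):
    """Map the fetcher's '-threads N' argument onto fetcher.threads.fetch."""
    out = {}
    if "-threads" in rest:
        i = rest.index("-threads")
        if i + 1 < len(rest):
            out["fetcher.threads.fetch"] = rest[i + 1]
    return out


def effective_config(defaults, site, args):
    """Layer the sources; ChainMap consults the leftmost mapping first."""
    cli, rest = parse_cli(args)
    # Assumed order: tool argument > -D option > nutch-site > nutch-default.
    return ChainMap(tool_overrides(rest), cli, site, defaults)


if __name__ == "__main__":
    # Illustrative stand-ins for nutch-default.xml and nutch-site.xml.
    defaults = {"fetcher.parse": "false", "fetcher.threads.fetch": "10"}
    site = {}
    conf = effective_config(defaults, site, ["-Dfetcher.parse=true", "-threads", "20"])
    print(conf["fetcher.parse"], conf["fetcher.threads.fetch"])
```

In this model fetcher.parse behaves like any other property: it can be set in the site file or on the command line, and the most specific source is the one that takes effect.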
                

        

[jira] [Commented] (NUTCH-1363) Make parsing in FetcherJob actually work.

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272622#comment-13272622 ] 

Lewis John McGibbney commented on NUTCH-1363:
---------------------------------------------

So just to summarize here... we are happy that we don't have a -parse switch in the fetcher? If this was a consideration and subsequent decision made a while back then I am happy with this. I just needed to get a clear picture of what the consensus is. If, as you mention, this issue simply raises something that was removed from Nutchgora (to be consistent with trunk), then I will close the issue accordingly. Thanks
                

        

[jira] [Issue Comment Edited] (NUTCH-1363) Make parsing in FetcherJob actually work.

Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271445#comment-13271445 ] 

Ferdy Galema edited comment on NUTCH-1363 at 5/9/12 2:27 PM:
-------------------------------------------------------------

Hey Lewis,

This does work, with the -Dfetcher.parse=true option. Note that the -parse option is not supported anymore. (But it did the same thing).
                
      was (Author: ferdy.g):
    Hey Lewis,

This does work, with the -Dfetcher.parse=true option. Note that the -parse is not supported anymore.
                  

        

[jira] [Closed] (NUTCH-1363) Make parsing in FetcherJob actually work.

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney closed NUTCH-1363.
---------------------------------------

    