
Posted to dev@nutch.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 19:48:12 UTC

[jira] [Updated] (NUTCH-1452) hadoop.job.history.user.location in nutch-default making job history useless

     [ https://issues.apache.org/jira/browse/NUTCH-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-1452:
----------------------------------------

    Fix Version/s: 2.2
                   1.7
    
> hadoop.job.history.user.location in nutch-default making job history useless
> ----------------------------------------------------------------------------
>
>                 Key: NUTCH-1452
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1452
>             Project: Nutch
>          Issue Type: Bug
>            Reporter: Ferdy Galema
>             Fix For: 1.7, 2.2
>
>
> nutch-default still contains the property 'hadoop.job.history.user.location', which redirects the creation of history files from job output locations to a custom location. I noticed that the current value does not work well with Cloudera (I have tested cdh3u4), because ${hadoop.log.dir} is not defined there. As a result, the job shows empty info in the jobtracker, with an 'incomplete' job status. This happens only once the job moves to 'retired'; while it is still in 'completed', everything looks fine.
> This property can be set to 'none', because the job history is ALSO stored in the central jobtracker location anyway. The 'hadoop.job.history.user.location' property only specifies an extra location. But if it is set to an invalid value, it seems to prevent the central history location from storing the history as well. For more details, please see:
> http://hadoop.apache.org/common/docs/r1.0.3/cluster_setup.html
> Besides setting it to 'none', another option is to set it to 'history', which does work with CDH. (This writes all logs to 'history' in the user directory on the configured filesystem, usually DFS.) The final option is to simply remove this property and not meddle with Hadoop properties at all. But that requires all jobs to correctly ignore these history files. I am not up to date on how well this currently works with Nutch jobs. This question is most relevant for trunk, since trunk relies heavily on the filesystem for jobs.
> What do you think?
> A) Set property to 'none'
> B) Set property to 'history'
> C) Remove property, see what happens, possibly fix jobs
> D) ?
> For now, I opt for A. But I think we need some more input on other distributions (for example official Hadoop 1.x) and also on Nutch trunk.
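
For reference, option A could look roughly like the fragment below. This is only a sketch: the property name and the value 'none' come from the description above, while placing the override in conf/nutch-site.xml (rather than editing nutch-default.xml directly) and the wording of the description element are assumptions, not something decided in this issue.

```xml
<!-- Sketch of option A. Assumed location: conf/nutch-site.xml, -->
<!-- which overrides the values shipped in nutch-default.xml. -->
<property>
  <name>hadoop.job.history.user.location</name>
  <!-- 'none' disables the extra per-user history location; -->
  <!-- the central jobtracker history is still written. -->
  <value>none</value>
  <description>Illustrative wording only: do not write job history
  to an extra user location.</description>
</property>
```

Option B would differ only in the value ('history' instead of 'none'); option C would drop the property entirely.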

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira