Posted to dev@nutch.apache.org by "Gal Nitzan (JIRA)" <ji...@apache.org> on 2006/01/24 23:05:10 UTC

[jira] Created: (NUTCH-186) mapred-default.xml is over ridden by nutch-site.xml

mapred-default.xml is over ridden by nutch-site.xml
---------------------------------------------------

         Key: NUTCH-186
         URL: http://issues.apache.org/jira/browse/NUTCH-186
     Project: Nutch
        Type: Bug
    Versions: 0.8-dev    
 Environment: All
    Reporter: Gal Nitzan
    Priority: Minor


If mapred.map.tasks and mapred.reduce.tasks are defined in both nutch-site.xml and mapred-default.xml, the definitions from nutch-site.xml are the ones that take effect.

So if a user mistakenly copies those entries from nutch-default.xml into nutch-site.xml, she will not understand what is happening.

I would like to propose removing these settings completely from nutch-default.xml and putting them only in mapred-default.xml, where they belong.

I will be happy to supply a patch if the proposal is accepted.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Updated: (NUTCH-186) mapred-default.xml is over ridden by nutch-site.xml

Posted by "Gal Nitzan (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/NUTCH-186?page=all ]

Gal Nitzan updated NUTCH-186:
-----------------------------

    Attachment: myBeautifulPatch.patch

OK. Fixed and tested the patch.





[jira] Commented: (NUTCH-186) mapred-default.xml is over ridden by nutch-site.xml

Posted by "Gal Nitzan (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-186?page=comments#action_12363901 ] 

Gal Nitzan commented on NUTCH-186:
----------------------------------

I do agree with you, Andrzej. Following the same convention is almost a must.

I will start working on a patch.




[jira] Commented: (NUTCH-186) mapred-default.xml is over ridden by nutch-site.xml

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-186?page=comments#action_12363998 ] 

Doug Cutting commented on NUTCH-186:
------------------------------------

The config rules at present are:

1. All user-settable values should be in nutch-default.xml, as documentation that they exist.  Any other config will override this.  This file should not be altered by users.

2. nutch-site.xml is always loaded last, overriding all other options.  This is empty by default.

mapred-default.xml was added specifically to permit the specification of things that a job can override.

I think the fix that's needed here is documentation.  The documentation for these parameters should perhaps caution against putting them in nutch-site.xml, and point folks towards mapred-default.xml.
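The load order described by the two rules above can be sketched as a small model. This is only an illustration, not the real NutchConf: the class name LayeredConfigSketch, its methods, and the numeric values are all made up for the example, and it assumes mapred-default.xml sits between nutch-default.xml and nutch-site.xml.

```java
import java.util.List;
import java.util.Map;

// Hypothetical model of the load order described above; not the real NutchConf.
class LayeredConfigSketch {

    // Scan resources in load order; a later resource overrides an earlier one.
    static String resolve(List<Map<String, String>> resourcesInLoadOrder, String key) {
        String value = null;
        for (Map<String, String> resource : resourcesInLoadOrder) {
            if (resource.containsKey(key)) {
                value = resource.get(key);
            }
        }
        return value;
    }

    // siteValue == null means nutch-site.xml does not define the property.
    // The numbers 2 and 10 are illustrative, not shipped defaults.
    static String effectiveMapTasks(String siteValue) {
        Map<String, String> nutchDefault = Map.of("mapred.map.tasks", "2");
        Map<String, String> mapredDefault = Map.of("mapred.map.tasks", "10");
        Map<String, String> nutchSite = (siteValue == null)
                ? Map.<String, String>of()
                : Map.of("mapred.map.tasks", siteValue);
        // Assumed order: nutch-default.xml, mapred-default.xml, nutch-site.xml (always last).
        return resolve(List.of(nutchDefault, mapredDefault, nutchSite), "mapred.map.tasks");
    }

    public static void main(String[] args) {
        System.out.println(effectiveMapTasks(null)); // prints 10
        System.out.println(effectiveMapTasks("4"));  // prints 4
    }
}
```

In this model, copying the property into nutch-site.xml silently hides whatever mapred-default.xml says, which is the confusion the issue describes.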

We might eventually move to a more complex configuration, where we break things into modules, each with three parts: base, default, final.  So there could be a mapred-base.xml that listed all of the settable mapred parameters.  Then the overridable default value could be set in mapred-default.xml.  And non-overridable values (e.g., the jobtracker host) could be specified in mapred-final.
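One possible reading of the base/default/final idea, sketched under stated assumptions: a job may override values from the default layer, but nothing overrides the final layer. The class ModularConfigSketch, the precedence order, and every property value below are hypothetical; the comment above does not specify an implementation.

```java
import java.util.Map;

// Hypothetical sketch of a base/default/final lookup; not an existing Nutch API.
class ModularConfigSketch {

    // Assumed precedence, highest first: final, job override, default, base.
    static String resolve(String key,
                          Map<String, String> base,
                          Map<String, String> defaults,
                          Map<String, String> job,
                          Map<String, String> finals) {
        if (finals.containsKey(key)) return finals.get(key);     // cannot be overridden
        if (job.containsKey(key)) return job.get(key);           // per-job override
        if (defaults.containsKey(key)) return defaults.get(key); // overridable default
        return base.get(key);                                    // documented baseline
    }

    // Illustrative values only.
    static String demo(String key) {
        Map<String, String> base = Map.of("mapred.reduce.tasks", "1", "mapred.job.tracker", "local");
        Map<String, String> defaults = Map.of("mapred.reduce.tasks", "4");
        Map<String, String> job = Map.of("mapred.reduce.tasks", "8", "mapred.job.tracker", "elsewhere:9001");
        Map<String, String> finals = Map.of("mapred.job.tracker", "master:9001");
        return resolve(key, base, defaults, job, finals);
    }

    public static void main(String[] args) {
        System.out.println(demo("mapred.reduce.tasks")); // prints 8
        System.out.println(demo("mapred.job.tracker"));  // prints master:9001
    }
}
```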




[jira] Commented: (NUTCH-186) mapred-default.xml is over ridden by nutch-site.xml

Posted by "Gal Nitzan (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-186?page=comments#action_12363903 ] 

Gal Nitzan commented on NUTCH-186:
----------------------------------

OK, JobConf extends NutchConf, and the JobConf constructor adds the mapred-default.xml resource.

The call to add a resource in NutchConf actually inserts any resource file before nutch-site.xml, so there is no way to override it. See the code at the bottom.

The only thing required is to change line 85 in NutchConf to be:

    resourceNames.add(name); // add resource name

instead of

    resourceNames.add(resourceNames.size()-1, name); // add second to last

and add one more line to the JobConf constructor:

    addConfResource("mapred-site.xml");


This way nutch-site.xml overrides nutch-default.xml, but other added resources can override nutch-site.xml, which in my opinion is reasonable.

If acceptable, I will create the patch.


--------------------------------- current code in NutchConf.java -------------------------------------
  public synchronized void addConfResource(File file) {
    addConfResourceInternal(file);
  }
  private synchronized void addConfResourceInternal(Object name) {
    resourceNames.add(resourceNames.size()-1, name); // add second to last
    properties = null;                            // trigger reload
  }
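To make the difference between the two add calls concrete, here is a small stand-alone sketch using a plain ArrayList. It assumes the resource list initially holds nutch-default.xml followed by nutch-site.xml, and that later entries override earlier ones; the class ResourceOrderSketch and its helper methods are hypothetical and only mimic the list manipulation quoted above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the resourceNames list; not the real NutchConf.
class ResourceOrderSketch {

    // Assumed starting state: defaults first, site file last.
    static List<String> initial() {
        List<String> names = new ArrayList<>();
        names.add("nutch-default.xml");
        names.add("nutch-site.xml");
        return names;
    }

    // Current behavior: every added resource lands just before the last entry.
    static String currentOrder() {
        List<String> names = initial();
        names.add(names.size() - 1, "mapred-default.xml"); // add second to last
        return String.join(" < ", names);
    }

    // Proposed behavior: added resources are appended, so they load after nutch-site.xml.
    static String proposedOrder() {
        List<String> names = initial();
        names.add("mapred-default.xml");
        names.add("mapred-site.xml");
        return String.join(" < ", names);
    }

    public static void main(String[] args) {
        // "a < b" reads as "b is loaded after a and overrides it".
        System.out.println(currentOrder());
        // prints nutch-default.xml < mapred-default.xml < nutch-site.xml
        System.out.println(proposedOrder());
        // prints nutch-default.xml < nutch-site.xml < mapred-default.xml < mapred-site.xml
    }
}
```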





[jira] Commented: (NUTCH-186) mapred-default.xml is over ridden by nutch-site.xml

Posted by "Gal Nitzan (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-186?page=comments#action_12363932 ] 

Gal Nitzan commented on NUTCH-186:
----------------------------------

Sorry, I was too eager... Allow me to investigate a little further.

Ignore the patch; I will submit a new one.




[jira] Commented: (NUTCH-186) mapred-default.xml is over ridden by nutch-site.xml

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-186?page=comments#action_12363890 ] 

Andrzej Bialecki  commented on NUTCH-186:
-----------------------------------------

I agree. A patch would be welcome.

I wonder whether it's a good idea to follow the pattern of nutch-default/nutch-site and use a pair of mapred-default/mapred-site.xml ... It would be more understandable for users.




[jira] Commented: (NUTCH-186) mapred-default.xml is over ridden by nutch-site.xml

Posted by "Gal Nitzan (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-186?page=comments#action_12364010 ] 

Gal Nitzan commented on NUTCH-186:
----------------------------------

After reading the code, I think I have figured it out... :)

The issue with mapred-default.xml is totally misleading.

Actually, the mapred.map.tasks and mapred.reduce.tasks properties do not have any effect when placed in mapred-default.xml (unless JobConf needs them, which I didn't check), because this file is loaded only when JobConf is constructed.
But the tasktracker looks for these properties in nutch-site.xml and not in mapred-default.xml.

If these properties do not exist in nutch-site.xml with the correct values for your system, the values will be picked up from nutch-default.xml.
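The timing problem described above can be modeled with a toy example. It assumes, as stated, that mapred-default.xml is only added when a JobConf is constructed, so code that reads through the plain base configuration never sees it. BaseConf, JobConfSketch, the FILES map, and the values 2 and 10 are hypothetical stand-ins, not the real Nutch classes or defaults.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of when mapred-default.xml becomes visible; not the real Nutch code.
class LoadTimingSketch {

    // Stand-in for file contents, keyed by resource name. Values are illustrative.
    static final Map<String, Map<String, String>> FILES = new HashMap<>();
    static {
        FILES.put("nutch-default.xml", Map.of("mapred.map.tasks", "2"));
        FILES.put("mapred-default.xml", Map.of("mapred.map.tasks", "10"));
        FILES.put("nutch-site.xml", Map.<String, String>of()); // empty by default
    }

    static class BaseConf {
        final List<String> resourceNames =
                new ArrayList<>(List.of("nutch-default.xml", "nutch-site.xml"));

        void addConfResource(String name) {
            resourceNames.add(resourceNames.size() - 1, name); // add second to last
        }

        String get(String key) {
            String value = null;
            for (String name : resourceNames) { // later resources override earlier ones
                String candidate = FILES.get(name).get(key);
                if (candidate != null) value = candidate;
            }
            return value;
        }
    }

    // Only this subclass pulls in mapred-default.xml, in its constructor.
    static class JobConfSketch extends BaseConf {
        JobConfSketch() {
            addConfResource("mapred-default.xml");
        }
    }

    static String baseView() { return new BaseConf().get("mapred.map.tasks"); }

    static String jobView() { return new JobConfSketch().get("mapred.map.tasks"); }

    public static void main(String[] args) {
        System.out.println(baseView()); // prints 2  (falls back to nutch-default.xml)
        System.out.println(jobView());  // prints 10 (mapred-default.xml is loaded)
    }
}
```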

Further, I am not sure that nutch-site.xml overriding everything is the correct behavior. Most users know that nutch-site.xml overrides nutch-default.xml, but I think we should leave them the option to override nutch-site.xml. It would also be a good start toward breaking the configuration into parts (ndfs and mapred are going to be separated from nutch)...

Gal




[jira] Updated: (NUTCH-186) mapred-default.xml is over ridden by nutch-site.xml

Posted by "Gal Nitzan (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/NUTCH-186?page=all ]

Gal Nitzan updated NUTCH-186:
-----------------------------

    Attachment: myBeautifulPatch.patch

The patch is attached.

