Posted to dev@nutch.apache.org by "Gal Nitzan (JIRA)" <ji...@apache.org> on 2006/01/24 23:05:10 UTC
[jira] Created: (NUTCH-186) mapred-default.xml is over ridden by
nutch-site.xml
mapred-default.xml is over ridden by nutch-site.xml
---------------------------------------------------
Key: NUTCH-186
URL: http://issues.apache.org/jira/browse/NUTCH-186
Project: Nutch
Type: Bug
Versions: 0.8-dev
Environment: All
Reporter: Gal Nitzan
Priority: Minor
If mapred.map.tasks and mapred.reduce.tasks are defined in nutch-site.xml and also in mapred-default.xml, the definitions from nutch-site.xml are the ones that take effect.
So if a user mistakenly copies those entries from nutch-default.xml into nutch-site.xml, she will not understand what happens.
I would like to propose removing these settings completely from nutch-default.xml and putting them only in mapred-default.xml, where they belong.
I will be happy to supply a patch for that if the proposal is accepted.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira
[jira] Updated: (NUTCH-186) mapred-default.xml is over ridden by
nutch-site.xml
Posted by "Gal Nitzan (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/NUTCH-186?page=all ]
Gal Nitzan updated NUTCH-186:
-----------------------------
Attachment: myBeautifulPatch.patch
OK. Fixed and tested patch.
[jira] Commented: (NUTCH-186) mapred-default.xml is over ridden by
nutch-site.xml
Posted by "Gal Nitzan (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/NUTCH-186?page=comments#action_12363901 ]
Gal Nitzan commented on NUTCH-186:
----------------------------------
I do agree with you, Andrzej. Following the same convention is almost a must.
I will start working on a patch.
[jira] Commented: (NUTCH-186) mapred-default.xml is over ridden by
nutch-site.xml
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/NUTCH-186?page=comments#action_12363998 ]
Doug Cutting commented on NUTCH-186:
------------------------------------
The config rules at present are:
1. All user-settable values should be in nutch-default.xml, as documentation that they exist. Any other config will override this. This file should not be altered by users.
2. nutch-site.xml is always loaded last, overriding all other options. This is empty by default.
mapred-default.xml was added specifically to permit the specification of things that a job can override.
I think the fix that's needed here is documentation. The documentation for these parameters should perhaps caution against putting them in nutch-site.xml, and point folks towards mapred-default.xml.
We might eventually move to a more complex configuration, where we break things into modules, each with three parts: base, default, final. So there could be a mapred-base.xml that listed all of the settable mapred parameters. Then the overridable default value could be set in mapred-default.xml. And non-overridable values (e.g., the jobtracker host) could be specified in mapred-final.
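The base/default/final idea can be pictured with a small lookup sketch. This is only one possible reading of the proposal, not code from Nutch: the class name, the `resolve` helper, the use of plain maps in place of XML files, all values, and the `mapred.job.tracker` key standing in for "the jobtracker host" are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a three-part (base, default, final) configuration.
// Plain maps stand in for mapred-base.xml, mapred-default.xml and mapred-final.
public class LayeredConfigDemo {

    // Assumed precedence: a final value always wins, then a job's own
    // override, then the overridable default, then the documented base value.
    static String resolve(String key,
                          Map<String, String> base,
                          Map<String, String> defaults,
                          Map<String, String> job,
                          Map<String, String> finals) {
        if (finals.containsKey(key)) return finals.get(key);
        if (job.containsKey(key)) return job.get(key);
        if (defaults.containsKey(key)) return defaults.get(key);
        return base.get(key);
    }

    public static void main(String[] args) {
        Map<String, String> base = new HashMap<>();
        base.put("mapred.map.tasks", "2");
        base.put("mapred.reduce.tasks", "1");
        base.put("mapred.job.tracker", "local");       // assumed key name

        Map<String, String> defaults = new HashMap<>();
        defaults.put("mapred.map.tasks", "20");        // site-wide default

        Map<String, String> job = new HashMap<>();
        job.put("mapred.map.tasks", "50");             // job overrides default
        job.put("mapred.job.tracker", "other:9001");   // ignored: value is final

        Map<String, String> finals = new HashMap<>();
        finals.put("mapred.job.tracker", "master:9001");

        for (String key : new String[] {
                "mapred.map.tasks", "mapred.reduce.tasks", "mapred.job.tracker"}) {
            System.out.println(key + " = " + resolve(key, base, defaults, job, finals));
        }
    }
}
```

Under these assumptions a job can change mapred.map.tasks, while its attempt to change the jobtracker host has no effect because that key sits in the final layer.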
[jira] Commented: (NUTCH-186) mapred-default.xml is over ridden by
nutch-site.xml
Posted by "Gal Nitzan (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/NUTCH-186?page=comments#action_12363903 ]
Gal Nitzan commented on NUTCH-186:
----------------------------------
OK, JobConf extends NutchConf, and in its constructor JobConf adds the mapred-default.xml resource.
The call to add a resource in NutchConf actually inserts any resource file before nutch-site.xml, so there is no way to override it. Look at the code at the bottom.
The only thing required is to change line 85 in NutchConf to be:
    resourceNames.add(name); // add resource name
instead of
    resourceNames.add(resourceNames.size()-1, name); // add second to last
and to add one more line to the JobConf constructor:
    addConfResource("mapred-site.xml");
This way nutch-site.xml overrides nutch-default.xml, but other added resources can override nutch-site.xml, which in my opinion is reasonable.
If acceptable, I will create the patch.
--------------------------------- current code in NutchConf.java -------------------------------------
public synchronized void addConfResource(File file) {
  addConfResourceInternal(file);
}
private synchronized void addConfResourceInternal(Object name) {
  resourceNames.add(resourceNames.size()-1, name); // add second to last
  properties = null; // trigger reload
}
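To make the difference between the two `add` calls concrete, here is a minimal standalone sketch. It is not the NutchConf class itself: the class and method names are hypothetical, and the starting list of nutch-default.xml followed by nutch-site.xml is an assumption based on the "second to last" comment and the rule that nutch-site.xml is loaded last.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo of resource ordering; resources later in the list are
// assumed to be loaded later and therefore to override earlier ones.
public class ResourceOrderDemo {

    // Behavior of the quoted code: insert before the last entry, so the
    // last entry (nutch-site.xml here) stays last and keeps overriding.
    static List<String> addSecondToLast(List<String> names, String name) {
        List<String> result = new ArrayList<>(names);
        result.add(result.size() - 1, name);
        return result;
    }

    // Behavior of the proposed change: append, so the new resource is
    // loaded after nutch-site.xml and can override it.
    static List<String> addLast(List<String> names, String name) {
        List<String> result = new ArrayList<>(names);
        result.add(name);
        return result;
    }

    public static void main(String[] args) {
        List<String> initial = Arrays.asList("nutch-default.xml", "nutch-site.xml");
        // prints [nutch-default.xml, mapred-default.xml, nutch-site.xml]
        System.out.println(addSecondToLast(initial, "mapred-default.xml"));
        // prints [nutch-default.xml, nutch-site.xml, mapred-default.xml]
        System.out.println(addLast(initial, "mapred-default.xml"));
    }
}
```

With the second-to-last insert, anything set in nutch-site.xml masks the same key in mapred-default.xml, which matches the behavior reported in this issue.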
[jira] Commented: (NUTCH-186) mapred-default.xml is over ridden by
nutch-site.xml
Posted by "Gal Nitzan (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/NUTCH-186?page=comments#action_12363932 ]
Gal Nitzan commented on NUTCH-186:
----------------------------------
Sorry, I was too eager... Allow me to investigate a little further.
Ignore the patch; I will submit a new one.
[jira] Commented: (NUTCH-186) mapred-default.xml is over ridden by
nutch-site.xml
Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/NUTCH-186?page=comments#action_12363890 ]
Andrzej Bialecki commented on NUTCH-186:
-----------------------------------------
I agree. A patch would be welcome.
I wonder whether it's a good idea to follow the pattern of nutch-default/nutch-site and use a pair of mapred-default/mapred-site.xml ... It would be more understandable for users.
[jira] Commented: (NUTCH-186) mapred-default.xml is over ridden by
nutch-site.xml
Posted by "Gal Nitzan (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/NUTCH-186?page=comments#action_12364010 ]
Gal Nitzan commented on NUTCH-186:
----------------------------------
After reading the code, I think I have figured it out... :)
The issue with mapred-default.xml is totally misleading.
Actually, the mapred.map.tasks and mapred.reduce.tasks properties do not have any effect when placed in mapred-default.xml (unless JobConf needs them, which I didn't check), because this file is loaded only when JobConf is constructed.
But the tasktracker looks for these properties in nutch-site and not in mapred-default.
If these properties do not exist in nutch-site.xml with the correct values for your system, the values will be picked up from nutch-default.xml.
Further, I am not sure that nutch-site.xml "overriding" everything is the correct behavior. Most users know that nutch-site.xml overrides nutch-default.xml, but I think we should leave them the option to override nutch-site.xml. It would also be a good start toward breaking the configuration into parts (ndfs and mapred are going to be separated from nutch)...
Gal
[jira] Updated: (NUTCH-186) mapred-default.xml is over ridden by
nutch-site.xml
Posted by "Gal Nitzan (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/NUTCH-186?page=all ]
Gal Nitzan updated NUTCH-186:
-----------------------------
Attachment: myBeautifulPatch.patch
The patch is attached.