Posted to dev@nutch.apache.org by Stefan Groschupf <sg...@media-style.com> on 2005/12/19 00:19:01 UTC

[bug] overwriting job properties until runtime is not possible

Hi,
while writing the tests that make the generation bug reproducible,
I discovered another strange behavior.
The following test fails:

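	public void testConf() throws Exception {
		NutchConf conf = NutchConf.get();
		// explicitly set value that should survive JobConf construction
		conf.setInt("mapred.reduce.tasks", 2);
		JobConf jobConf = new JobConf(conf);
		// 25 is only the fallback default passed to getInt
		assertEquals(2, jobConf.getInt("mapred.reduce.tasks", 25));
	}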

What happens is that JobConf calls addConfResource("mapred-default.xml").
Sure, that makes sense, but the way the new resource is loaded is
really strange, from my point of view.
Instead of reading the configuration file and adding or overwriting the
properties, the properties object is set to null and the file is added
to a list of files that need to be loaded.
On the next get call, all configuration files are reloaded.
That means any time a new configuration resource is added, all
configuration files are reloaded and, more importantly, explicitly set
values are deleted or overwritten by default values as soon as a
configuration resource is added.
This happens, for example, in JobConf.
So on the one hand all classes that implement NutchConfigurable should
be configurable, but in reality, since all of these classes create
their own new JobConf, they are not configurable at all.
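
The reload behavior described above can be reproduced in isolation. The following is only a minimal sketch with a hypothetical stand-in class (ReloadingConf is not the actual NutchConf code, and the file contents and the value "1" are made up for illustration):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical stand-in for the reload-on-add behavior; not the real NutchConf.
class ReloadingConf {
    // Simulated config file contents, keyed by resource name (assumption).
    private final Map<String, Properties> files = new HashMap<>();
    private final List<String> resourceNames = new ArrayList<>();
    private Properties properties; // null means "reload on next access"

    void defineFile(String name, String key, String value) {
        files.computeIfAbsent(name, n -> new Properties()).setProperty(key, value);
    }

    void addConfResource(String name) {
        resourceNames.add(name);
        properties = null; // trigger reload: explicitly set values are dropped
    }

    private Properties getProps() {
        if (properties == null) {
            properties = new Properties();
            for (String name : resourceNames) {
                Properties file = files.get(name);
                if (file != null) {
                    properties.putAll(file);
                }
            }
        }
        return properties;
    }

    void set(String key, String value) { getProps().setProperty(key, value); }

    String get(String key) { return getProps().getProperty(key); }

    public static void main(String[] args) {
        ReloadingConf conf = new ReloadingConf();
        conf.defineFile("mapred-default.xml", "mapred.reduce.tasks", "1");
        conf.set("mapred.reduce.tasks", "2");
        conf.addConfResource("mapred-default.xml"); // discards the set() above
        System.out.println(conf.get("mapred.reduce.tasks"));
    }
}
```

In this sketch the value passed to set() is lost as soon as a resource is added, because the next get rebuilds the properties from the files only.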

My suggestion is that we change NutchConf in the following way:

Change private synchronized void addConfResourceInternal(Object name)
from

resourceNames.add(resourceNames.size()-1, name); // add second to last
properties = null;                            // trigger reload

to:

resourceNames.add(resourceNames.size()-1, name); // add second to last
loadResource(properties, name, false);


Any comments?
Should I contribute a patch for this one line edit?

Stefan

P.S. BTW, this bug makes unit testing distributed map reduce
impossible.


Re: [bug] overwriting job properties until runtime is not possible

Posted by Stefan Groschupf <sg...@media-style.com>.
Hi Paul,

wouldn't it be a better and maybe easier solution to have an ArrayList
for all values of a key and just add the values to that list?
Then we can have a getProperty method that returns the first value in
the list and a getProperties method that returns an array. This could
be very similar to the ContentProperties mechanism.
I think using an ArrayList may be easier than using Properties that
are nested inside other Properties.
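
A rough sketch of what I mean, assuming a hypothetical class (MultiValueConf and its add method are illustrative names, not existing Nutch API) and assuming that the value added first has the highest priority:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the multi-value idea; names are illustrative only.
class MultiValueConf {
    private final Map<String, List<String>> values = new HashMap<>();

    // Appends a value; earlier additions stay in front (assumed priority order).
    void add(String key, String value) {
        values.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Returns the first value for the key, or null if there is none.
    String getProperty(String key) {
        List<String> list = values.get(key);
        return (list == null || list.isEmpty()) ? null : list.get(0);
    }

    // Returns all values for the key in insertion order (empty array if none).
    String[] getProperties(String key) {
        List<String> list = values.get(key);
        return list == null ? new String[0] : list.toArray(new String[0]);
    }

    public static void main(String[] args) {
        MultiValueConf conf = new MultiValueConf();
        conf.add("mapred.reduce.tasks", "2");  // e.g. explicitly set
        conf.add("mapred.reduce.tasks", "1");  // e.g. from a default file
        System.out.println(conf.getProperty("mapred.reduce.tasks"));
        System.out.println(conf.getProperties("mapred.reduce.tasks").length);
    }
}
```

How the order of the list maps to the priority of the config files is left open here and would need to be decided.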

Stefan



On 21.12.2005 at 01:36, Paul Baclace wrote:

> Stefan Groschupf wrote:
> > My suggestion is that we change NutchConf in the following way:
> >
> > resourceNames.add(resourceNames.size()-1, name); // add second to last
> > loadResource(properties, name, false);
>
> This would make property settings in the new resource (name, in the above)
> override explicitly set() properties and the final site settings
> could be overridden by a lower priority config file.
>
> In TestNDFS, I relied on explicitly set() properties to make the
> test *independent* of the conf files (even to the point of overriding
> nutch-site.xml).
>
> I recommend filing a bug and I will then submit a patch.
>
> Details:
>
> The problem here is that NutchConf puts all the attribute-value pairs
> in a single Properties instance.  It should use the existing mechanism
> of Properties which can chain together as an implicit linked list of defaults.
>
> In the most straightforward approach, each resource should get an
> instance of Properties and a separate, highest priority, Properties
> instance should be used to hold explicitly set values.
>
> In Java:
>
>     Properties props, deep, deeper, deepest;
>     props = new Properties(deep = new Properties(deeper = new Properties(deepest = new Properties())));
>
> and
>
>     NutchConf.set(a,v) would simply use props.setProperty(a,v)
>
>     loadResource(deep, "nutch-site.xml", false);
>     loadResource(deeper, name, false);
>     loadResource(deepest, "nutch-default.xml", false);
>
> The actual implementation would need to deal with a variable number of
> Properties, not just 3.
>
>
> Paul
>

---------------------------------------------------------------
company:        http://www.media-style.com
forum:        http://www.text-mining.org
blog:            http://www.find23.net



Re: [bug] overwriting job properties until runtime is not possible

Posted by Paul Baclace <pe...@baclace.net>.
Stefan Groschupf wrote:
 > My suggestion is that we change NutchConf in the following way:
 >
 > resourceNames.add(resourceNames.size()-1, name); // add second to last
 > loadResource(properties, name, false);

This would make property settings in the new resource (name, in the above)
override explicitly set() properties and the final site settings
could be overridden by a lower priority config file.
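
To illustrate the concern, here is a minimal sketch using plain java.util.Properties. The class and method names are hypothetical, and .properties text stands in for the XML config files, which is an assumption made only to keep the example short:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo: loading a resource straight into the single
// Properties instance overwrites whatever was explicitly set before.
class DirectLoadDemo {
    static String valueAfterLoad(String key, String explicitValue, String resourceText) {
        Properties properties = new Properties();
        properties.setProperty(key, explicitValue);          // explicit set()
        try {
            properties.load(new StringReader(resourceText)); // later resource load
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties.getProperty(key);
    }

    public static void main(String[] args) {
        System.out.println(
            valueAfterLoad("mapred.reduce.tasks", "2", "mapred.reduce.tasks=25\n"));
    }
}
```

A single Properties instance has no notion of priority: whichever write happens last wins.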

In TestNDFS, I relied on explicitly set() properties to make the
test *independent* of the conf files (even to the point of overriding
nutch-site.xml).

I recommend filing a bug and I will then submit a patch.

Details:

The problem here is that NutchConf puts all the attribute-value pairs
in a single Properties instance.  It should use the existing mechanism
of Properties which can chain together as an implicit linked list of defaults.

In the most straightforward approach, each resource should get an
instance of Properties and a separate, highest priority, Properties
instance should be used to hold explicitly set values.

In Java:

     Properties props, deep, deeper, deepest;
     props = new Properties(deep = new Properties(deeper = new Properties(deepest = new Properties())));

and

     NutchConf.set(a,v) would simply use props.setProperty(a,v)

     loadResource(deep, "nutch-site.xml", false);
     loadResource(deeper, name, false);
     loadResource(deepest, "nutch-default.xml", false);

The actual implementation would need to deal with a variable number of
Properties, not just 3.
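
A possible sketch for the variable-length case, not the actual patch: the class name ChainedConf and the helper chain() are hypothetical, and the layers are passed in as already-loaded Properties rather than going through loadResource.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical helper: builds a chain of Properties defaults from a list
// ordered lowest priority first, and returns an empty top-level instance
// intended to hold explicitly set values.
class ChainedConf {
    static Properties chain(List<Properties> lowestToHighest) {
        Properties defaults = null;
        for (Properties layer : lowestToHighest) {
            // Each layer falls back to the lower-priority layers beneath it.
            Properties copy = new Properties(defaults);
            for (String key : layer.stringPropertyNames()) {
                copy.setProperty(key, layer.getProperty(key));
            }
            defaults = copy;
        }
        return new Properties(defaults); // highest priority: explicit set()
    }

    public static void main(String[] args) {
        Properties nutchDefault = new Properties();
        nutchDefault.setProperty("a", "default");
        nutchDefault.setProperty("b", "default");
        Properties nutchSite = new Properties();
        nutchSite.setProperty("a", "site");

        Properties props = chain(Arrays.asList(nutchDefault, nutchSite));
        props.setProperty("b", "explicit");
        System.out.println(props.getProperty("a"));
        System.out.println(props.getProperty("b"));
    }
}
```

Lookups must go through getProperty(), since the inherited Hashtable get() does not consult the defaults chain.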


Paul