Posted to user@nutch.apache.org by Enzo Michelangeli <en...@gmail.com> on 2007/06/05 04:20:28 UTC

Loading mechanism of plugin classes and singleton objects

I have a question about the loading mechanism of plugin classes. I'm working 
with a custom URLFilter, and I need a singleton object loaded and 
initialized by the first instance of the URLFilter, and shared by other 
instances (e.g., instantiated by other threads). I was assuming that the 
URLFilter class was being loaded only once even when the filter is used by 
multiple threads, so I used a static member variable of my URLFilter class
to hold a reference to the object to be shared. It appears, however, that
the supposed singleton actually isn't one, because the method responsible
for its instantiation finds the static field still set to null. So: are
URLFilter classes loaded multiple times by their classloader in Nutch? The 
wiki page at 
http://wiki.apache.org/nutch/WhichTechnicalConceptsAreBehindTheNutchPluginSystem 
seems to suggest otherwise:

    Until Nutch runtime, only one instance of such a plugin
    class is alive in the Java virtual machine.

(By the way, what does "Until Nutch runtime" mean here? Before Nutch 
runtime, no class whatsoever is supposed to be alive in the JVM, is it?)
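
For comparison, the pattern described above usually looks like the minimal sketch below: a lazily initialized static field guarded by a class-level lock. `SharedResource` and `MyURLFilter` are hypothetical names, and the sketch does not implement the real Nutch URLFilter interface. Keep in mind that a static field belongs to a class as loaded by one particular class loader. If the plugin class is loaded a second time by another loader, that second copy starts with its own null field, which is one way to end up with the behaviour described here.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical object that should exist only once per JVM.
final class SharedResource {
    // Counts constructions so that sharing can be observed.
    static final AtomicInteger CREATED = new AtomicInteger();

    SharedResource() {
        CREATED.incrementAndGet();
    }
}

// Hypothetical stand-in for a URLFilter implementation (not the Nutch interface).
class MyURLFilter {
    // One copy of this field exists per loaded class: if a different class
    // loader loads MyURLFilter again, that copy starts out null.
    private static SharedResource shared;

    private final SharedResource resource;

    MyURLFilter() {
        this.resource = getShared();
    }

    // Creates the shared object on first use. The method is synchronized on
    // MyURLFilter.class, so concurrent first calls create it only once.
    private static synchronized SharedResource getShared() {
        if (shared == null) {
            shared = new SharedResource();
        }
        return shared;
    }

    SharedResource resource() {
        return resource;
    }
}
```

Within one class loader, every MyURLFilter built this way shares a single SharedResource.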

Enzo 


Re: Loading mechanism of plugin classes and singleton objects

Posted by Enzo Michelangeli <en...@gmail.com>.
----- Original Message ----- 
From: "Doğacan Güney" <do...@gmail.com>
Sent: Friday, June 08, 2007 11:25 PM

> On 6/8/07, Enzo Michelangeli <en...@gmail.com> wrote:
[...]
>> A more serious problem is that an implementation of equals() that returns
>> true if the two hashCodes are different violates the specifications of
>> Object.hashCode() :
>>
>> http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Object.html#hashCode()
>> "If two objects are equal according to the equals(Object) method, then
>> calling the hashCode method on each of the two objects must produce the
>> same integer result."
>
> We can just update Configuration.hashCode to calculate the hash by summing
> the hashCodes of all key/value pairs. This should make the hashCodes equal,
> shouldn't it?

Sure, in fact it would be much better. I didn't mention it because it 
affects Hadoop, about the innards of which I know very little, and I was 
concerned about unforeseen side-effects. In that spirit (think global, act 
local ;-) ), we could also subclass org.apache.hadoop.conf.Configuration 
only for use by methods of PluginRepository, and override its hashCode() 
method instead of touching the original class.
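
A variant of the "act local" approach avoids subclassing altogether. PluginRepository could key its cache on an immutable snapshot of the properties instead of on the Configuration object itself. The sketch below assumes that design. `ConfKey` is a hypothetical class. Extracting the properties from a real Configuration is left out, because the accessor depends on the Hadoop version.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache key: an immutable copy of a configuration's properties.
final class ConfKey {
    private final Map<String, String> snapshot;

    ConfKey(Map<String, String> props) {
        // Defensive copy: later changes to props cannot alter this key.
        this.snapshot = Collections.unmodifiableMap(new HashMap<>(props));
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ConfKey && snapshot.equals(((ConfKey) o).snapshot);
    }

    @Override
    public int hashCode() {
        // Content-based, per the java.util.Map contract, so equal keys hash equally.
        return snapshot.hashCode();
    }
}
```

Since the snapshot never changes, its hash stays stable while it sits in the cache, and Configuration itself is left untouched.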

> I think so too. When a map task ends and another begins, there will be
> no strong references to the configuration object of the previous map
> task, so it may be garbage-collected. Nicolas Lichtmaier has a patch
> for this to change WeakHashMap to a form of LRU map.
>
> BTW, this problem has been discussed before (most recently at
> http://www.nabble.com/Plugins-initialized-all-the-time ). There even
> is an open issue for this - NUTCH-356. I would suggest that we move
> our discussion there so that we can all work on this together and fix
> this once and for all. I will update the issue with the most recent
> discussions.

OK, I'll subscribe to nutch-dev as well.

Cheers --

Enzo


Re: Loading mechanism of plugin classes and singleton objects

Posted by Doğacan Güney <do...@gmail.com>.
On 6/8/07, Enzo Michelangeli <en...@gmail.com> wrote:
> ----- Original Message -----
> From: "Doğacan Güney" <do...@gmail.com>
> To: <nu...@lucene.apache.org>
> Sent: Friday, June 08, 2007 8:27 PM
> Subject: Re: Loading mechanism of plugin classes and singleton objects
>
> [...]
> >> This is strange, because, as you can see below, the strings that
> >> make up the keys and values of conf appear unchanged. Perhaps we
> >> should override the equals() method in
> >> org.apache.hadoop.conf.Configuration (invoked by CACHE.get(),
> >> according to the specs of the java.util.Map interface), so that the
> >> hashCode()s of the keys get ignored, and conf1.equals(conf2) returns
> >> true if and only if:
> >>
> >>  1. conf1.size() == conf2.size(),
> >>
> >>  2. for each key k1 of conf1 there is a key k2 in conf2 such that:
> >>   2.1 k1.equals(k2)
> >>   2.2 conf1.get(k1).equals(conf2.get(k2))
> >
> > This has been suggested before and I have to say I don't like this
> > one, because this means that each call to PluginRepository.get(conf)
> > will end up comparing all key value pairs, which, IMO, is
> > excessive (because, if I am not mistaken, we don't need this when
> > running Nutch in a distributed environment). Unfortunately, this may
> > be the only way to fix this leak.
>
> Well, after all, how often can this overhead ever happen? I logged just a few
> calls (admittedly, on short runs) but even thousands of occurrences would
> only consume a negligible amount of CPU time.

There are a couple of places where the overhead will cost us right
now. For example, current ParseSegment code around line 76:

      parseResult = new ParseUtil(getConf()).parse(content);

This is called for every record. So we will end up doing a lot of
comparisons. Obviously we can fix this, but I don't like the idea that
a get method causes overhead. Anyway, that is just my personal
preference, so if people are feeling very strongly about it,
"deep-comparison" is fine with me.

> A more serious problem is that an implementation of equals() that returns
> true if the two hashCodes are different violates the specifications of
> Object.hashCode() :
>
> http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Object.html#hashCode()
> "If two objects are equal according to the equals(Object) method, then
> calling the hashCode method on each of the two objects must produce the same
> integer result."

We can just update Configuration.hashCode to calculate the hash by summing
the hashCodes of all key/value pairs. This should make the hashCodes equal,
shouldn't it?
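
Summing per-entry hashes is how java.util.Map.hashCode() is specified: each entry contributes the XOR of its key's and value's hash codes, and the results are added. A hash computed this way is therefore consistent with an entry-by-entry equals(). The sketch below works over a plain property map. `contentHash` is a hypothetical helper, not a Hadoop method.

```java
import java.util.Map;
import java.util.Objects;

final class ConfHashing {
    private ConfHashing() {
    }

    // Order-independent hash: the sum of (key hash XOR value hash) over all
    // entries, matching the java.util.Map.hashCode() contract. Maps with the
    // same entries get the same value, whatever their identity.
    static int contentHash(Map<String, String> props) {
        int h = 0;
        for (Map.Entry<String, String> e : props.entrySet()) {
            h += Objects.hashCode(e.getKey()) ^ Objects.hashCode(e.getValue());
        }
        return h;
    }
}
```

The catch is that such a hash changes whenever a property is set. A configuration modified after being used as a cache key can no longer be found under its old hash.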

>
> > This is probably not a good idea, but here it goes: perhaps we can
> > change the LocalJobRunner.Job.run method. First, it clones the JobConf
> > object (clonedConf). It then runs a MapTask with the original
> > JobConf. Upon completion, it copies everything from clonedConf back to
> > original JobConf. This way, original JobConf's hashCode won't change,
> > so there should be no leak.
>
> What if we simply rework the PluginRepository.get() method, checking first
> whether a key "deeply" equal to conf (i.e., with the same key/value pairs
> but possibly a different hashCode) is in CACHE, and in that case, if the
> hashCode is different, we use that key as the parameter of the get()
> method in place of conf? (This within a synchronized block, of course.)
> Probably the cleanest way to do that is to encapsulate this behavior in a
> "hashCodeObliviousGet()" method of a subclass of Hashtable, and use that
> subclass for CACHE. So, we will be "specs-ly correct" because we won't
> override common methods (get() or equals()) with implementations that are
> not compliant with the specs.
>
> Also: do you agree with me that we should avoid caches based on weak
> references (like the current WeakHashMap) if we want to be protected against
> multiple instances of plugins? I can't see a compelling reason for a
> garbage-collected CACHE, considering that the number of items stored in it
> can't grow that large (how many different configurations can users ever
> create?)

I think so too. When a map task ends and another begins, there will be
no strong references to the configuration object of the previous map
task, so it may be garbage-collected. Nicolas Lichtmaier has a patch
for this to change WeakHashMap to a form of LRU map.
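
Nicolas Lichtmaier's patch is not reproduced here. A bounded LRU map of the kind mentioned can, however, be built on java.util.LinkedHashMap with access ordering and removeEldestEntry(). Unlike WeakHashMap, it holds its keys strongly, so an entry survives the end of a map task until it is evicted. `LruCache` is a hypothetical name. Like LinkedHashMap itself, it is not thread-safe, so callers must synchronize (PluginRepository.get() is already synchronized).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Bounded cache that evicts the least recently used entry once it holds
// more than maxEntries mappings. Keys are held strongly.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true);  // true = iterate in access order, not insertion order
        this.maxEntries = maxEntries;
    }

    // Called by put(); returning true drops the least recently accessed entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```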

BTW, this problem has been discussed before (most recently at
http://www.nabble.com/Plugins-initialized-all-the-time ). There even
is an open issue for this - NUTCH-356. I would suggest that we move
our discussion there so that we can all work on this together and fix
this once and for all. I will update the issue with the most recent
discussions.

>
> Enzo
>
>


-- 
Doğacan Güney

Re: Loading mechanism of plugin classes and singleton objects

Posted by Enzo Michelangeli <en...@gmail.com>.
----- Original Message ----- 
From: "Doğacan Güney" <do...@gmail.com>
To: <nu...@lucene.apache.org>
Sent: Friday, June 08, 2007 8:27 PM
Subject: Re: Loading mechanism of plugin classes and singleton objects

[...]
>> This is strange, because, as you can see below, the strings that
>> make up the keys and values of conf appear unchanged. Perhaps we
>> should override the equals() method in
>> org.apache.hadoop.conf.Configuration (invoked by CACHE.get(),
>> according to the specs of the java.util.Map interface), so that the
>> hashCode()s of the keys get ignored, and conf1.equals(conf2) returns
>> true if and only if:
>>
>>  1. conf1.size() == conf2.size(),
>>
>>  2. for each key k1 of conf1 there is a key k2 in conf2 such that:
>>   2.1 k1.equals(k2)
>>   2.2 conf1.get(k1).equals(conf2.get(k2))
>
> This has been suggested before and I have to say I don't like this
> one, because this means that each call to PluginRepository.get(conf)
> will end up comparing all key value pairs, which, IMO, is
> excessive (because, if I am not mistaken, we don't need this when
> running Nutch in a distributed environment). Unfortunately, this may
> be the only way to fix this leak.

Well, after all, how often can this overhead ever happen? I logged just a few
calls (admittedly, on short runs), but even thousands of occurrences would
only consume a negligible amount of CPU time.

A more serious problem is that an implementation of equals() that returns
true if the two hashCodes are different violates the specifications of
Object.hashCode() :

http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Object.html#hashCode()
"If two objects are equal according to the equals(Object) method, then
calling the hashCode method on each of the two objects must produce the same
integer result."

> This is probably not a good idea, but here it goes: perhaps we can
> change the LocalJobRunner.Job.run method. First, it clones the JobConf
> object (clonedConf). It then runs a MapTask with the original
> JobConf. Upon completion, it copies everything from clonedConf back to
> original JobConf. This way, original JobConf's hashCode won't change,
> so there should be no leak.

What if we simply rework the PluginRepository.get() method, checking first
whether a key "deeply" equal to conf (i.e., with the same key/value pairs
but possibly a different hashCode) is in CACHE, and in that case, if the
hashCode is different, we use that key as the parameter of the get()
method in place of conf? (This within a synchronized block, of course.)
Probably the cleanest way to do that is to encapsulate this behavior in a
"hashCodeObliviousGet()" method of a subclass of Hashtable, and use that
subclass for CACHE. So, we will be "specs-ly correct" because we won't
override common methods (get() or equals()) with implementations that are
not compliant with the specs.
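
The proposal might look roughly like the sketch below. It is a Hashtable subclass whose extra method tries the normal hash lookup first and, on a miss, scans for a "deeply" equal key. Only the method name hashCodeObliviousGet() comes from the message. The class name and the BiPredicate used for deep equality are hypothetical, chosen so that the sketch does not depend on the Configuration API.

```java
import java.util.Hashtable;
import java.util.Map;
import java.util.function.BiPredicate;

// Cache that leaves get() and equals() spec-compliant and adds a separate
// lookup tolerant of keys whose hashCode() differs.
class HashCodeObliviousCache<K, V> extends Hashtable<K, V> {
    private final BiPredicate<K, K> deepEquals;

    HashCodeObliviousCache(BiPredicate<K, K> deepEquals) {
        this.deepEquals = deepEquals;
    }

    // Fast path: ordinary hash lookup. Slow path: scan for a key with the same
    // contents and return its value. Synchronized on this table, like
    // Hashtable's own methods, so the scan cannot race with put().
    synchronized V hashCodeObliviousGet(K key) {
        V value = get(key);
        if (value != null) {
            return value;
        }
        for (Map.Entry<K, V> e : entrySet()) {
            if (deepEquals.test(e.getKey(), key)) {
                return e.getValue();
            }
        }
        return null;
    }
}
```

Each call made with a conf object that is not itself a key in the table still pays for a scan over the cached entries.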

Also: do you agree with me that we should avoid caches based on weak
references (like the current WeakHashMap) if we want to be protected against
multiple instances of plugins? I can't see a compelling reason for a
garbage-collected CACHE, considering that the number of items stored in it
can't grow that large (how many different configurations can users ever
create?)

Enzo


Re: Loading mechanism of plugin classes and singleton objects

Posted by Doğacan Güney <do...@gmail.com>.
On 6/8/07, Enzo Michelangeli <en...@gmail.com> wrote:
> ----- Original Message -----
> From: "Doğacan Güney" <do...@gmail.com>
> Sent: Friday, June 08, 2007 3:49 PM
>
> [...]
> >> Any idea?
> >
> > This will certainly help a lot. If it is not too much trouble, can you
> > add debug outputs for hashCodes of conf objects (both for the one in
> > the cache and for the parameter, because it seems Configuration object
> > is created more than once so their hashCode may be different, which in
> > turn causes the change in CACHE's hashCode(*)) and a stack trace?
> > A stack trace of depth 2-3 will probably suffice, I am just wondering
> > what is calling PluginRepository.get(conf).
>
> OK, I changed my debug code as follows:
>
>   public static synchronized PluginRepository get(Configuration conf) {
>     PluginRepository result = CACHE.get(conf);
>         /* --- start debug code */
>         String tr = "";
>         StackTraceElement[] tes = Thread.currentThread().getStackTrace();
>         for(int j=2; j<tes.length; j++)
>             tr = tr+"\n    "+tes[j].toString();
>         LOG.info("In thread "+Thread.currentThread()+
>                  " a static method of the class "+
>                  (new CurrentClassGetter()).getCurrentClass()+
>                  " called CACHE.get("+conf+
>                  "), where CACHE is "+CACHE+
>                  " and CACHE.hashCode() = "+CACHE.hashCode()+
>                  " - got result = "+result+
>                  " conf.hashCode() was: "+conf.hashCode()+
>                  " Stack Trace:"+tr);
>         /* end debug code --- */
>     if (result == null) {
>       result = new PluginRepository(conf);
>       CACHE.put(conf, result);
>     }
>     return result;
>   }
>
>   /* --- start debug code */
>   public static class CurrentClassGetter extends SecurityManager {
>     public String getCurrentClass() {
>       Class cl = super.getClassContext()[1];
>       return cl.toString() + "@" + cl.hashCode();
>     }
>   }
>   /* end debug code --- */
>
> (With full stack trace: bytes are cheap ;-) )
>
> I did not bother to print the hashCode of the keys in CACHE because it's
> become evident why CACHE.get(conf) returns null: the hashCode of conf
> changes!

That's true. Take a look at this code from LocalJobRunner.Job.run:

        for (int i = 0; i < splits.length; i++) {
          String mapId = "map_" + newId();
          mapIds.add(mapId);
          MapTask map = new MapTask(jobId, file, "tip_m_" + mapId,
                                    mapId, i,
                                    splits[i]);
          // A new JobConf object for every map task, so its hashCode differs
          JobConf localConf = new JobConf(job);
          map.localizeConfiguration(localConf);
          map.setConf(localConf);
          map_tasks += 1;
          myMetrics.launchMap();
          map.run(localConf, this);
          myMetrics.completeMap();
          map_tasks -= 1;
        }

For each new map task, Hadoop creates a new configuration object
(which of course makes sense), so its hashCode changes and all hell
breaks loose.
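
The effect can be reproduced without Hadoop. The sketch below assumes, as this thread suggests, that the configuration class keeps Object's identity-based equals() and hashCode(). `Conf` and `IdentityKeyDemo` are hypothetical stand-ins, and the WeakHashMap mirrors the CACHE discussed here. A copy with identical contents is a different key, so the lookup misses, and a new repository would be built.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical stand-in for Configuration/JobConf: it carries properties but
// inherits Object's identity-based equals() and hashCode().
class Conf {
    final Map<String, String> props = new HashMap<>();

    Conf() {
    }

    // Copy constructor, playing the role of "new JobConf(job)" above.
    Conf(Conf other) {
        props.putAll(other.props);
    }
}

class IdentityKeyDemo {
    // Returns true if a cache keyed by the original object also finds the copy.
    static boolean copyHitsCache() {
        Map<Conf, String> cache = new WeakHashMap<>();  // like CACHE in this thread
        Conf original = new Conf();
        original.props.put("example.key", "value");
        cache.put(original, "repository built for original");

        Conf copy = new Conf(original);  // same contents, different object
        return cache.get(copy) != null;  // identity equals() fails, so this is false
    }
}
```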

If I understood the code correctly, this will not be a problem in a
distributed environment. Each new map task gets its own process
anyway, so there should be no leak.

> This is strange, because, as you can see below, the strings that
> make up the keys and values of conf appear unchanged. Perhaps we should
> override the equals() method in org.apache.hadoop.conf.Configuration
> (invoked by CACHE.get(), according to the specs of the java.util.Map
> interface), so that the hashCode()s of the keys get ignored, and
> conf1.equals(conf2) returns true if and only if:
>
>  1. conf1.size() == conf2.size(),
>
>  2. for each key k1 of conf1 there is a key k2 in conf2 such that:
>   2.1 k1.equals(k2)
>   2.2 conf1.get(k1).equals(conf2.get(k2))

This has been suggested before and I have to say I don't like this
one, because this means that each call to PluginRepository.get(conf)
will end up comparing all key value pairs, which, IMO, is
excessive (because, if I am not mistaken, we don't need this when
running Nutch in a distributed environment). Unfortunately, this may
be the only way to fix this leak.

This is probably not a good idea, but here it goes: perhaps we can
change the LocalJobRunner.Job.run method. First, it clones the JobConf
object (clonedConf). It then runs a MapTask with the original
JobConf. Upon completion, it copies everything from clonedConf back to
original JobConf. This way, original JobConf's hashCode won't change,
so there should be no leak.
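
The clone-and-restore idea comes down to reusing one configuration object across tasks and undoing each task's changes afterwards, so that an identity-keyed cache keeps finding it. In the minimal sketch below, a plain Map stands in for JobConf. `CloneAndRestore.runWithRestore` is a hypothetical helper.

```java
import java.util.HashMap;
import java.util.Map;

final class CloneAndRestore {
    private CloneAndRestore() {
    }

    // Runs a task against the same conf object, then puts back its original
    // contents. The object's identity (and so its identity hash) never changes.
    static void runWithRestore(Map<String, String> conf, Runnable task) {
        Map<String, String> clonedConf = new HashMap<>(conf);  // snapshot before the task
        try {
            task.run();                                       // may localize or mutate conf
        } finally {
            conf.clear();
            conf.putAll(clonedConf);                          // copy everything back
        }
    }
}
```

This relies on the local runner executing its map tasks one after another, as in the loop quoted above. Tasks sharing one object concurrently would see each other's changes.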


>
> Anyway, I'm attaching the log below.
>
> > Thanks for the detailed analysis!
>
> Glad to be of help!
>
> Enzo
>

-- 
Doğacan Güney

Re: Loading mechanism of plugin classes and singleton objects

Posted by Enzo Michelangeli <en...@gmail.com>.
----- Original Message ----- 
From: "Doğacan Güney" <do...@gmail.com>
Sent: Friday, June 08, 2007 3:49 PM

[...]
>> Any idea?
>
> This will certainly help a lot. If it is not too much trouble, can you
> add debug outputs for hashCodes of conf objects (both for the one in
> the cache and for the parameter, because it seems Configuration object
> is created more than once so their hashCode may be different, which in
> turn causes the change in CACHE's hashCode(*)) and a stack trace?
> A stack trace of depth 2-3 will probably suffice, I am just wondering
> what is calling PluginRepository.get(conf).

OK, I changed my debug code as follows:

  public static synchronized PluginRepository get(Configuration conf) {
    PluginRepository result = CACHE.get(conf);
        /* --- start debug code */
        String tr = "";
        StackTraceElement[] tes = Thread.currentThread().getStackTrace();
        for(int j=2; j<tes.length; j++)
            tr = tr+"\n    "+tes[j].toString();
        LOG.info("In thread "+Thread.currentThread()+
                 " a static method of the class "+
                 (new CurrentClassGetter()).getCurrentClass()+
                 " called CACHE.get("+conf+
                 "), where CACHE is "+CACHE+
                 " and CACHE.hashCode() = "+CACHE.hashCode()+
                 " - got result = "+result+
                 " conf.hashCode() was: "+conf.hashCode()+
                 " Stack Trace:"+tr);
        /* end debug code --- */
    if (result == null) {
      result = new PluginRepository(conf);
      CACHE.put(conf, result);
    }
    return result;
  }

  /* --- start debug code */
  public static class CurrentClassGetter extends SecurityManager {
    public String getCurrentClass() {
      Class cl = super.getClassContext()[1];
      return cl.toString() + "@" + cl.hashCode();
    }
  }
  /* end debug code --- */

(With full stack trace: bytes are cheap ;-) )

I did not bother to print the hashCode of the keys in CACHE because it's
become evident why CACHE.get(conf) returns null: the hashCode of conf
changes! This is strange, because, as you can see below, the strings that
make up the keys and values of conf appear unchanged. Perhaps we should
override the equals() method in org.apache.hadoop.conf.Configuration
(invoked by CACHE.get(), according to the specs of the java.util.Map
interface), so that the hashCode()s of the keys get ignored, and
conf1.equals(conf2) returns true if and only if:

 1. conf1.size() == conf2.size(),

 2. for each key k1 of conf1 there is a key k2 in conf2 such that:
  2.1 k1.equals(k2)
  2.2 conf1.get(k1).equals(conf2.get(k2))
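
Conditions 1 and 2 above translate into a short helper. This minimal sketch works over plain Map<String, String> snapshots, so that it does not assume any particular Configuration accessor. `ConfComparison.deepEquals` is a hypothetical name. Because k1.equals(k2) lets k2 simply be k1, condition 2 becomes a lookup of the same key in the other map.

```java
import java.util.Map;

final class ConfComparison {
    private ConfComparison() {
    }

    // True iff both maps have the same size (condition 1) and every key of a
    // maps to an equal value in b (condition 2). Values are assumed non-null,
    // as configuration values are here.
    static boolean deepEquals(Map<String, String> a, Map<String, String> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (Map.Entry<String, String> e : a.entrySet()) {
            String other = b.get(e.getKey());
            if (other == null || !other.equals(e.getValue())) {
                return false;
            }
        }
        return true;
    }
}
```

The loop visits every entry, which is exactly the per-lookup cost Doğacan objects to in his reply.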

Anyway, I'm attaching the log below.

> Thanks for the detailed analysis!

Glad to be of help!

Enzo

2007-06-08 17:24:39,211 INFO  plugin.PluginRepository - In thread
Thread[Thread-0,5,main] a static method of the class class
org.apache.nutch.plugin.PluginRepository@12621140 called
CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
CACHE.hashCode() = 23315571 - got result =
org.apache.nutch.plugin.PluginRepository@ff2413 conf.hashCode() was:
27058272 Stack Trace:
    org.apache.nutch.plugin.PluginRepository.get(PluginRepository.java:99)
    org.apache.nutch.plugin.PluginDescriptor.collectLibs(PluginDescriptor.java:309)
    org.apache.nutch.plugin.PluginDescriptor.getDependencyLibs(PluginDescriptor.java:298)
    org.apache.nutch.plugin.PluginDescriptor.getClassLoader(PluginDescriptor.java:277)
    org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:155)
    org.apache.nutch.net.URLFilters.<init>(URLFilters.java:54)
    org.apache.nutch.crawl.CrawlDbFilter.configure(CrawlDbFilter.java:66)
    org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    org.apache.hadoop.mapred.MapTask.run(MapTask.java:170)
    org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:126)

2007-06-08 17:24:39,231 INFO  plugin.PluginRepository - In thread
Thread[Thread-0,5,main] a static method of the class class
org.apache.nutch.plugin.PluginRepository@12621140 called
CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
CACHE.hashCode() = 23315571 - got result =
org.apache.nutch.plugin.PluginRepository@ff2413 conf.hashCode() was:
27058272 Stack Trace:
    org.apache.nutch.plugin.PluginRepository.get(PluginRepository.java:99)
    org.apache.nutch.urlfilter.geoip.GeoIpURLFilter.setConf(GeoIpURLFilter.java:252)
    org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:162)
    org.apache.nutch.net.URLFilters.<init>(URLFilters.java:54)
    org.apache.nutch.crawl.CrawlDbFilter.configure(CrawlDbFilter.java:66)
    org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    org.apache.hadoop.mapred.MapTask.run(MapTask.java:170)
    org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:126)

2007-06-08 17:24:39,802 INFO  plugin.PluginRepository - In thread
Thread[Thread-0,5,main] a static method of the class class
org.apache.nutch.plugin.PluginRepository@12621140 called
CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
CACHE.hashCode() = 23315571 - got result =
org.apache.nutch.plugin.PluginRepository@ff2413 conf.hashCode() was:
27058272 Stack Trace:
    org.apache.nutch.plugin.PluginRepository.get(PluginRepository.java:99)
    org.apache.nutch.plugin.PluginDescriptor.collectLibs(PluginDescriptor.java:309)
    org.apache.nutch.plugin.PluginDescriptor.getDependencyLibs(PluginDescriptor.java:298)
    org.apache.nutch.plugin.PluginDescriptor.getClassLoader(PluginDescriptor.java:277)
    org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:155)
    org.apache.nutch.net.URLFilters.<init>(URLFilters.java:54)
    org.apache.nutch.crawl.CrawlDbFilter.configure(CrawlDbFilter.java:66)
    org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    org.apache.hadoop.mapred.MapTask.run(MapTask.java:170)
    org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:126)

2007-06-08 17:24:39,802 INFO  plugin.PluginRepository - In thread
Thread[Thread-0,5,main] a static method of the class class
org.apache.nutch.plugin.PluginRepository@12621140 called
CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
CACHE.hashCode() = 23315571 - got result =
org.apache.nutch.plugin.PluginRepository@ff2413 conf.hashCode() was:
27058272 Stack Trace:
    org.apache.nutch.plugin.PluginRepository.get(PluginRepository.java:99)
    org.apache.nutch.plugin.PluginDescriptor.collectLibs(PluginDescriptor.java:309)
    org.apache.nutch.plugin.PluginDescriptor.getDependencyLibs(PluginDescriptor.java:298)
    org.apache.nutch.plugin.PluginDescriptor.getClassLoader(PluginDescriptor.java:277)
    org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:155)
    org.apache.nutch.net.URLFilters.<init>(URLFilters.java:54)
    org.apache.nutch.crawl.CrawlDbFilter.configure(CrawlDbFilter.java:66)
    org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    org.apache.hadoop.mapred.MapTask.run(MapTask.java:170)
    org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:126)

2007-06-08 17:24:43,618 INFO  plugin.PluginRepository - In thread
Thread[Thread-0,5,main] a static method of the class class
org.apache.nutch.plugin.PluginRepository@12621140 called
CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
CACHE.hashCode() = 23315571 - got result = null conf.hashCode() was: 7461949
Stack Trace:
    org.apache.nutch.plugin.PluginRepository.get(PluginRepository.java:99)
    org.apache.nutch.net.URLFilters.<init>(URLFilters.java:46)
    org.apache.nutch.crawl.CrawlDbFilter.configure(CrawlDbFilter.java:66)
    org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    org.apache.hadoop.mapred.MapTask.run(MapTask.java:170)
    org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:126)

2007-06-08 17:24:43,848 INFO  plugin.PluginRepository - In thread
Thread[Thread-0,5,main] a static method of the class class
org.apache.nutch.plugin.PluginRepository@12621140 called
CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413,
Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@1af33d6} and
CACHE.hashCode() = 54703198 - got result =
org.apache.nutch.plugin.PluginRepository@1af33d6 conf.hashCode() was:
7461949 Stack Trace:
    org.apache.nutch.plugin.PluginRepository.get(PluginRepository.java:99)
    org.apache.nutch.plugin.PluginDescriptor.collectLibs(PluginDescriptor.java:309)
    org.apache.nutch.plugin.PluginDescriptor.getDependencyLibs(PluginDescriptor.java:298)
    org.apache.nutch.plugin.PluginDescriptor.getClassLoader(PluginDescriptor.java:277)
    org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:155)
    org.apache.nutch.net.URLFilters.<init>(URLFilters.java:54)
    org.apache.nutch.crawl.CrawlDbFilter.configure(CrawlDbFilter.java:66)
    org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    org.apache.hadoop.mapred.MapTask.run(MapTask.java:170)
    org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:126)

2007-06-08 17:24:43,858 INFO  plugin.PluginRepository - In thread
Thread[Thread-0,5,main] a static method of the class class
org.apache.nutch.plugin.PluginRepository@12621140 called
CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413,
Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@1af33d6} and
CACHE.hashCode() = 54703198 - got result =
org.apache.nutch.plugin.PluginRepository@1af33d6 conf.hashCode() was:
7461949 Stack Trace:
    org.apache.nutch.plugin.PluginRepository.get(PluginRepository.java:99)
    org.apache.nutch.urlfilter.geoip.GeoIpURLFilter.setConf(GeoIpURLFilter.java:252)
    org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:162)
    org.apache.nutch.net.URLFilters.<init>(URLFilters.java:54)
    org.apache.nutch.crawl.CrawlDbFilter.configure(CrawlDbFilter.java:66)
    org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    org.apache.hadoop.mapred.MapTask.run(MapTask.java:170)
    org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:126)

2007-06-08 17:24:43,918 INFO  plugin.PluginRepository - In thread
Thread[Thread-0,5,main] a static method of the class class
org.apache.nutch.plugin.PluginRepository@12621140 called
CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413,
Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@1af33d6} and
CACHE.hashCode() = 54703198 - got result =
org.apache.nutch.plugin.PluginRepository@1af33d6 conf.hashCode() was:
7461949 Stack Trace:
    org.apache.nutch.plugin.PluginRepository.get(PluginRepository.java:99)
    org.apache.nutch.plugin.PluginDescriptor.collectLibs(PluginDescriptor.java:309)
    org.apache.nutch.plugin.PluginDescriptor.getDependencyLibs(PluginDescriptor.java:298)
    org.apache.nutch.plugin.PluginDescriptor.getClassLoader(PluginDescriptor.java:277)
    org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:155)
    org.apache.nutch.net.URLFilters.<init>(URLFilters.java:54)
    org.apache.nutch.crawl.CrawlDbFilter.configure(CrawlDbFilter.java:66)
    org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    org.apache.hadoop.mapred.MapTask.run(MapTask.java:170)
    org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:126)

2007-06-08 17:24:43,918 INFO  plugin.PluginRepository - In thread
Thread[Thread-0,5,main] a static method of the class class
org.apache.nutch.plugin.PluginRepository@12621140 called
CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413,
Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@1af33d6} and
CACHE.hashCode() = 54703198 - got result =
org.apache.nutch.plugin.PluginRepository@1af33d6 conf.hashCode() was:
7461949 Stack Trace:
    org.apache.nutch.plugin.PluginRepository.get(PluginRepository.java:99)
    org.apache.nutch.plugin.PluginDescriptor.collectLibs(PluginDescriptor.java:309)
    org.apache.nutch.plugin.PluginDescriptor.getDependencyLibs(PluginDescriptor.java:298)
    org.apache.nutch.plugin.PluginDescriptor.getClassLoader(PluginDescriptor.java:277)
    org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:155)
    org.apache.nutch.net.URLFilters.<init>(URLFilters.java:54)
    org.apache.nutch.crawl.CrawlDbFilter.configure(CrawlDbFilter.java:66)
    org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    org.apache.hadoop.mapred.MapTask.run(MapTask.java:170)
    org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:126)

2007-06-08 17:24:44,299 INFO  plugin.PluginRepository - In thread
Thread[Thread-0,5,main] a static method of the class class
org.apache.nutch.plugin.PluginRepository@12621140 called
CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413,
Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@1af33d6} and
CACHE.hashCode() = 54703198 - got result = null conf.hashCode() was:
19647819 Stack Trace:
    org.apache.nutch.plugin.PluginRepository.get(PluginRepository.java:99)
    org.apache.nutch.net.URLFilters.<init>(URLFilters.java:46)
    org.apache.nutch.crawl.CrawlDbFilter.configure(CrawlDbFilter.java:66)
    org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    org.apache.hadoop.mapred.MapTask.run(MapTask.java:170)
    org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:126)

2007-06-08 17:24:44,499 INFO  plugin.PluginRepository - In thread
Thread[Thread-0,5,main] a static method of the class class
org.apache.nutch.plugin.PluginRepository@12621140 called
CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413,
Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@1bcdbf6,
Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@1af33d6} and
CACHE.hashCode() = 64604955 - got result =
org.apache.nutch.plugin.PluginRepository@1bcdbf6 conf.hashCode() was:
19647819 Stack Trace:
    org.apache.nutch.plugin.PluginRepository.get(PluginRepository.java:99)
    org.apache.nutch.plugin.PluginDescriptor.collectLibs(PluginDescriptor.java:309)
    org.apache.nutch.plugin.PluginDescriptor.getDependencyLibs(PluginDescriptor.java:298)
    org.apache.nutch.plugin.PluginDescriptor.getClassLoader(PluginDescriptor.java:277)
    org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:155)
    org.apache.nutch.net.URLFilters.<init>(URLFilters.java:54)
    org.apache.nutch.crawl.CrawlDbFilter.configure(CrawlDbFilter.java:66)
    org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    org.apache.hadoop.mapred.MapTask.run(MapTask.java:170)
    org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:126)

2007-06-08 17:24:44,509 INFO  plugin.PluginRepository - In thread
Thread[Thread-0,5,main] a static method of the class class
org.apache.nutch.plugin.PluginRepository@12621140 called
CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413,
Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@1bcdbf6,
Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@1af33d6} and
CACHE.hashCode() = 64604955 - got result =
org.apache.nutch.plugin.PluginRepository@1bcdbf6 conf.hashCode() was:
19647819 Stack Trace:
    org.apache.nutch.plugin.PluginRepository.get(PluginRepository.java:99)
    org.apache.nutch.urlfilter.geoip.GeoIpURLFilter.setConf(GeoIpURLFilter.java:252)
    org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:162)
    org.apache.nutch.net.URLFilters.<init>(URLFilters.java:54)
    org.apache.nutch.crawl.CrawlDbFilter.configure(CrawlDbFilter.java:66)
    org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    org.apache.hadoop.mapred.MapTask.run(MapTask.java:170)
    org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:126)

2007-06-08 17:24:44,599 INFO  plugin.PluginRepository - In thread
Thread[Thread-0,5,main] a static method of the class class
org.apache.nutch.plugin.PluginRepository@12621140 called
CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413,
Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@1bcdbf6,
Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@1af33d6} and
CACHE.hashCode() = 64604955 - got result =
org.apache.nutch.plugin.PluginRepository@1bcdbf6 conf.hashCode() was:
19647819 Stack Trace:
    org.apache.nutch.plugin.PluginRepository.get(PluginRepository.java:99)
    org.apache.nutch.plugin.PluginDescriptor.collectLibs(PluginDescriptor.java:309)
    org.apache.nutch.plugin.PluginDescriptor.getDependencyLibs(PluginDescriptor.java:298)
    org.apache.nutch.plugin.PluginDescriptor.getClassLoader(PluginDescriptor.java:277)
    org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:155)
    org.apache.nutch.net.URLFilters.<init>(URLFilters.java:54)
    org.apache.nutch.crawl.CrawlDbFilter.configure(CrawlDbFilter.java:66)
    org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    org.apache.hadoop.mapred.MapTask.run(MapTask.java:170)
    org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:126)

2007-06-08 17:24:44,599 INFO  plugin.PluginRepository - In thread
Thread[Thread-0,5,main] a static method of the class class
org.apache.nutch.plugin.PluginRepository@12621140 called
CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413,
Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@1bcdbf6,
Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@1af33d6} and
CACHE.hashCode() = 64604955 - got result =
org.apache.nutch.plugin.PluginRepository@1bcdbf6 conf.hashCode() was:
19647819 Stack Trace:
    org.apache.nutch.plugin.PluginRepository.get(PluginRepository.java:99)
    org.apache.nutch.plugin.PluginDescriptor.collectLibs(PluginDescriptor.java:309)
    org.apache.nutch.plugin.PluginDescriptor.getDependencyLibs(PluginDescriptor.java:298)
    org.apache.nutch.plugin.PluginDescriptor.getClassLoader(PluginDescriptor.java:277)
    org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:155)
    org.apache.nutch.net.URLFilters.<init>(URLFilters.java:54)
    org.apache.nutch.crawl.CrawlDbFilter.configure(CrawlDbFilter.java:66)
    org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    org.apache.hadoop.mapred.MapTask.run(MapTask.java:170)
    org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:126)

2007-06-08 17:25:22,694 INFO  plugin.PluginRepository - In thread
Thread[Thread-0,5,main] a static method of the class class
org.apache.nutch.plugin.PluginRepository@12621140 called
CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413,
Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@1bcdbf6,
Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_rafehc.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@1af33d6} and
CACHE.hashCode() = 64604955 - got result = null conf.hashCode() was:
27692793 Stack Trace:
    org.apache.nutch.plugin.PluginRepository.get(PluginRepository.java:99)
    org.apache.nutch.scoring.ScoringFilters.<init>(ScoringFilters.java:59)
    org.apache.nutch.crawl.CrawlDbReducer.configure(CrawlDbReducer.java:46)
    org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:217)
    org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:155)
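The log above reports conf.hashCode() values of 7461949, 19647819 and 27692793 for configurations that print identically. One plausible reading, assumed here rather than confirmed by the thread, is that Configuration keeps Object's identity-based equals()/hashCode(). In that case every new Configuration object (for example, one created per task) misses the cache and triggers a new repository. The sketch below uses hypothetical stand-ins `Conf` and `Repo`, not the real Hadoop or Nutch classes, and counts how often a "repository" is built:

```java
import java.util.HashMap;
import java.util.Map;

public class IdentityKeyMissDemo {

  // Hypothetical stand-in for Hadoop's Configuration: it prints its resource
  // list but does NOT override equals()/hashCode(), so lookups use identity.
  static final class Conf {
    private final String resources;
    Conf(String resources) { this.resources = resources; }
    @Override public String toString() { return "Configuration: " + resources; }
  }

  // Hypothetical stand-in for PluginRepository; counts its constructions.
  static final class Repo {
    static int constructed = 0;
    Repo(Conf conf) { constructed++; }
  }

  private static final Map<Conf, Repo> CACHE = new HashMap<>();

  // Same shape as the PluginRepository.get() factory quoted in this thread.
  static synchronized Repo get(Conf conf) {
    Repo result = CACHE.get(conf);
    if (result == null) {
      result = new Repo(conf);
      CACHE.put(conf, result);
    }
    return result;
  }

  public static void main(String[] args) {
    Conf first = new Conf("hadoop-default.xml , hadoop-site.xml");
    Conf second = new Conf("hadoop-default.xml , hadoop-site.xml"); // prints the same

    Repo a = get(first);
    Repo b = get(first);   // same key object: cache hit
    Repo c = get(second);  // equal-looking but not equal: cache miss

    System.out.println(first.toString().equals(second.toString())); // true
    System.out.println(a == b);                                     // true
    System.out.println(a == c);                                     // false
    System.out.println(Repo.constructed);                           // 2
  }
}
```

If this reading is right, the "random" misses are not random at all: they happen exactly when a fresh Configuration object reaches get().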




Re: Loading mechnism of plugin classes and singleton objects

Posted by Doğacan Güney <do...@gmail.com>.
Hi,

On 6/8/07, Enzo Michelangeli <en...@gmail.com> wrote:
> I have a partial explanation of what goes wrong and causes plugin classes
> to be loaded multiple times, but the root cause still mystifies me.
>
> I traced back the multiple classloading of Plugin classes to the following
> chain of events in the org.apache.nutch.plugin package:
>
> 1. The method parsePlugin() of PluginManifestParser is unduly called more
> than once for each plugin (every time from a different instance of
> PluginManifestParser), instead of being called only once per plugin.
> 2. This causes the creation of multiple PluginDescriptor objects, each of
> which instantiates its own PluginClassLoader.
> 3. With multiple classloaders, the same plugin class is also loaded
> multiple times, each copy with its own static fields.
>
> The method parsePlugin() is only called (indirectly) from the constructor of
> PluginManifestParser, so there must be multiple instances of
> PluginManifestParser. But that class is only instantiated by the
> constructor of PluginRepository, and the latter is only invoked from its
> static get() method:
>
>   public static synchronized PluginRepository get(Configuration conf) {
>     PluginRepository result = CACHE.get(conf);
>     if (result == null) {
>       result = new PluginRepository(conf);
>       CACHE.put(conf, result);
>     }
>     return result;
>   }
>
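For the factory above to return one repository per logical configuration, its key needs equals() and hashCode() that are both derived from the same content, as the Object.hashCode() contract requires. The following is only a sketch of that idea, not Nutch's code: the `ConfKey` class, the use of a plain property map as input, and the placeholder `Object` value are all hypothetical. The key copies the properties into an immutable snapshot so that later changes to the caller's map cannot corrupt a key already stored in the cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class ContentKeyDemo {

  // Hypothetical cache key: a snapshot of a configuration's properties.
  // equals() and hashCode() both come from the same snapshot, so equal keys
  // always have equal hash codes.
  static final class ConfKey {
    private final Map<String, String> snapshot;

    ConfKey(Map<String, String> properties) {
      // Defensive copy: mutating the caller's map later does not change this key.
      this.snapshot = new TreeMap<>(properties);
    }

    @Override public boolean equals(Object o) {
      return o instanceof ConfKey && snapshot.equals(((ConfKey) o).snapshot);
    }

    @Override public int hashCode() {
      return snapshot.hashCode();
    }
  }

  private static final Map<ConfKey, Object> CACHE = new HashMap<>();

  // Placeholder for the repository factory: one shared value per distinct content.
  static synchronized Object get(Map<String, String> properties) {
    return CACHE.computeIfAbsent(new ConfKey(properties), k -> new Object());
  }

  public static void main(String[] args) {
    Map<String, String> p1 = new HashMap<>();
    p1.put("some.property", "value");            // hypothetical property
    Map<String, String> p2 = new HashMap<>(p1);  // distinct object, same content

    System.out.println(get(p1) == get(p2)); // true: one shared instance
  }
}
```

The trade-off is that a real configuration would have to be snapshotted in full, and a configuration modified after the lookup would still map to the old entry.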
> First of all, I see one problem: CACHE is defined as a WeakHashMap, whose
> entries the garbage collector may discard as soon as their keys are no
> longer strongly referenced elsewhere, whereas we definitely want to avoid
> multiple instances of PluginClassLoader. So I think we should change
> CACHE's type to, say, a Hashtable (which also has synchronized methods, just
> in case). However, it turns out that this is not the immediate cause of the
> problem: I made the change to Hashtable, but the problem continued to
> occur.
>
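That switching to a Hashtable did not help is what one would expect if the misses come from key equality: Hashtable, HashMap and WeakHashMap all look keys up through equals() and hashCode(), so changing the map type alone cannot make a look-alike key match. WeakHashMap also drops an entry only once its key is no longer strongly reachable, not merely because memory is tight. The sketch below illustrates both points; `Key` and the method names are hypothetical.

```java
import java.util.Hashtable;
import java.util.Map;
import java.util.WeakHashMap;

public class MapTypeDemo {

  // Hypothetical key type that keeps Object's identity-based equals()/hashCode().
  static final class Key {
    @Override public String toString() { return "same text"; }
  }

  // Stores an entry, then looks it up with a second key that prints the same.
  static boolean hitsWithLookalikeKey(Map<Key, String> map) {
    Key stored = new Key();
    map.put(stored, "repository");
    return map.get(new Key()) != null;
  }

  // A WeakHashMap entry stays while its key is strongly referenced.
  static boolean survivesWhileKeyReachable() {
    Map<Key, String> weak = new WeakHashMap<>();
    Key key = new Key();          // strong reference held by this local variable
    weak.put(key, "repository");
    System.gc();                  // only a hint; a strongly reachable key is never cleared
    return weak.get(key) != null; // key is still in use here, so it is still reachable
  }

  public static void main(String[] args) {
    System.out.println(hitsWithLookalikeKey(new Hashtable<>()));   // false
    System.out.println(hitsWithLookalikeKey(new WeakHashMap<>())); // false
    System.out.println(survivesWhileKeyReachable());               // true
  }
}
```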
> What actually happens is that _sometimes_ CACHE.get() returns null for no
> apparent reason! Logging the content of CACHE shows that, in these cases,
> keys and values, as well as the parameter passed to CACHE.get(), are
> perfectly valid, identical to the ones in the few previous calls. In other
> words, the (key, value) pair exists and the key is identical to the
> parameter passed to CACHE.get(), and nevertheless CACHE.get() occasionally
> returns null.
>
> After that happens, the log shows that the hashCode of the CACHE object
> becomes different. This is another thing that puzzles me, because CACHE is a
> static member of PluginRepository, so it should only be initialized at class
> load time, and the object it references should not be garbage-collected.
> Furthermore, my logs show that the class PluginRepository is loaded only
> once, because its hashCode remains unchanged throughout the run.
>
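A changing CACHE.hashCode() does not by itself mean CACHE was reallocated. The java.util.Map contract defines a map's hash code as the sum of its entries' hash codes (each entry being key.hashCode() ^ value.hashCode()), so an empty map always hashes to 0 and the value changes whenever entries are added. The log's sequence (0, then a new value after each new repository is cached) is consistent with that. A minimal sketch, with a hypothetical helper name, using System.identityHashCode() to show the object itself stays the same:

```java
import java.util.HashMap;
import java.util.Map;

public class MapHashCodeDemo {

  // Puts one entry into the map and reports its hash code before and after.
  static int[] hashBeforeAndAfterPut(Map<String, String> map, String key, String value) {
    int before = map.hashCode();
    map.put(key, value);
    return new int[] {before, map.hashCode()};
  }

  public static void main(String[] args) {
    Map<String, String> cache = new HashMap<>();
    int identity = System.identityHashCode(cache);
    int[] hashes = hashBeforeAndAfterPut(cache, "a", "b");

    System.out.println(hashes[0]); // 0: an empty map always hashes to 0
    System.out.println(hashes[1]); // 3: "a".hashCode() ^ "b".hashCode() = 97 ^ 98
    System.out.println(System.identityHashCode(cache) == identity); // true: same object
  }
}
```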
> Anyway, here is how I instrumented the get() factory method of
> PluginRepository :
>
>   /**
>    * @return a cached instance of the plugin repository
>    */
>   public static synchronized PluginRepository get(Configuration conf) {
>     PluginRepository result = CACHE.get(conf);
>         /* --- start debug code */
>         LOG.info("In thread "+Thread.currentThread()+
>                  " a static method of the class "+
>                  (new CurrentClassGetter()).getCurrentClass()+
>                  " called CACHE.get("+conf+
>                  "), where CACHE is "+CACHE+
>                  " and CACHE.hashCode() = "+CACHE.hashCode()+
>                  " - got result = "+result);
>         /* end debug code --- */
>     if (result == null) {
>       result = new PluginRepository(conf);
>       CACHE.put(conf, result);
>     }
>     return result;
>   }
>
>   /* --- start debug code */
>   // detect if the current class is loaded more than
>   // once by showing its hashCode
>   public static class CurrentClassGetter extends SecurityManager {
>     public String getCurrentClass() {
>       Class cl = super.getClassContext()[1];
>       return cl.toString() + "@" + cl.hashCode();
>     }
>   }
>   /* end debug code --- */
>
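The instrumentation above prints CACHE and conf, but printing hides the distinction that matters here: two keys can print identically and still be unequal. A hypothetical helper like the following (not part of Nutch) could be called on a miss from a debug block such as the one above; it lists stored keys whose toString() matches the lookup key but which are not equal to it, together with their identity hash codes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CacheMissDiagnostics {

  // Hypothetical debugging helper: report stored keys that look like the lookup
  // key (same toString()) but are not equal to it.
  static <K> List<String> lookalikeKeys(Map<K, ?> cache, K key) {
    List<String> report = new ArrayList<>();
    for (K stored : cache.keySet()) {
      if (stored.toString().equals(key.toString()) && !stored.equals(key)) {
        report.add("stored key identityHashCode=" + System.identityHashCode(stored)
            + ", lookup key identityHashCode=" + System.identityHashCode(key));
      }
    }
    return report;
  }

  public static void main(String[] args) {
    Map<Object, String> cache = new HashMap<>();
    Object stored = new Object() { @Override public String toString() { return "conf"; } };
    Object lookup = new Object() { @Override public String toString() { return "conf"; } };
    cache.put(stored, "repository");

    // Reports one look-alike key: same text, different object.
    System.out.println(lookalikeKeys(cache, lookup));
  }
}
```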
>
> ...and here is what is logged by that code (my comments to each previous
> line are marked with "==>"):
>
> 2007-06-08 10:14:25,223 INFO  plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] a static method of the class class
> org.apache.nutch.plugin.PluginRepository@12621140 called
> CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {} and
> CACHE.hashCode() = 0 - got result = null
>
> ==> Getting null from CACHE.get here is right, because it was the first
> time -- I would have expected a hashcode different from zero, but perhaps
> the compiler performs lazy initialization of static members.
>
> 2007-06-08 10:14:30,100 INFO  plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] a static method of the class class
> org.apache.nutch.plugin.PluginRepository@12621140 called
> CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
> CACHE.hashCode() = 11660238 - got result =
> org.apache.nutch.plugin.PluginRepository@ff2413
>
> ==> OK so far...
>
> 2007-06-08 10:14:30,200 INFO  plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] a static method of the class class
> org.apache.nutch.plugin.PluginRepository@12621140 called
> CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
> CACHE.hashCode() = 11660238 - got result =
> org.apache.nutch.plugin.PluginRepository@ff2413
>
> ==> OK so far...
>
> 2007-06-08 10:14:31,121 INFO  plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] a static method of the class class
> org.apache.nutch.plugin.PluginRepository@12621140 called
> CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
> CACHE.hashCode() = 11660238 - got result =
> org.apache.nutch.plugin.PluginRepository@ff2413
>
> ==> OK so far...
>
> 2007-06-08 10:14:31,121 INFO  plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] a static method of the class class
> org.apache.nutch.plugin.PluginRepository@12621140 called
> CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
> CACHE.hashCode() = 11660238 - got result =
> org.apache.nutch.plugin.PluginRepository@ff2413
>
> ==> OK so far...
>
> 2007-06-08 10:14:33,084 INFO  plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] a static method of the class class
> org.apache.nutch.plugin.PluginRepository@12621140 called
> CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
> CACHE.hashCode() = 11660238 - got result = null
>
> ==> AHA! Why do we get null from CACHE.get(conf) this time, if both the
> content of CACHE and the value of conf print exactly the same as before??
> And note that the class PluginRepository has the same hashCode, etc.
>
> 2007-06-08 10:14:33,384 INFO  plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] a static method of the class class
> org.apache.nutch.plugin.PluginRepository@12621140 called
> CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@acb158,
> Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
> CACHE.hashCode() = 25606144 - got result =
> org.apache.nutch.plugin.PluginRepository@acb158
>
> ==> And here we can see that the hashCode of CACHE has already changed
> (weird for an object allocated by the initialization of a static member of a
> class that is loaded just once; and it is loaded just once, because the
> hashCode of the class PluginRepository is the same as before). Of course, the
> hashCode of the PluginRepository _object_ has now changed, because the cache
> miss has triggered a new call to its constructor, which in turn has caused
> the instantiation of new copies of PluginManifestParser, PluginDescriptor,
> PluginClassLoader and finally of the plugin classes, which is exactly what we
> wanted to avoid in the first place. This pattern occurs a few times (see the
> rest of the log below):
>
> 2007-06-08 10:14:33,384 INFO  plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] a static method of the class class
> org.apache.nutch.plugin.PluginRepository@12621140 called
> CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@acb158,
> Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
> CACHE.hashCode() = 25606144 - got result =
> org.apache.nutch.plugin.PluginRepository@acb158
>
> 2007-06-08 10:14:33,455 INFO  plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] a static method of the class class
> org.apache.nutch.plugin.PluginRepository@12621140 called
> CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@acb158,
> Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
> CACHE.hashCode() = 25606144 - got result =
> org.apache.nutch.plugin.PluginRepository@acb158
>
> 2007-06-08 10:14:33,455 INFO  plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] a static method of the class class
> org.apache.nutch.plugin.PluginRepository@12621140 called
> CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@acb158,
> Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
> CACHE.hashCode() = 25606144 - got result =
> org.apache.nutch.plugin.PluginRepository@acb158
>
> 2007-06-08 10:14:33,765 INFO  plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] a static method of the class class
> org.apache.nutch.plugin.PluginRepository@12621140 called
> CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@acb158,
> Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
> CACHE.hashCode() = 25606144 - got result = null
>
> 2007-06-08 00:03:48,201 INFO  plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] called CACHE.get(Configuration: defaults:
> hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_dvpvjj.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_dvpvjj.xml ,
> mapred-default.xmlfinal:
> hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@14520eb,
> Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_dvpvjj.xml ,
> mapred-default.xmlfinal:
> hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@2acc65} and
> CACHE.hashCode() = 37253945 - got result = null
>
> ==> The record above, from a different run (job_dvpvjj), shows another
> occurrence of CACHE.get() returning null, for no apparent reason, inside
> PluginRepository.get().
>
> 2007-06-08 10:14:33,985 INFO  plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] a static method of the class class
> org.apache.nutch.plugin.PluginRepository@12621140 called
> CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@acb158,
> Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@1ce3fc5,
> Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
> CACHE.hashCode() = 40727333 - got result =
> org.apache.nutch.plugin.PluginRepository@1ce3fc5
>
> 2007-06-08 10:14:33,985 INFO  plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] a static method of the class class
> org.apache.nutch.plugin.PluginRepository@12621140 called
> CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@acb158,
> Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@1ce3fc5,
> Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
> CACHE.hashCode() = 40727333 - got result =
> org.apache.nutch.plugin.PluginRepository@1ce3fc5
>
> 2007-06-08 10:14:34,055 INFO  plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] a static method of the class class
> org.apache.nutch.plugin.PluginRepository@12621140 called
> CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@acb158,
> Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@1ce3fc5,
> Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
> CACHE.hashCode() = 40727333 - got result =
> org.apache.nutch.plugin.PluginRepository@1ce3fc5
>
> 2007-06-08 10:14:34,055 INFO  plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] a static method of the class class
> org.apache.nutch.plugin.PluginRepository@12621140 called
> CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@acb158,
> Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@1ce3fc5,
> Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
> CACHE.hashCode() = 40727333 - got result =
> org.apache.nutch.plugin.PluginRepository@1ce3fc5
>
> 2007-06-08 10:14:37,881 INFO  plugin.PluginRepository - In thread
> Thread[Thread-0,5,main] a static method of the class class
> org.apache.nutch.plugin.PluginRepository@12621140 called
> CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@acb158,
> Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@1ce3fc5,
> Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> /tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
> mapred-default.xmlfinal:
> hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
> CACHE.hashCode() = 40727333 - got result = null
>
> ==> And the one above is the last occurrence.
>
> Any idea?

This will certainly help a lot. If it is not too much trouble, can you
add debug output for the hashCodes of the conf objects (both the one in
the cache and the parameter) and a stack trace? It seems the Configuration
object is created more than once, so their hashCodes may differ, which in
turn causes the change in CACHE's hashCode(*). A stack trace of depth 2-3
will probably suffice; I am just wondering what is calling
PluginRepository.get(conf).
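A minimal sketch of that extra debug output, assuming it is called from inside PluginRepository.get(conf) next to the existing LOG.info(...) line. DebugInfo, describeKey() and callers() are hypothetical helper names, and a plain Object stands in for the Configuration. It prints the key's identity hash (which tells separate instances apart in practice) beside its hashCode() (what the map actually uses), plus the nearest stack frames, using only APIs available since Java 1.4:

```java
import java.util.ArrayList;
import java.util.List;

public class DebugInfo {

    /** Identity hash (per instance) next to hashCode() (what a hash map uses for lookup). */
    public static String describeKey(Object key) {
        return key.getClass().getName()
            + " identityHashCode=" + Integer.toHexString(System.identityHashCode(key))
            + " hashCode=" + Integer.toHexString(key.hashCode());
    }

    /** Returns up to 'depth' frames, starting with the method that called callers(). */
    public static List<String> callers(int depth) {
        StackTraceElement[] trace = new Throwable().getStackTrace();
        List<String> out = new ArrayList<String>();
        // trace[0] is callers() itself; start at 1 so the first entry is the method
        // asking for the trace (e.g. get(conf)), followed by whoever called it.
        for (int i = 1; i < trace.length && out.size() < depth; i++) {
            out.add(trace[i].toString());
        }
        return out;
    }

    public static void main(String[] args) {
        Object conf = new Object(); // stand-in for a Configuration instance
        System.out.println(describeKey(conf));
        for (String frame : callers(3)) {
            System.out.println("  at " + frame);
        }
    }
}
```

A JVM is allowed to omit frames from getStackTrace(), so the trace is best-effort.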

(*) CACHE's hashCode is calculated like this:

        int h = 0;
        Iterator<Entry<K,V>> i = entrySet().iterator();
        while (i.hasNext())
            h += i.next().hashCode();
        return h;

    So if a configuration's hashCode changes, CACHE's hashCode also changes.
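To make the footnote concrete: the java.util.Map contract defines a map's hashCode() as the sum of the hash codes of its entries, and each entry's hash as key.hashCode() ^ value.hashCode(). The sketch below (MapHashDemo is a hypothetical name; only JDK collections are used) recomputes that sum by hand. It also shows why an empty map hashes to 0, and why the total moves as soon as a cache miss adds another entry, without the map object itself being replaced:

```java
import java.util.HashMap;
import java.util.Map;

public class MapHashDemo {

    /** Recomputes Map.hashCode() as defined by the Map and Map.Entry contracts. */
    public static int manualHash(Map<?, ?> map) {
        int h = 0;
        for (Map.Entry<?, ?> e : map.entrySet()) {
            int k = e.getKey() == null ? 0 : e.getKey().hashCode();
            int v = e.getValue() == null ? 0 : e.getValue().hashCode();
            h += k ^ v; // entry hash = key hash XOR value hash
        }
        return h;
    }

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<String, String>();
        System.out.println(cache.hashCode());                       // 0: nothing to sum
        cache.put("a", "b");
        System.out.println(cache.hashCode());                       // 3: 97 ^ 98
        cache.put("c", "d");                                        // adds 99 ^ 100 = 7
        System.out.println(cache.hashCode() == manualHash(cache));  // true
    }
}
```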

Thanks for the detailed analysis!

>
> Enzo
>


-- 
Doğacan Güney

Re: Loading mechnism of plugin classes and singleton objects

Posted by Enzo Michelangeli <en...@gmail.com>.
I have here a partial explanation of what goes wrong and causes the multiple
loading of plugin classes, but its root cause still mystifies me.

I traced back the multiple classloading of Plugin classes to the following
chain of events in the org.apache.nutch.plugin package:

1. The method parsePlugin() of PluginManifestParser is unduly called more
than once for each plugin (every time from a different instance of
PluginManifestParser), instead of being called only once per plugin.
2. This causes the creation of multiple instances of PluginDescriptor
objects, each of which instantiates its own copy of PluginClassLoader.
3. With multiple copies of the classloader, there are also multiple loadings
of the same plugin class.
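Step 3 is also the answer to the original question about static fields: a static field belongs to a Class object, and every class loader that defines a class produces its own Class object with its own statics. A self-contained sketch, assuming a full JDK (it compiles a tiny hypothetical MyFilter class at runtime through javax.tools so that two independent URLClassLoaders, standing in for two PluginClassLoader instances, can each load it):

```java
import java.lang.reflect.Field;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class TwoLoadersDemo {

    /** Loads the same class through two loaders; returns true if they share the static field. */
    public static boolean staticSharedAcrossLoaders() throws Exception {
        // Write and compile a stand-in plugin class holding a would-be singleton.
        Path dir = Files.createTempDirectory("plugin-demo");
        Path src = dir.resolve("MyFilter.java");
        Files.write(src, "public class MyFilter { public static Object shared; }"
                .getBytes(StandardCharsets.UTF_8));
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler(); // null on a bare JRE
        if (javac == null || javac.run(null, null, null, src.toString()) != 0) {
            throw new IllegalStateException("could not compile MyFilter");
        }

        // Two independent loaders over the same directory. Parent null means only
        // bootstrap classes (java.*) are shared; MyFilter is defined twice.
        URL[] path = { dir.toUri().toURL() };
        try (URLClassLoader first = new URLClassLoader(path, null);
             URLClassLoader second = new URLClassLoader(path, null)) {
            Field a = first.loadClass("MyFilter").getField("shared");
            Field b = second.loadClass("MyFilter").getField("shared");
            a.set(null, "initialized by the first copy");
            // The second copy of the class never sees that assignment.
            return b.get(null) != null;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(staticSharedAcrossLoaders()); // false
    }
}
```

Passing a common parent that can already see MyFilter would make both loaders delegate to it, and the static would then be shared.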

The method parsePlugin() is only called (indirectly) in the constructor of
PluginManifestParser, so this means that there are multiple instances of
PluginManifestParser. But this object is only instantiated by the
constructor of PluginRepository, and that constructor is only invoked in
its static get() method:

  public static synchronized PluginRepository get(Configuration conf) {
    PluginRepository result = CACHE.get(conf);
    if (result == null) {
      result = new PluginRepository(conf);
      CACHE.put(conf, result);
    }
    return result;
  }

First of all, I see one problem: CACHE is defined as a WeakHashMap, which
drops an entry once its key is no longer strongly referenced elsewhere and
the garbage collector runs -- whereas we definitely want to avoid multiple
instances of PluginClassLoader even if memory is tight. So, I think that we
should change CACHE's type into, say, a Hashtable (which also has
synchronized methods, just in case). However, it turns out that this is not
the immediate cause of the problem, because I made the change to Hashtable
but the problem continued to occur.

What actually happens is that _sometimes_ CACHE.get() returns null for no
apparent reason! Logging the content of CACHE shows that, in these cases,
keys and values, as well as the parameter passed to CACHE.get(), are
perfectly valid, identical to the ones in the few previous calls. In other
words, the (key, value) pair exists and the key is identical to the
parameter passed to CACHE.get(), and nevertheless CACHE.get() occasionally
returns null.
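One mechanism that produces exactly this symptom, sketched with hypothetical stand-ins (FakeConf for Configuration, Repo for PluginRepository): if the key class does not override equals()/hashCode(), lookups match by object identity, so a second, separately created key whose toString() prints exactly the same text still misses, even in a strongly referenced Hashtable. Whether the real Configuration behaved this way here is an assumption of the sketch, not something the log proves:

```java
import java.util.Hashtable;
import java.util.Map;

public class IdentityKeyDemo {

    /** Stand-in for Configuration: identical printed form, no equals()/hashCode() override. */
    public static class FakeConf {
        @Override
        public String toString() {
            return "Configuration: defaults: hadoop-default.xml , mapred-default.xml";
        }
    }

    /** Stand-in for PluginRepository. */
    public static class Repo { }

    private static final Map<FakeConf, Repo> CACHE = new Hashtable<FakeConf, Repo>();

    /** Same shape as PluginRepository.get(conf). */
    public static synchronized Repo get(FakeConf conf) {
        Repo result = CACHE.get(conf);
        if (result == null) {
            result = new Repo();
            CACHE.put(conf, result);
        }
        return result;
    }

    public static void main(String[] args) {
        FakeConf jobConf = new FakeConf();
        FakeConf copy = new FakeConf(); // e.g. a second Configuration built for the same job
        System.out.println(jobConf.toString().equals(copy.toString())); // true
        System.out.println(get(jobConf) == get(jobConf));                 // true: same key object
        System.out.println(get(jobConf) == get(copy));                    // false: identity mismatch
        System.out.println(CACHE.size());                                 // 2
    }
}
```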

After that strange event happens, the log shows that the hashcode of the
CACHE object becomes different. This is another thing that puzzles me,
because CACHE is a static member of PluginRepository and it should only be
initialized at class load time, and the object referenced by CACHE should
not be garbage-collected. Furthermore, my logs show that the class
PluginRepository is only loaded once, because its hashCode remains unchanged
throughout the run.
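For the "loaded only once" check, printing the class's defining loader alongside the Class object's identity hash makes a second copy hard to miss. A sketch using java.lang.StackWalker, which only exists since Java 9 (long after this thread) and can replace the SecurityManager.getClassContext() trick used in the instrumentation; CallerInfo, describeCaller() and fromHere() are hypothetical names:

```java
public class CallerInfo {

    private static final StackWalker WALKER =
        StackWalker.getInstance(StackWalker.Option.RETAIN_CLASS_REFERENCE);

    /** Describes the class of whoever called this method, including its defining loader. */
    public static String describeCaller() {
        // getCallerClass() returns the class of the method that invoked describeCaller().
        Class<?> c = WALKER.getCallerClass();
        return c.getName()
            + "@" + Integer.toHexString(System.identityHashCode(c))
            + " defined by " + c.getClassLoader();
    }

    /** Example call site; in PluginRepository.get(conf) this would feed the LOG.info(...) line. */
    public static String fromHere() {
        return describeCaller();
    }

    public static void main(String[] args) {
        System.out.println(fromHere());
    }
}
```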

Anyway, here is how I instrumented the get() factory method of
PluginRepository:

  /**
   * @return a cached instance of the plugin repository
   */
  public static synchronized PluginRepository get(Configuration conf) {
    PluginRepository result = CACHE.get(conf);
        /* --- start debug code */
        LOG.info("In thread "+Thread.currentThread()+
                 " a static method of the class "+
                 (new CurrentClassGetter()).getCurrentClass()+
                 " called CACHE.get("+conf+
                 "), where CACHE is "+CACHE+
                 " and CACHE.hashCode() = "+CACHE.hashCode()+
                 " - got result = "+result);
        /* end debug code --- */
    if (result == null) {
      result = new PluginRepository(conf);
      CACHE.put(conf, result);
    }
    return result;
  }

  /* --- start debug code */
  // detect if the current class is loaded more than
  // once by showing its hashCode
  public static class CurrentClassGetter extends SecurityManager {
    public String getCurrentClass() {
      Class<?> cl = super.getClassContext()[1];
      return cl.toString() + "@" + cl.hashCode();
    }
  }
  /* end debug code --- */


...and here is what is logged by that code (my comments on each preceding
record are marked with "==>"):

2007-06-08 10:14:25,223 INFO  plugin.PluginRepository - In thread
Thread[Thread-0,5,main] a static method of the class class
org.apache.nutch.plugin.PluginRepository@12621140 called
CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {} and
CACHE.hashCode() = 0 - got result = null

==> Getting null from CACHE.get here is right, because it was the first
time -- I would have expected a hashcode different from zero, but perhaps
the compiler performs lazy initialization of static members.

2007-06-08 10:14:30,100 INFO  plugin.PluginRepository - In thread
Thread[Thread-0,5,main] a static method of the class class
org.apache.nutch.plugin.PluginRepository@12621140 called
CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
CACHE.hashCode() = 11660238 - got result =
org.apache.nutch.plugin.PluginRepository@ff2413

==> OK so far...

2007-06-08 10:14:30,200 INFO  plugin.PluginRepository - In thread
Thread[Thread-0,5,main] a static method of the class class
org.apache.nutch.plugin.PluginRepository@12621140 called
CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
CACHE.hashCode() = 11660238 - got result =
org.apache.nutch.plugin.PluginRepository@ff2413

==> OK so far...

2007-06-08 10:14:31,121 INFO  plugin.PluginRepository - In thread
Thread[Thread-0,5,main] a static method of the class class
org.apache.nutch.plugin.PluginRepository@12621140 called
CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
CACHE.hashCode() = 11660238 - got result =
org.apache.nutch.plugin.PluginRepository@ff2413

==> OK so far...

2007-06-08 10:14:31,121 INFO  plugin.PluginRepository - In thread
Thread[Thread-0,5,main] a static method of the class class
org.apache.nutch.plugin.PluginRepository@12621140 called
CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
CACHE.hashCode() = 11660238 - got result =
org.apache.nutch.plugin.PluginRepository@ff2413

==> OK so far...

2007-06-08 10:14:33,084 INFO  plugin.PluginRepository - In thread
Thread[Thread-0,5,main] a static method of the class class
org.apache.nutch.plugin.PluginRepository@12621140 called
CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
CACHE.hashCode() = 11660238 - got result = null

==> AHA! Why do we get null from CACHE.get(conf) this time, if both the
content of CACHE and the value of conf are exactly the same as before?? And
note that the class PluginRepository has the same hashCode etc.

2007-06-08 10:14:33,384 INFO  plugin.PluginRepository - In thread
Thread[Thread-0,5,main] a static method of the class class
org.apache.nutch.plugin.PluginRepository@12621140 called
CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@acb158,
Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
CACHE.hashCode() = 25606144 - got result =
org.apache.nutch.plugin.PluginRepository@acb158

==> And here we can see that the hashcode of CACHE has already changed
(weird for an object allocated by the initialization of a static member of
a class that is loaded just once -- and it is loaded just once, because the
hashCode of the class PluginRepository is the same as before). Of course,
the hashcode of the PluginRepository _object_ has now changed, because the
cache miss has triggered a new call to its constructor, which in turn has
caused the instantiation of new copies of PluginManifestParser,
PluginDescriptor, PluginClassLoader and finally the plugin classes -- which
was what we didn't like in the first place. This pattern occurs a few times
(see the rest of the log below):

2007-06-08 10:14:33,384 INFO  plugin.PluginRepository - In thread
Thread[Thread-0,5,main] a static method of the class class
org.apache.nutch.plugin.PluginRepository@12621140 called
CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@acb158,
Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
CACHE.hashCode() = 25606144 - got result =
org.apache.nutch.plugin.PluginRepository@acb158

2007-06-08 10:14:33,455 INFO  plugin.PluginRepository - In thread
Thread[Thread-0,5,main] a static method of the class class
org.apache.nutch.plugin.PluginRepository@12621140 called
CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@acb158,
Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
CACHE.hashCode() = 25606144 - got result =
org.apache.nutch.plugin.PluginRepository@acb158

2007-06-08 10:14:33,455 INFO  plugin.PluginRepository - In thread
Thread[Thread-0,5,main] a static method of the class class
org.apache.nutch.plugin.PluginRepository@12621140 called
CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@acb158,
Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
CACHE.hashCode() = 25606144 - got result =
org.apache.nutch.plugin.PluginRepository@acb158

2007-06-08 10:14:33,765 INFO  plugin.PluginRepository - In thread
Thread[Thread-0,5,main] a static method of the class class
org.apache.nutch.plugin.PluginRepository@12621140 called
CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@acb158,
Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
CACHE.hashCode() = 25606144 - got result = null

2007-06-08 00:03:48,201 INFO  plugin.PluginRepository - In thread
Thread[Thread-0,5,main] called CACHE.get(Configuration: defaults:
hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_dvpvjj.xml ,
mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_dvpvjj.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@14520eb,
Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_dvpvjj.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@2acc65} and
CACHE.hashCode() = 37253945 - got result = null

==> The record above shows another occurrence of a null value returned, for
no apparent reason, by PluginRepository.get().

2007-06-08 10:14:33,985 INFO  plugin.PluginRepository - In thread
Thread[Thread-0,5,main] a static method of the class class
org.apache.nutch.plugin.PluginRepository@12621140 called
CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@acb158,
Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@1ce3fc5,
Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
CACHE.hashCode() = 40727333 - got result =
org.apache.nutch.plugin.PluginRepository@1ce3fc5

2007-06-08 10:14:33,985 INFO  plugin.PluginRepository - In thread
Thread[Thread-0,5,main] a static method of the class class
org.apache.nutch.plugin.PluginRepository@12621140 called
CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@acb158,
Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@1ce3fc5,
Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
CACHE.hashCode() = 40727333 - got result =
org.apache.nutch.plugin.PluginRepository@1ce3fc5

2007-06-08 10:14:34,055 INFO  plugin.PluginRepository - In thread
Thread[Thread-0,5,main] a static method of the class class
org.apache.nutch.plugin.PluginRepository@12621140 called
CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@acb158,
Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@1ce3fc5,
Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
CACHE.hashCode() = 40727333 - got result =
org.apache.nutch.plugin.PluginRepository@1ce3fc5

2007-06-08 10:14:34,055 INFO  plugin.PluginRepository - In thread
Thread[Thread-0,5,main] a static method of the class class
org.apache.nutch.plugin.PluginRepository@12621140 called
CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@acb158,
Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@1ce3fc5,
Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
CACHE.hashCode() = 40727333 - got result =
org.apache.nutch.plugin.PluginRepository@1ce3fc5

2007-06-08 10:14:37,881 INFO  plugin.PluginRepository - In thread
Thread[Thread-0,5,main] a static method of the class class
org.apache.nutch.plugin.PluginRepository@12621140 called
CACHE.get(Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal: hadoop-site.xml), where CACHE is {Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@acb158,
Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@1ce3fc5,
Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop-ADMIN/mapred/local/localRunner/job_23zna7.xml ,
mapred-default.xmlfinal:
hadoop-site.xml=org.apache.nutch.plugin.PluginRepository@ff2413} and
CACHE.hashCode() = 40727333 - got result = null

==> And the one above is the last occurrence.

Any idea?

Enzo



Re: Loading mechnism of plugin classes and singleton objects

Posted by Briggs <ac...@gmail.com>.
This is all I did (and from what I have read, double-checked locking
works correctly in JDK 5)

private static volatile IndexingFilters INSTANCE;

public static IndexingFilters getInstance(final Configuration configuration)
{
  if(INSTANCE == null) {
    synchronized(IndexingFilters.class) {
      if(INSTANCE == null) {
        INSTANCE = new IndexingFilters(configuration);
      }
    }
  }
  return INSTANCE;
}

So, I just updated all the code that calls "new IndexingFilters(..)" to call
IndexingFilters.getInstance(...). This works for me, but perhaps not for
everyone. I think that the filter interface should be refitted to allow the
configuration instance to be passed along to the filters too, or to provide
a way for the thread to obtain its current configuration, rather than
instantiating these things over and over again. If a filter is designed to
be thread-safe, there is no need for all this unnecessary object creation.
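A self-contained version of the double-checked locking pattern above, with a hypothetical Filters class standing in for IndexingFilters and a plain Object standing in for Configuration. The volatile field is what makes it safe under the Java 5 memory model. One consequence worth keeping in mind: the configuration argument only matters on the very first call, and every later caller gets that first instance, whatever it passes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class Filters {

    /** Counts how many times the (presumably expensive) setup actually runs. */
    public static final AtomicInteger CONSTRUCTED = new AtomicInteger();

    private static volatile Filters INSTANCE;

    private final Object configuration;

    private Filters(Object configuration) {
        this.configuration = configuration;
        CONSTRUCTED.incrementAndGet();
    }

    public static Filters getInstance(Object configuration) {
        if (INSTANCE == null) {                 // first check, without the lock
            synchronized (Filters.class) {
                if (INSTANCE == null) {         // second check, under the lock
                    INSTANCE = new Filters(configuration);
                }
            }
        }
        return INSTANCE;
    }

    public Object configuration() {
        return configuration;
    }

    public static void main(String[] args) throws InterruptedException {
        final Object conf = new Object();
        List<Thread> threads = new ArrayList<Thread>();
        for (int i = 0; i < 8; i++) {
            Thread t = new Thread(new Runnable() {
                public void run() { getInstance(conf); }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(CONSTRUCTED.get());                                 // 1
        System.out.println(getInstance(new Object()).configuration() == conf); // true
    }
}
```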



-- 
"Conscious decisions by conscious minds are what make reality real"

Re: Loading mechanism of plugin classes and singleton objects

Posted by Briggs <ac...@gmail.com>.
FYI, I ran into the same problem. I wanted my filters to be instantiated
only once, but not only do they get instantiated repeatedly, the
classloading is also flawed in that it keeps reloading the classes. If you
dump the stats from your app (use 'jmap -histo'), you can see all the
classes that have been loaded. If you have been running Nutch for a while,
you will notice classes being loaded thousands of times and never
unloaded. My quick fix was to edit all the main plugin points
(URLFilters.java, IndexingFilters.java, etc.) and make them all singletons.
I haven't had time to look into the classloading facility. There is a bit
of a bug in there (IMHO), but some people may not want singletons. Still,
there needs to be a way to instantiate a new plugin without instantiating
a new classloader every time a plugin is requested. These classloaders
never seem to get garbage collected.

Anyway, that's all I have to say at the moment.
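One way to get "instantiate once, then reuse" without hard-coding a singleton into every extension point is to memoize instances by key. The sketch below assumes a hypothetical `InstanceCache` class keyed by a plugin id string. It is not part of Nutch's plugin API.

A static field is shared only among code that sees the class through the same classloader. Such a cache therefore only helps if it lives in a class loaded by a common parent loader (for example, Nutch core rather than a plugin). It does not by itself stop new plugin classloaders from being created.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical helper, not Nutch API: one shared instance per id.
final class InstanceCache {
    private static final ConcurrentMap<String, Object> CACHE = new ConcurrentHashMap<>();

    private InstanceCache() {}

    // Returns the instance cached under id, creating it with factory on first use.
    // ConcurrentHashMap.computeIfAbsent applies the factory at most once per key,
    // even when several threads ask for the same id concurrently.
    // Requesting an id with a different type than it was created with throws
    // ClassCastException from type.cast.
    static <T> T get(String id, Class<T> type, Supplier<? extends T> factory) {
        Object value = CACHE.computeIfAbsent(id, k -> factory.get());
        return type.cast(value);
    }
}
```

A filter that is designed to be thread-safe can then be fetched with something like `InstanceCache.get("my-url-filter", MyFilter.class, MyFilter::new)` (names hypothetical) from every fetcher thread, and only the first call pays the construction cost.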



On 6/5/07, Doğacan Güney <do...@gmail.com> wrote:
>
> Hi,
>
> It seems that the plugin-loading code is somehow broken. There is some
> discussion going on about this on
> http://www.nabble.com/forum/ViewPost.jtp?post=10844164&framed=y .
>
> On 6/5/07, Enzo Michelangeli <en...@gmail.com> wrote:
> > [...]
>
> --
> Doğacan Güney
>



-- 
"Conscious decisions by conscious minds are what make reality real"

Re: Loading mechanism of plugin classes and singleton objects

Posted by Doğacan Güney <do...@gmail.com>.
Hi,

It seems that the plugin-loading code is somehow broken. There is some
discussion going on about this on
http://www.nabble.com/forum/ViewPost.jtp?post=10844164&framed=y .

On 6/5/07, Enzo Michelangeli <en...@gmail.com> wrote:
> I have a question about the loading mechanism of plugin classes. I'm working
> with a custom URLFilter, and I need a singleton object loaded and
> initialized by the first instance of the URLFilter, and shared by other
> instances (e.g., instantiated by other threads). I was assuming that the
> URLFilter class was being loaded only once even when the filter is used by
> multiple threads, so I tried to use a static member variable of my URLFilter
> class to hold a reference to the object to be shared: but it appears that
> the supposed singleton, actually, isn't, because the method responsible for
> its instantiation finds the static field initialized to null. So: are
> URLFilter classes loaded multiple times by their classloader in Nutch? The
> wiki page at
> http://wiki.apache.org/nutch/WhichTechnicalConceptsAreBehindTheNutchPluginSystem
> seems to suggest otherwise:
>
>     Until Nutch runtime, only one instance of such a plugin
>     class is alive in the Java virtual machine.
>
> (By the way, what does "Until Nutch runtime" mean here? Before Nutch
> runtime, no class whatsoever is supposed to be alive in the JVM, is it?)
>
> Enzo
>
>

-- 
Doğacan Güney