Posted to dev@nutch.apache.org by Enzo Michelangeli <en...@gmail.com> on 2007/06/09 09:53:36 UTC
Re: Loading mechanism of plugin classes and singleton objects

This is a self-followup from a thread in nutch-user - see:
http://www.nabble.com/Loading-mechnism-of-plugin-classes-and-singleton-objects-tf3869131.html#a11036730

----- Original Message ----- 
From: "Enzo Michelangeli" <en...@gmail.com>
To: <nu...@lucene.apache.org>
Sent: Saturday, June 09, 2007 10:43 AM
Subject: Re: Loading mechnism of plugin classes and singleton objects

[...]
>                                       In that spirit (think global, act
> local ;-) ), we could also subclass org.apache.hadoop.conf.Configuration
> only for use by methods of PluginRepository, and override its hashCode()
> method instead of touching the original class.

On second thought, I believe we should change the original: that's how the
hashCode() method of a class implementing the Map interface, like
org.apache.hadoop.conf.Configuration, should work in the first place, and,
if it doesn't, it should be fixed. The JavaDoc for the Map interface
http://java.sun.com/j2se/1.5.0/docs/api/java/util/Map.html#hashCode%28%29
says:

    int hashCode()
        Returns the hash code value for this map. The hash code of
        a map is defined to be the sum of the hashCodes of each
        entry in the map's entrySet view. [...]

So, I suppose that the following method should be added to the class
org.apache.hadoop.conf.Configuration, fixing what appears to be a bug of
non-compliance with the spec:

      public int hashCode() {
          int hc = 0;
          java.util.Iterator<Map.Entry<String,String>> it = iterator();
          while(it.hasNext()) {
              Map.Entry<String,String> e = it.next();
              hc += e.hashCode();
          }
          return hc;
      }
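
For anyone who wants to try the idea without the Hadoop sources, here is a
minimal self-contained sketch. ConfLike is a hypothetical stand-in of mine,
not the real Configuration class; the only thing it assumes is that the
class is Iterable over Map.Entry<String,String>, as the method above does.
It compares the summed hash code with what java.util.HashMap reports for
the same data.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical stand-in for Configuration (NOT the Hadoop class): just an
// Iterable over String key/value entries, for illustration only.
class ConfLike implements Iterable<Map.Entry<String, String>> {
    private final Map<String, String> props = new HashMap<String, String>();

    void set(String key, String value) {
        props.put(key, value);
    }

    public Iterator<Map.Entry<String, String>> iterator() {
        return props.entrySet().iterator();
    }

    // Sum of the entries' hash codes, as the Map.hashCode() JavaDoc defines.
    @Override
    public int hashCode() {
        int hc = 0;
        for (Map.Entry<String, String> e : this) {
            hc += e.hashCode();
        }
        return hc;
    }

    // Hypothetical helper: the value java.util.HashMap computes itself.
    int referenceHashCode() {
        return props.hashCode();
    }

    public static void main(String[] args) {
        ConfLike c = new ConfLike();
        c.set("key.a", "1");   // arbitrary sample properties
        c.set("key.b", "2");
        System.out.println(c.hashCode() == c.referenceHashCode());
    }
}
```

Note that a HashMap-based cache keyed on such objects would also need a
matching equals() before two distinct instances with the same contents
could hit the same entry; the sketch leaves that out.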

I don't have the source code of Hadoop installed on my machine so I can't
quickly test this code, but it should be OK as long as the object returned
by the Configuration.iterator().next() method can be trusted to follow the
specifications of the Map.Entry interface for its own hashCode(). To be
sure, one can always modify it, based on what
http://java.sun.com/j2se/1.5.0/docs/api/java/util/Map.Entry.html#hashCode%28%29
tells us about Map.Entry.hashCode(), into a more explicit version:

      public int hashCode() {
          int hc = 0;
          java.util.Iterator<Map.Entry<String,String>> it = iterator();
          while(it.hasNext()) {
              Map.Entry<String,String> e = it.next();
              hc += (e.getKey()==null   ? 0 : e.getKey().hashCode()) ^
                    (e.getValue()==null ? 0 : e.getValue().hashCode());
          }
          return hc;
      }
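
As a quick sanity check of the explicit formula, the sketch below compares
it with the hashCode() of entries taken from a plain java.util.HashMap,
including one with a null value. The class and helper names (EntryHashDemo,
explicitHash, entryOf) are made up for illustration, and it says nothing
about the entries that Configuration itself hands out.

```java
import java.util.HashMap;
import java.util.Map;

class EntryHashDemo {
    // The explicit formula from the Map.Entry.hashCode() JavaDoc.
    static int explicitHash(Map.Entry<String, String> e) {
        return (e.getKey() == null ? 0 : e.getKey().hashCode())
             ^ (e.getValue() == null ? 0 : e.getValue().hashCode());
    }

    // Hypothetical helper: obtain a real HashMap entry for one key/value pair.
    static Map.Entry<String, String> entryOf(String key, String value) {
        Map<String, String> m = new HashMap<String, String>();
        m.put(key, value);
        return m.entrySet().iterator().next();
    }

    public static void main(String[] args) {
        Map.Entry<String, String> plain = entryOf("key.a", "1");
        Map.Entry<String, String> nullValue = entryOf("key.b", null);
        // Both comparisons should print true if the entries follow the spec.
        System.out.println(explicitHash(plain) == plain.hashCode());
        System.out.println(explicitHash(nullValue) == nullValue.hashCode());
    }
}
```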

Another potential problem is the implementation of
org.apache.hadoop.conf.Configuration's iterator() method: it seems to return
the Iterator of a newly allocated HashMap object... Might that screw up the
hashCodes of the entries? Probably not, if we use the second of the two
implementations listed above which relies upon the hashCodes of the copied
strings, but better watch out.
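
To put the worry about the copy in concrete terms, here is a small sketch
with plain java.util classes only. CopyHashDemo, deepCopy and
sumOfEntryHashes are hypothetical names, and the copy is my guess at what
"a newly allocated HashMap" amounts to, not the actual Hadoop code. Since
String.hashCode() depends only on the characters, the sum over the copied
entries should match the sum over the originals.

```java
import java.util.HashMap;
import java.util.Map;

class CopyHashDemo {
    // Sum of key-hash XOR value-hash over all entries (null counts as 0).
    static int sumOfEntryHashes(Map<String, String> m) {
        int hc = 0;
        for (Map.Entry<String, String> e : m.entrySet()) {
            hc += (e.getKey() == null ? 0 : e.getKey().hashCode())
                ^ (e.getValue() == null ? 0 : e.getValue().hashCode());
        }
        return hc;
    }

    // Copy into a new HashMap, duplicating every String object as well.
    static Map<String, String> deepCopy(Map<String, String> src) {
        Map<String, String> copy = new HashMap<String, String>();
        for (Map.Entry<String, String> e : src.entrySet()) {
            copy.put(new String(e.getKey()), new String(e.getValue()));
        }
        return copy;
    }

    public static void main(String[] args) {
        Map<String, String> original = new HashMap<String, String>();
        original.put("key.a", "1");   // arbitrary sample properties
        original.put("key.b", "2");
        Map<String, String> copy = deepCopy(original);
        System.out.println(sumOfEntryHashes(original) == sumOfEntryHashes(copy));
    }
}
```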

Cheers --

Enzo