Posted to dev@uima.apache.org by Marshall Schor <ms...@schor.com> on 2009/07/23 21:59:59 UTC

changing CharArrayString hashCode

While doing some generics work in uimaj-core, I came across the hashCode
impl in CharArrayString.  It has one possible problem: it uses Math.abs
in an attempt to return only non-negative ints.  Non-negative values are
required elsewhere, where the hash code is used to compute indexes as
hashCode % some-size.  In Java the % operator returns a negative result
when its left operand is negative, so it needs a non-negative input to
produce a valid index.
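
To make the % point concrete: Java's remainder takes the sign of its left operand, so a negative hash code turns into a negative, and therefore invalid, array index. A minimal sketch; the class and the bucketIndex helper are hypothetical illustrations, not part of uimaj-core:

```java
// Sketch: why a negative hash code breaks "hashCode % size" indexing in Java.
// ModDemo and bucketIndex are hypothetical, not uimaj-core code.
public class ModDemo {
    // Naive bucket index: only valid when hash is non-negative.
    static int bucketIndex(int hash, int size) {
        return hash % size;
    }

    public static void main(String[] args) {
        int size = 16;
        System.out.println(bucketIndex(37, size));   // 5
        System.out.println(bucketIndex(-37, size));  // -5, not a usable array index
    }
}
```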

Math.abs(Integer.MIN_VALUE) is defined to return Integer.MIN_VALUE
itself (surprisingly), because +2147483648 does not fit in an int.  I
think the hash code above could produce that value, but I haven't
verified this.  A slightly better way to compute the result might be:

... same body ...
  return hash >>> 1;  // ensure the hash code is non-negative, without
                      // using Math.abs, which fails for MIN_VALUE
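
A small self-contained check of the Math.abs pitfall and the proposed >>> 1 fix. It feeds Integer.MIN_VALUE in directly rather than relying on the elided CharArrayString hash body; the class and method names are illustrative assumptions, and the & 0x7fffffff mask is a commonly used alternative, not something proposed in this thread:

```java
// Sketch: Math.abs cannot make Integer.MIN_VALUE non-negative, while an
// unsigned shift or a sign-bit mask always yields a value >= 0.
// HashSignDemo and its methods are hypothetical names for illustration.
public class HashSignDemo {
    // Approach described in the thread: stays negative for MIN_VALUE.
    static int viaAbs(int hash) {
        return Math.abs(hash);
    }

    // Proposed fix: the logical right shift always fills the sign bit with 0.
    // It drops the lowest bit, so h and (h ^ 1) produce the same result.
    static int viaUnsignedShift(int hash) {
        return hash >>> 1;
    }

    // Common alternative: clear only the sign bit, keeping the low 31 bits.
    static int viaMask(int hash) {
        return hash & 0x7fffffff;
    }

    public static void main(String[] args) {
        int h = Integer.MIN_VALUE;
        System.out.println(viaAbs(h));            // -2147483648
        System.out.println(viaUnsignedShift(h));  // 1073741824
        System.out.println(viaMask(h));           // 0
    }
}
```

Both the shift and the mask return a non-negative value for every int input; the shift halves the range of the result, while the mask keeps the low 31 bits and instead maps h and h ^ Integer.MIN_VALUE to the same value.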

Would changing the hashcode definition break any current use?

-Marshall


Re: changing CharArrayString hashCode

Posted by Tommaso Teofili <to...@gmail.com>.
On this topic: to standardize the code, one alternative for overriding
hashCode, toString and equals would be to use the Apache Commons Lang
builders (HashCodeBuilder, EqualsBuilder, ToStringBuilder). The library
also comes with many string-handling utilities.
The obvious disadvantage is that it adds another dependency.
Regards,
Tommaso Teofili

2009/7/24 Thilo Goetz <tw...@gmx.de>

> This class isn't used any more and could be deleted (together with
> the classes that depend on it).  [...]
>
> --Thilo

Re: changing CharArrayString hashCode

Posted by Thilo Goetz <tw...@gmx.de>.
Marshall Schor wrote:
> [...]
> Would changing the hashcode definition break any current use?
> 
> -Marshall

This class isn't used any more and could be deleted (together with
the classes that depend on it).  One of the two classes that
depend on CharArrayString is TextTokenizer, something I wrote many
years ago and had completely forgotten.  It does similar things to
the whitespace tokenizer in the sandbox, but it's fully configurable
by the user.  All it would need is to be wrapped in an annotator.
We had that once, but I guess it got lost along the way somewhere.

The tokenizer does not depend on CharArrayString in any crucial way
and could be salvaged, if there was interest.  I don't see a point
in supporting two simple tokenizers like this, though.

--Thilo