You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@netbeans.apache.org by Victor Williams Stafusa da Silva <vi...@gmail.com> on 2017/10/28 17:16:55 UTC

Do not use java.net.URL as keys in Maps

I was looking around an issue, when I landed in the class
PatchModuleFileManager (
https://github.com/apache/incubator-netbeans/blob/master/java.source.base/src/org/netbeans/modules/java/source/parsing/PatchModuleFileManager.java,
version of 20170921).

This class uses java.net.URLs as keys of a HashMap. This is a really bad
thing to do because URL has one of the worst implementations of
equals(Object) and hashCode() ever invented.

Let's see this snippet from the URL.hashCode()'s method javadoc (
https://docs.oracle.com/javase/9/docs/api/java/net/URL.html#hashCode--):

> The hash code is based upon all the URL components relevant for URL
comparison. As such, this operation is a blocking operation.

The equals(Object) method is also similarly bad:

> Two hosts are considered equivalent if both host names can be resolved
into the same IP addresses; else if either host name can't be resolved, the
host names must be equal without regard to case; or both host names equal
to null.

> Since hosts comparison requires name resolution, this operation is a
blocking operation.

> Note: The defined behavior for equals is known to be inconsistent with
virtual hosting in HTTP.
So, both equals(Object) and hashCode() method are blocking operations which
do DNS lookups dependent on the state of the internet. Also, it is at least
debatable the fact that hashCode() method is synchronized. At least, it
caches the hashCode() method results. I'm not sure, but I would not be
surprised if it is possible to construct two different instances of URLs
from the same String with different hashCodes()'s due to a network topology
change in the meantime.

This all means that the URL class, by not having a sane equals(Object) and
hashCode() implementation shouldn't be used as a key to a HashMap or
LinkedHashMap nor be used in a HashSet or LinkedHashSet.

In fact, I'm in favor of putting a very big and loud @Deprecated in
java.net.URL. It is an innocent-looking, but actually very dangerous,
performance bottleneck and bug factory. Although it is actually possible to
use it safely if you never call equals(Object) and hashCode(), this is
something extremely hard and error-prone to do in practice, since
equals(Object) and hashCode() usage is widespread as it should be. And
arguably, a class that features ill-conceived equals(Object) and hashCode()
methods are itself ill-conceived. However, that is something that should be
done with the JDK guys.

Now, back to the PatchModuleFileManager, it uses URL as a key to a HashMap.
That is nasty, so I propose that it should be replaced with URI or String.
The PatchModuleFileManager uses the FileObjects class in the same package,
which also uses the URL.equals(object) method. The codebase is huge, and
those are just the first two classes that I checked, so it is very likely
that the URL class is misused in a lot of other places in the code base,
deserving a major effort to check for that.

I'm not familiar with the internal details of netbeans nor is that anything
that I could quicky learn, so I'm not in position of creating a PR to fix
that, so I will just raise the alarm here.

Victor Williams Stafusa da Silva

Re: Do not use java.net.URL as keys in Maps

Posted by Eirik Bakke <eb...@ultorg.com>.
Just to add another example of a NetBeans bug related to equals/hashCode on URL (fixed last year):
https://netbeans.org/bugzilla/show_bug.cgi?id=267962
("30s delay upon first connecting to a localhost database (due to URLClassLoader trying to DNS-resolve nbinst URL in URL.hashCode)")

-- Eirik

On 10/29/17, 5:45 AM, "Lars Bruun-Hansen" <lb...@gmail.com>> wrote:

Thank you for this. Much appreciated.
I for one wasn't aware of this. I had to look into the JDK source because I
refused to believe you :-)
You are right.

I've done some basic grepping for

"Map[:space:]*<[:space:]*URL"

on the NetBeans source code and as I feared this pattern appears all over
the place.

The reason why it is not as bad as it seems is that the vast majority of
these are really jar URLs, and these URLs have their own URLStreamHandler
in the JDK which overrides the sameFile() method (used in equals()) and
hashCode() methods so that there's no longer any name lookup performed.
This is also true for the one you point to, namely PatchModuleFileManager.
Out of the box with JDK, the issue you point to affects "http(s)://",
"ftp://", "file://", "mailto://", but not "jar://".

On top of this - and perhaps more importantly - I can see the code base
makes heavy use of its own custom URLStreamHandlers and
URLStreamHandlerFactories so I would guess that this has been taken care
of. The developers have indeed been aware of this as far as I can tell. All
in all I think this has been taken into consideration.

But you are 100% right that most people would not expect url1.equals(url2)
to trigger a name service lookup (DNS query). Ouch!

I had to go over my own contribution, PR 161, but luckily it uses caching
by URI, not caching by URL. Phew.

Thanks.



On Sat, Oct 28, 2017 at 7:16 PM, Victor Williams Stafusa da Silva <
victorwssilva@gmail.com<ma...@gmail.com>> wrote:

I was looking around an issue, when I landed in the class
PatchModuleFileManager (
https://github.com/apache/incubator-netbeans/blob/
master/java.source.base/src/org/netbeans/modules/java/source/parsing/
PatchModuleFileManager.java,
version of 20170921).

This class uses java.net.URLs as keys of a HashMap. This is a really bad
thing to do because URL has one of the worst implementations of
equals(Object) and hashCode() ever invented.

Let's see this snippet from the URL.hashCode()'s method javadoc (
https://docs.oracle.com/javase/9/docs/api/java/net/URL.html#hashCode--):

> The hash code is based upon all the URL components relevant for URL
comparison. As such, this operation is a blocking operation.

The equals(Object) method is also similarly bad:

> Two hosts are considered equivalent if both host names can be resolved
into the same IP addresses; else if either host name can't be resolved, the
host names must be equal without regard to case; or both host names equal
to null.

> Since hosts comparison requires name resolution, this operation is a
blocking operation.

> Note: The defined behavior for equals is known to be inconsistent with
virtual hosting in HTTP.
So, both equals(Object) and hashCode() method are blocking operations which
do DNS lookups dependent on the state of the internet. Also, it is at least
debatable the fact that hashCode() method is synchronized. At least, it
caches the hashCode() method results. I'm not sure, but I would not be
surprised if it is possible to construct two different instances of URLs
from the same String with different hashCodes()'s due to a network topology
change in the meantime.

This all means that the URL class, by not having a sane equals(Object) and
hashCode() implementation shouldn't be used as a key to a HashMap or
LinkedHashMap nor be used in a HashSet or LinkedHashSet.

In fact, I'm in favor of putting a very big and loud @Deprecated in
java.net.URL. It is an innocent-looking, but actually very dangerous,
performance bottleneck and bug factory. Although it is actually possible to
use it safely if you never call equals(Object) and hashCode(), this is
something extremely hard and error-prone to do in practice, since
equals(Object) and hashCode() usage is widespread as it should be. And
arguably, a class that features ill-conceived equals(Object) and hashCode()
methods are itself ill-conceived. However, that is something that should be
done with the JDK guys.

Now, back to the PatchModuleFileManager, it uses URL as a key to a HashMap.
That is nasty, so I propose that it should be replaced with URI or String.
The PatchModuleFileManager uses the FileObjects class in the same package,
which also uses the URL.equals(object) method. The codebase is huge, and
those are just the first two classes that I checked, so it is very likely
that the URL class is misused in a lot of other places in the code base,
deserving a major effort to check for that.

I'm not familiar with the internal details of netbeans nor is that anything
that I could quicky learn, so I'm not in position of creating a PR to fix
that, so I will just raise the alarm here.

Victor Williams Stafusa da Silva



Re: Do not use java.net.URL as keys in Maps

Posted by Lars Bruun-Hansen <lb...@gmail.com>.
Thank you for this. Much appreciated.
I for one wasn't aware of this. I had to look into the JDK source because I
refused to believe you :-)
You are right.

I've done some basic grepping for

"Map[:space:]*<[:space:]*URL"

on the NetBeans source code and as I feared this pattern appears all over
the place.

The reason why it is not as bad as it seems is that the vast majority of
these are really jar URLs, and these URLs have their own URLStreamHandler
in the JDK which overrides the sameFile() method (used in equals()) and
hashCode() methods so that there's no longer any name lookup performed.
This is also true for the one you point to, namely PatchModuleFileManager.
Out of the box with JDK, the issue you point to affects "http(s)://",
"ftp://", "file://", "mailto://", but not "jar://".

On top of this - and perhaps more importantly - I can see the code base
makes heavy use of its own custom URLStreamHandlers and
URLStreamHandlerFactories so I would guess that this has been taken care
of. The developers have indeed been aware of this as far as I can tell. All
in all I think this has been taken into consideration.

But you are 100% right that most people would not expect url1.equals(url2)
to trigger a name service lookup (DNS query). Ouch!

I had to go over my own contribution, PR 161, but luckily it uses caching
by URI, not caching by URL. Phew.

Thanks.



On Sat, Oct 28, 2017 at 7:16 PM, Victor Williams Stafusa da Silva <
victorwssilva@gmail.com> wrote:

> I was looking around an issue, when I landed in the class
> PatchModuleFileManager (
> https://github.com/apache/incubator-netbeans/blob/
> master/java.source.base/src/org/netbeans/modules/java/source/parsing/
> PatchModuleFileManager.java,
> version of 20170921).
>
> This class uses java.net.URLs as keys of a HashMap. This is a really bad
> thing to do because URL has one of the worst implementations of
> equals(Object) and hashCode() ever invented.
>
> Let's see this snippet from the URL.hashCode()'s method javadoc (
> https://docs.oracle.com/javase/9/docs/api/java/net/URL.html#hashCode--):
>
> > The hash code is based upon all the URL components relevant for URL
> comparison. As such, this operation is a blocking operation.
>
> The equals(Object) method is also similarly bad:
>
> > Two hosts are considered equivalent if both host names can be resolved
> into the same IP addresses; else if either host name can't be resolved, the
> host names must be equal without regard to case; or both host names equal
> to null.
>
> > Since hosts comparison requires name resolution, this operation is a
> blocking operation.
>
> > Note: The defined behavior for equals is known to be inconsistent with
> virtual hosting in HTTP.
> So, both equals(Object) and hashCode() method are blocking operations which
> do DNS lookups dependent on the state of the internet. Also, it is at least
> debatable the fact that hashCode() method is synchronized. At least, it
> caches the hashCode() method results. I'm not sure, but I would not be
> surprised if it is possible to construct two different instances of URLs
> from the same String with different hashCodes()'s due to a network topology
> change in the meantime.
>
> This all means that the URL class, by not having a sane equals(Object) and
> hashCode() implementation shouldn't be used as a key to a HashMap or
> LinkedHashMap nor be used in a HashSet or LinkedHashSet.
>
> In fact, I'm in favor of putting a very big and loud @Deprecated in
> java.net.URL. It is an innocent-looking, but actually very dangerous,
> performance bottleneck and bug factory. Although it is actually possible to
> use it safely if you never call equals(Object) and hashCode(), this is
> something extremely hard and error-prone to do in practice, since
> equals(Object) and hashCode() usage is widespread as it should be. And
> arguably, a class that features ill-conceived equals(Object) and hashCode()
> methods are itself ill-conceived. However, that is something that should be
> done with the JDK guys.
>
> Now, back to the PatchModuleFileManager, it uses URL as a key to a HashMap.
> That is nasty, so I propose that it should be replaced with URI or String.
> The PatchModuleFileManager uses the FileObjects class in the same package,
> which also uses the URL.equals(object) method. The codebase is huge, and
> those are just the first two classes that I checked, so it is very likely
> that the URL class is misused in a lot of other places in the code base,
> deserving a major effort to check for that.
>
> I'm not familiar with the internal details of netbeans nor is that anything
> that I could quicky learn, so I'm not in position of creating a PR to fix
> that, so I will just raise the alarm here.
>
> Victor Williams Stafusa da Silva
>