You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tomcat.apache.org by Konstantin Kolinko <kn...@gmail.com> on 2012/07/30 02:16:19 UTC

Re: [Proposal] Preparatory refactoring for Resource handling refactoring

2012/6/17 Mark Thomas <ma...@apache.org>:
> On 16/06/2012 19:18, Mark Thomas wrote:
>>
>>
>> Konstantin Kolinko <kn...@gmail.com> wrote:
>>
>>> 2012/6/16 Mark Thomas <ma...@apache.org>:
>>>>
>>>>> URLs are needed per Servlet API, so they cannot be removed.
>>>>> Does our "jndi" schema need DirContext as underlying
>>>>> implementation?
>>>>
>>>> Our jndi scheme was used to provide access to resources. I
>>>> believe
>>> all
>>>> of that will now go.
>>>>
>>>>> I noticed the following commit in archives:
>>>>> http://svn.apache.org/viewvc?view=revision&revision=1137646 so
>>>>> we have to deal with such schema combinations as "jar:jndi:".
>>>>
>>>> No we won't. We only hadf to deal with URLs like that because we
>>>> generated them.
>>>>
>>>
>>> How are you going to implement ServletContext.getResource(String):
>>> URL
>>>
>>> without a custom URL scheme  (be it named "jndi" or somehow else)?
>>>
>>> For file resources it might be possible to produce the "actual"
>>> URL pointing to a JAR entry or to a file (leaving aside the
>>> question of whether it exposes too much details),  but you cannot
>>> do so with directories,  as entries in a directory can be assembled
>>> from several sources.
>>
>> My intention was to use the URL for the actual resource. For
>> directories, I'll use the first matching dir I find although I need
>> to re-read the spec and Javadoc to make sure there aren't any nasty
>> surprises waiting to trip me up.
>
> Having re-read the specification and Javadoc, I don't see anything of
> concern. Additional pairs of eyes wouldn't hurt though.
>
> How to handle getResource() for a directory that exists in one or more
> overlays and/or the main WAR is an interesting question. I'll be sure to
> raise it within the Servlet EG when we get back to that question.
>

If we remove JNDI stuff from resource handling,  one of "challenges"
might be to re-implement DefaultServlet using only Servlet API
methods. Well, if the former is not possible, it might use the new
resources API (that you are going to implement instead of jndi one)
and be thus still tied with Tomcat internals...

If one reimplements DefaultServlet, one of the tasks would be to
generate directory listings. Those include file size and file
timestamp. One needs to obtain URL of a resource, open its
URLConnection and ask those attributes.


One good thing with "jndi:" URLs returned via Servlet API is that they
are backed by an instance of ProxyDirContext class and it has a cache
(*).  If we change implementation and return "true" URLs, they will
bypass the cache.  I suspect that using a "jar:" URL directly (in case
of an unpacked WAR file) may have poor performance.

Other good thing is that you can create relative URLs as "new URL(Url,
String)", which inherits URLStreamHandler instance from the original
URL, and thus inherits access to ProxyDirContext instance.  If it is a
"jndi" URL it will have access to the full resources hierarchy of the
web application.  If it is a "true" URL, it will be limited to its
origin file system.

The above two are the reasons why I think that in Tomcat 8 we cannot
return "true" URLs from ServletContext.getResource(String) method and
must still support the "jndi:" or some other proprietary URL schema.


(*) for reference: TTL of entries in ProxyDirContext#cache is by
default 5 seconds (5000). If the time has elapsed, the resource is
revalidated by comparing its timestamp with original one  The TTL is
configured via ProxyDirContext constructor <- BaseDirContext#cacheTTL
<- StandardContext#cacheTTL and thus is configurable on a <Context>.
http://tomcat.apache.org/tomcat-7.0-doc/config/context.html

Best regards,
Konstantin Kolinko

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: [Proposal] Preparatory refactoring for Resource handling refactoring

Posted by Mark Thomas <ma...@apache.org>.
On 17/08/2012 16:50, Konstantin Kolinko wrote:
> (regarding DefaultServlet)
>>> If it were using the servlet API only, the code were more reusable,
>>> both inside and outside of Tomcat. There must be specific issue why
>>> Servlet API is not used there. Is the API lacking?
>>
>> In lot of ways. For example, where is the API to create or delete
>> resources? That is deliberately left as a container implementation detail.
>>
> 
> OK.  One way is to use getRealPath() + manipulate files on the file
> system directly. I agree that using the API (be it DirContext or
> something else) is a bit more nice.

getRealPath() won't work for read/write storage that isn't backed by a
file system (e.g. database backed)

It also gets rather expensive when you have to deal with static
resources from multiple sources. You end up needing a several calls to
work out a) if the resource exists, b) if it can be put / deleted
depending on whether the resource is from a resource JAR, collides with
one from a resource JAR etc. Not impossible, but you need quite a few
calls to get it right. This gets more complicated when you add overlays
to the mix.

>> The jndi layer has caused performance problems as well. The special
>> handling for JARs is a direct result of that. The new approach will
>> simplify a lot of that and hopefully improve performance as well. Like
>> we have with the current approach, if there are specific areas causing
>> problems, we can take a look. I hope and expect that far fewer of these
>> 'tweaks' would be required with the new implementation.
>>
>>>>> One good thing with "jndi:" URLs returned via Servlet API is that they
>>>>> are backed by an instance of ProxyDirContext class and it has a cache
>>>>> (*).  If we change implementation and return "true" URLs, they will
>>>>> bypass the cache.  I suspect that using a "jar:" URL directly (in case
>>>>> of an unpacked WAR file) may have poor performance.
>>>>
>>>> The new Resources implementation will include caching where necessary.
>>>> At the moment there is none. I intend to add it as required. I agree
>>>> JARs/WARs are likely to need it to have performance that is comparable
>>>> with expanded WARs.
>>>>
> 
> Regarding this caching:
> 1. Currently calling getInputStream() on a jndi URL may return
> resource directly from memory (cached by ProxyDirContext).
> 
> If a file URL is used that would involve direct reading from hard
> drive.  If a jar URL is used - I wonder whether opening of the JAR
> file by the system had to be performed regardless of independent
> access to the same file.
> 
> One may argue though that there is
> ServletContext.getResourceAsStream() method available.
> 
> 2. If I query some attributes of the resource via
> getResource().openConnection().getHeaderField(),  for jndi URLs they
> are returned from a cache. I wonder what performance will be there
> with file and jar URLs.

I remain to be convinced how much caching is required. My preference is
to add a very simple (possibly even NO-OP) caching layer that we can
extend as necessary to address actual performance issues rather than
guessing where they might occur.

>> I can put a patch of the changes so far on people.a.o if you are
>> interested (note: it won't compile yet).
>>
> 
> I would like to look and review.  I think that at the current point
> this refactoring should not be done directly on trunk without some
> review and consensus first.

I've almost got something working. I'll see if I can finish it off
first. Either way I'll aim to post something this coming weekend.

> Maybe create a branch for this work (create a "tomcat/branches/"
> directory, or use existing "tomcat/sandbox")?

I'm happy with that if required. I suggest we hold of on the how until
folks have had a chance to look at the patch first.

Mark


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: [Proposal] Preparatory refactoring for Resource handling refactoring

Posted by Konstantin Kolinko <kn...@gmail.com>.
(regarding DefaultServlet)
>> If it were using the servlet API only, the code were more reusable,
>> both inside and outside of Tomcat. There must be specific issue why
>> Servlet API is not used there. Is the API lacking?
>
> In lot of ways. For example, where is the API to create or delete
> resources? That is deliberately left as a container implementation detail.
>

OK.  One way is to use getRealPath() + manipulate files on the file
system directly. I agree that using the API (be it DirContext or
something else) is a bit more nice.

>
> The jndi layer has caused performance problems as well. The special
> handling for JARs is a direct result of that. The new approach will
> simplify a lot of that and hopefully improve performance as well. Like
> we have with the current approach, if there are specific areas causing
> problems, we can take a look. I hope and expect that far fewer of these
> 'tweaks' would be required with the new implementation.
>
>>>> One good thing with "jndi:" URLs returned via Servlet API is that they
>>>> are backed by an instance of ProxyDirContext class and it has a cache
>>>> (*).  If we change implementation and return "true" URLs, they will
>>>> bypass the cache.  I suspect that using a "jar:" URL directly (in case
>>>> of an unpacked WAR file) may have poor performance.
>>>
>>> The new Resources implementation will include caching where necessary.
>>> At the moment there is none. I intend to add it as required. I agree
>>> JARs/WARs are likely to need it to have performance that is comparable
>>> with expanded WARs.
>>>

Regarding this caching:
1. Currently calling getInputStream() on a jndi URL may return
resource directly from memory (cached by ProxyDirContext).

If a file URL is used that would involve direct reading from hard
drive.  If a jar URL is used - I wonder whether opening of the JAR
file by the system had to be performed regardless of independent
access to the same file.

One may argue though that there is
ServletContext.getResourceAsStream() method available.

2. If I query some attributes of the resource via
getResource().openConnection().getHeaderField(),  for jndi URLs they
are returned from a cache. I wonder what performance will be there
with file and jar URLs.

>
> I can put a patch of the changes so far on people.a.o if you are
> interested (note: it won't compile yet).
>

I would like to look and review.  I think that at the current point
this refactoring should not be done directly on trunk without some
review and consensus first.

Maybe create a branch for this work (create a "tomcat/branches/"
directory, or use existing "tomcat/sandbox")?


Best regards,
Konstantin Kolinko

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: [Proposal] Preparatory refactoring for Resource handling refactoring

Posted by Mark Thomas <ma...@apache.org>.
On 30/07/2012 15:48, Konstantin Kolinko wrote:
> 2012/7/30 Mark Thomas <ma...@apache.org>:
>> On 30/07/2012 01:16, Konstantin Kolinko wrote:
>>> If we remove JNDI stuff from resource handling,  one of "challenges"
>>> might be to re-implement DefaultServlet using only Servlet API
>>> methods. Well, if the former is not possible, it might use the new
>>> resources API (that you are going to implement instead of jndi one)
>>> and be thus still tied with Tomcat internals...
>>
>> This is a non-issue in my view. The current DefaultServlet uses Tomcat
>> internals extensively. The new implementation will to.
>>
> 
> I am just saying that it is feels unfair.

It is no better or worse than the current approach in that respect.

> If it were using the servlet API only, the code were more reusable,
> both inside and outside of Tomcat. There must be specific issue why
> Servlet API is not used there. Is the API lacking?

In lot of ways. For example, where is the API to create or delete
resources? That is deliberately left as a container implementation detail.

> Well, bypassing
> the API we can get hands on simpler objects and waste less time
> processing them.

There is some of that, but a lot of it is functionality that we have to
implement.

> If it is not a replacement for the current implementation, it might be
> a new sample servlet for the examples webapp.
> 
>>> If one reimplements DefaultServlet, one of the tasks would be to
>>> generate directory listings. Those include file size and file
>>> timestamp. One needs to obtain URL of a resource, open its
>>> URLConnection and ask those attributes.
>>
>> Only if doing so entirely via the Servlet API which we don't need to do.
>>
> 
> Others will use Servlet API, and dropping performance is a bad option.

The jndi layer has caused performance problems as well. The special
handling for JARs is a direct result of that. The new approach will
simplify a lot of that and hopefully improve performance as well. Like
we have with the current approach, if there are specific areas causing
problems, we can take a look. I hope and expect that far fewer of these
'tweaks' would be required with the new implementation.

>>> One good thing with "jndi:" URLs returned via Servlet API is that they
>>> are backed by an instance of ProxyDirContext class and it has a cache
>>> (*).  If we change implementation and return "true" URLs, they will
>>> bypass the cache.  I suspect that using a "jar:" URL directly (in case
>>> of an unpacked WAR file) may have poor performance.
>>
>> The new Resources implementation will include caching where necessary.
>> At the moment there is none. I intend to add it as required. I agree
>> JARs/WARs are likely to need it to have performance that is comparable
>> with expanded WARs.
>>
>>> Other good thing is that you can create relative URLs as "new URL(Url,
>>> String)", which inherits URLStreamHandler instance from the original
>>> URL, and thus inherits access to ProxyDirContext instance.  If it is a
>>> "jndi" URL it will have access to the full resources hierarchy of the
>>> web application.  If it is a "true" URL, it will be limited to its
>>> origin file system.
>>
>> That is true, but why is that necessary? Is it a specification
>> requirement? I'm not aware of it. The canonical identifier is the path
>> of the resource within the app, not the URL returned from getResource()
>>
> 
> It is an existing feature. Removing a feature is bad and needs a good reason.

The good reason is cleaning up the code, making the 'overlay' feature in
Servlet 3.1 possible. You only have to look at the catalogue of problems
the VirtualClassLoader and VirtualDirContext have introduced to imagine
how difficult it would be to implement 'overlays' with the current code
base. The refactoring makes things a lot, lot cleaner.

> BTW, it you wanna take a look at a use case of it, I noticed one place
> yesterday.

That code is a direct result of a jndi URL being returned. No jndi URLs,
no need for that code.

The ContextConfig is one of the few parts of the code base I still need
to refactor. The other part that is left is the Mapper. The rest is
complete but untested. I'd like to have something that at least passes
the TCK before committing it.

I can put a patch of the changes so far on people.a.o if you are
interested (note: it won't compile yet).

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: [Proposal] Preparatory refactoring for Resource handling refactoring

Posted by Konstantin Kolinko <kn...@gmail.com>.
2012/7/30 Mark Thomas <ma...@apache.org>:
> On 30/07/2012 01:16, Konstantin Kolinko wrote:
>> If we remove JNDI stuff from resource handling,  one of "challenges"
>> might be to re-implement DefaultServlet using only Servlet API
>> methods. Well, if the former is not possible, it might use the new
>> resources API (that you are going to implement instead of jndi one)
>> and be thus still tied with Tomcat internals...
>
> This is a non-issue in my view. The current DefaultServlet uses Tomcat
> internals extensively. The new implementation will to.
>

I am just saying that it is feels unfair.

If it were using the servlet API only, the code were more reusable,
both inside and outside of Tomcat. There must be specific issue why
Servlet API is not used there. Is the API lacking?  Well, bypassing
the API we can get hands on simpler objects and waste less time
processing them.

If it is not a replacement for the current implementation, it might be
a new sample servlet for the examples webapp.

>> If one reimplements DefaultServlet, one of the tasks would be to
>> generate directory listings. Those include file size and file
>> timestamp. One needs to obtain URL of a resource, open its
>> URLConnection and ask those attributes.
>
> Only if doing so entirely via the Servlet API which we don't need to do.
>

Others will use Servlet API, and dropping performance is a bad option.


>> One good thing with "jndi:" URLs returned via Servlet API is that they
>> are backed by an instance of ProxyDirContext class and it has a cache
>> (*).  If we change implementation and return "true" URLs, they will
>> bypass the cache.  I suspect that using a "jar:" URL directly (in case
>> of an unpacked WAR file) may have poor performance.
>
> The new Resources implementation will include caching where necessary.
> At the moment there is none. I intend to add it as required. I agree
> JARs/WARs are likely to need it to have performance that is comparable
> with expanded WARs.
>
>> Other good thing is that you can create relative URLs as "new URL(Url,
>> String)", which inherits URLStreamHandler instance from the original
>> URL, and thus inherits access to ProxyDirContext instance.  If it is a
>> "jndi" URL it will have access to the full resources hierarchy of the
>> web application.  If it is a "true" URL, it will be limited to its
>> origin file system.
>
> That is true, but why is that necessary? Is it a specification
> requirement? I'm not aware of it. The canonical identifier is the path
> of the resource within the app, not the URL returned from getResource()
>

It is an existing feature. Removing a feature is bad and needs a good reason.

BTW, it you wanna take a look at a use case of it, I noticed one place
yesterday.

In ContextConfig.processAnnotationsJndi(...) there is code
[[[
                Enumeration<String> dirs = dcUrlConn.list();
                while (dirs.hasMoreElements()) {
                    String dir = dirs.nextElement();
                    URL dirUrl = new URL(url.toString() + '/' + dir);
                    processAnnotationsJndi(dirUrl, fragment, handlesTypesOnly);
                }
]]]

It creates a relative URL there.

If it were using the constructor for relative URLs,  new URL(url,
dir), it would have better performance, because URLStreamHandler were
inherited from the old URL.  A question that needs testing before
making a change there, though, is whether the old URL ends with "/"
(and thus is a directory). The code behaves as if it did not have the
trailing "/".


>> The above two are the reasons why I think that in Tomcat 8 we cannot
>> return "true" URLs from ServletContext.getResource(String) method and
>> must still support the "jndi:" or some other proprietary URL schema.
>
> I agree that the second issue would require a custom URL scheme if it
> were a requirement but I am not aware of anything that states that it is.
>

Best regards,
Konstantin Kolinko

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: [Proposal] Preparatory refactoring for Resource handling refactoring

Posted by Mark Thomas <ma...@apache.org>.
On 30/07/2012 01:16, Konstantin Kolinko wrote:
> If we remove JNDI stuff from resource handling,  one of "challenges"
> might be to re-implement DefaultServlet using only Servlet API
> methods. Well, if the former is not possible, it might use the new
> resources API (that you are going to implement instead of jndi one)
> and be thus still tied with Tomcat internals...

This is a non-issue in my view. The current DefaultServlet uses Tomcat
internals extensively. The new implementation will to.

> If one reimplements DefaultServlet, one of the tasks would be to
> generate directory listings. Those include file size and file
> timestamp. One needs to obtain URL of a resource, open its
> URLConnection and ask those attributes.

Only if doing so entirely via the Servlet API which we don't need to do.

> One good thing with "jndi:" URLs returned via Servlet API is that they
> are backed by an instance of ProxyDirContext class and it has a cache
> (*).  If we change implementation and return "true" URLs, they will
> bypass the cache.  I suspect that using a "jar:" URL directly (in case
> of an unpacked WAR file) may have poor performance.

The new Resources implementation will include caching where necessary.
At the moment there is none. I intend to add it as required. I agree
JARs/WARs are likely to need it to have performance that is comparable
with expanded WARs.

> Other good thing is that you can create relative URLs as "new URL(Url,
> String)", which inherits URLStreamHandler instance from the original
> URL, and thus inherits access to ProxyDirContext instance.  If it is a
> "jndi" URL it will have access to the full resources hierarchy of the
> web application.  If it is a "true" URL, it will be limited to its
> origin file system.

That is true, but why is that necessary? Is it a specification
requirement? I'm not aware of it. The canonical identifier is the path
of the resource within the app, not the URL returned from getResource()

> The above two are the reasons why I think that in Tomcat 8 we cannot
> return "true" URLs from ServletContext.getResource(String) method and
> must still support the "jndi:" or some other proprietary URL schema.

I agree that the second issue would require a custom URL scheme if it
were a requirement but I am not aware of anything that states that it is.

Mark


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org