You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@harmony.apache.org by Regis <xu...@gmail.com> on 2009/05/07 10:38:07 UTC
[classlib] add cache for file's canonical path
Hi,
Recently I'm trying to optimize implementation of java.File, and find
File.getCanonicalPath is very time cost and is heavily used by checking file
permission. But in the most of cases, the canonical path is never change, so I
think it's better to cache them to avoid calculate every time. And I found RI
also has cache: File.getCanonicalPath doesn't reflect change in file system
real-time.
I have created a JIRA HARMONY-6200 for this, and attach a simple cache
implementation for file's canonical path, someone interested in this may can
help to review it.
The cache has fixed size, when it's full oldest element would be remove, and
elements would be expired in 10 minutes, that mean if the element stay in cache
more than 10 minutes, it's not validate and will be removed.
It's a initial implementation, I believe there are still many places can be
improved, any comments and suggestions are welcome.
--
Best Regards,
Regis.
Re: [classlib] add cache for file's canonical path
Posted by Regis <xu...@gmail.com>.
Regis wrote:
> Tim Ellison wrote:
>> Sorry for the delay in responding ...
>>
>> On 06/Jul/2009 06:42, Regis wrote:
>>> MapMaker used a j.u.Timer, when put element to map, schedule a new
>>> j.u.TimerTask
>>> according to expiration time. For me, using timer can save the extra
>>> LinkedList, but a little complicated, since it need to start a timer
>>> thread and management of TimerTask is more complicated than LinkedList.
>>
>> I agree that using a j.u.Timer is too heavyweight for this case. Since
>> we don't care about being too exact on the expiry of items from the
>> cache we can simply charge a 'tax' on each look-up to see if an item is
>> ready to be removed.
>>
>> Anyway, it would be good to put this in and get the speed up.
>>
>> Regards,
>> Tim
>>
>>
>
> Agree.
>
> I'd like to commit the patch in JIRA if no one objects.
>
Committed at r794945.
--
Best Regards,
Regis.
Re: [classlib] add cache for file's canonical path
Posted by Regis <xu...@gmail.com>.
Tim Ellison wrote:
> Sorry for the delay in responding ...
>
> On 06/Jul/2009 06:42, Regis wrote:
>> MapMaker used a j.u.Timer, when put element to map, schedule a new
>> j.u.TimerTask
>> according to expiration time. For me, using timer can save the extra
>> LinkedList, but a little complicated, since it need to start a timer
>> thread and management of TimerTask is more complicated than LinkedList.
>
> I agree that using a j.u.Timer is too heavyweight for this case. Since
> we don't care about being too exact on the expiry of items from the
> cache we can simply charge a 'tax' on each look-up to see if an item is
> ready to be removed.
>
> Anyway, it would be good to put this in and get the speed up.
>
> Regards,
> Tim
>
>
Agree.
I'd like to commit the patch in JIRA if no one objects.
--
Best Regards,
Regis.
Re: [classlib] add cache for file's canonical path
Posted by Tim Ellison <t....@gmail.com>.
Sorry for the delay in responding ...
On 06/Jul/2009 06:42, Regis wrote:
> MapMaker used a j.u.Timer, when put element to map, schedule a new
> j.u.TimerTask
> according to expiration time. For me, using timer can save the extra
> LinkedList, but a little complicated, since it need to start a timer
> thread and management of TimerTask is more complicated than LinkedList.
I agree that using a j.u.Timer is too heavyweight for this case. Since
we don't care about being too exact on the expiry of items from the
cache we can simply charge a 'tax' on each look-up to see if an item is
ready to be removed.
Anyway, it would be good to put this in and get the speed up.
Regards,
Tim
Re: [classlib] add cache for file's canonical path
Posted by Regis <xu...@gmail.com>.
Regis wrote:
> Tim Ellison wrote:
>> Regis wrote:
>>> Tim Ellison wrote:
>>>> Regis wrote:
>>>>> Recently I'm trying to optimize implementation of java.File,
>>>> Cool.
>>>>
>>>>> and find File.getCanonicalPath is very time cost and is heavily used
>>>>> by checking file permission. But in the most of cases, the canonical
>>>>> path is never change, so I think it's better to cache them to avoid
>>>>> calculate every time. And I found RI also has cache:
>>>>> File.getCanonicalPath doesn't reflect change in file system
>>>>> real-time.
>>>> Hmm, sounds risky to me -- at least liable to create unpredictable
>>>> behavior.
>>> The file's canonical path changed rarely, on Linux, only symbol link in
>>> the path changed, can cause canonical path change; and on Windows, only
>>> when path contains short name, the canonical path can be changed.
>>> Maybe we can choose to not cache paths which are relatively "volatile",
>>> such as path contained short name on Windows?
>>>
>>>>> I have created a JIRA HARMONY-6200 for this, and attach a simple cache
>>>>> implementation for file's canonical path, someone interested in
>>>>> this may
>>>>> can help to review it.
>>>>>
>>>>> The cache has fixed size, when it's full oldest element would be
>>>>> remove,
>>>>> and elements would be expired in 10 minutes, that mean if the element
>>>>> stay in cache more than 10 minutes, it's not validate and will be
>>>>> removed.
>>>> Ten minutes seems like a very long time. I would have thought that
>>>> having a cache that expires in a relatively short time (since that file
>>>> path was last accessed) would be sufficient.
>>> So how about one minute?
>>> And I prefer to define expiring time as time since that file path added
>>> to the cache, because there may be some hot paths accessed frequently,
>>> if expiring time is count from last accessed, they may stay in cache too
>>> long and have no chance to update from file system.
>>>
>>>>> It's a initial implementation, I believe there are still many
>>>>> places can
>>>>> be improved, any comments and suggestions are welcome.
>>>> My initial thought is that it would be preferable to have the OS tell
>>>> you when a file path changes so you can invalidate the cache, but there
>>>> may not be enough info to be able to do that...
>>
>> Now that M10 is out the way, it may be time to revisit this idea.
>>
>> WDYT?
>>
>> Tim
>>
>
> Hi Tim,
>
> I see your comment on JIRA, it's a good suggestion, I'll take a look at
> MapMaker of Google-collections to find if there are anything we can
> learn from it.
>
> A file's canonical path is changed rarely, so in the almost time, it's
> safe to cache them. Now, the cache implementation used a map to cache
> path and a extra LinkList to keep all cached element in time order. I'm
> thinking are there any ways to do this in one data structure?
>
MapMaker used a j.u.Timer, when put element to map, schedule a new j.u.TimerTask
according to expiration time. For me, using timer can save the extra LinkedList,
but a little complicated, since it need to start a timer thread and management
of TimerTask is more complicated than LinkedList.
--
Best Regards,
Regis.
Re: [classlib] add cache for file's canonical path
Posted by Regis <xu...@gmail.com>.
Tim Ellison wrote:
> Regis wrote:
>> Tim Ellison wrote:
>>> Regis wrote:
>>>> Recently I'm trying to optimize implementation of java.File,
>>> Cool.
>>>
>>>> and find File.getCanonicalPath is very time cost and is heavily used
>>>> by checking file permission. But in the most of cases, the canonical
>>>> path is never change, so I think it's better to cache them to avoid
>>>> calculate every time. And I found RI also has cache:
>>>> File.getCanonicalPath doesn't reflect change in file system
>>>> real-time.
>>> Hmm, sounds risky to me -- at least liable to create unpredictable
>>> behavior.
>> The file's canonical path changed rarely, on Linux, only symbol link in
>> the path changed, can cause canonical path change; and on Windows, only
>> when path contains short name, the canonical path can be changed.
>> Maybe we can choose to not cache paths which are relatively "volatile",
>> such as path contained short name on Windows?
>>
>>>> I have created a JIRA HARMONY-6200 for this, and attach a simple cache
>>>> implementation for file's canonical path, someone interested in this may
>>>> can help to review it.
>>>>
>>>> The cache has fixed size, when it's full oldest element would be remove,
>>>> and elements would be expired in 10 minutes, that mean if the element
>>>> stay in cache more than 10 minutes, it's not validate and will be
>>>> removed.
>>> Ten minutes seems like a very long time. I would have thought that
>>> having a cache that expires in a relatively short time (since that file
>>> path was last accessed) would be sufficient.
>> So how about one minute?
>> And I prefer to define expiring time as time since that file path added
>> to the cache, because there may be some hot paths accessed frequently,
>> if expiring time is count from last accessed, they may stay in cache too
>> long and have no chance to update from file system.
>>
>>>> It's a initial implementation, I believe there are still many places can
>>>> be improved, any comments and suggestions are welcome.
>>> My initial thought is that it would be preferable to have the OS tell
>>> you when a file path changes so you can invalidate the cache, but there
>>> may not be enough info to be able to do that...
>
> Now that M10 is out the way, it may be time to revisit this idea.
>
> WDYT?
>
> Tim
>
Hi Tim,
I see your comment on JIRA, it's a good suggestion, I'll take a look at MapMaker
of Google-collections to find if there are anything we can learn from it.
A file's canonical path is changed rarely, so in the almost time, it's safe to
cache them. Now, the cache implementation used a map to cache path and a extra
LinkList to keep all cached element in time order. I'm thinking are there any
ways to do this in one data structure?
--
Best Regards,
Regis.
Re: [classlib] add cache for file's canonical path
Posted by Tim Ellison <t....@gmail.com>.
Regis wrote:
> Tim Ellison wrote:
>> Regis wrote:
>>> Recently I'm trying to optimize implementation of java.File,
>>
>> Cool.
>>
>>> and find File.getCanonicalPath is very time cost and is heavily used
>>> by checking file permission. But in the most of cases, the canonical
>>> path is never change, so I think it's better to cache them to avoid
>>> calculate every time. And I found RI also has cache:
>>> File.getCanonicalPath doesn't reflect change in file system
>>> real-time.
>>
>> Hmm, sounds risky to me -- at least liable to create unpredictable
>> behavior.
>
> The file's canonical path changed rarely, on Linux, only symbol link in
> the path changed, can cause canonical path change; and on Windows, only
> when path contains short name, the canonical path can be changed.
> Maybe we can choose to not cache paths which are relatively "volatile",
> such as path contained short name on Windows?
>
>>
>>> I have created a JIRA HARMONY-6200 for this, and attach a simple cache
>>> implementation for file's canonical path, someone interested in this may
>>> can help to review it.
>>>
>>> The cache has fixed size, when it's full oldest element would be remove,
>>> and elements would be expired in 10 minutes, that mean if the element
>>> stay in cache more than 10 minutes, it's not validate and will be
>>> removed.
>>
>> Ten minutes seems like a very long time. I would have thought that
>> having a cache that expires in a relatively short time (since that file
>> path was last accessed) would be sufficient.
>
> So how about one minute?
> And I prefer to define expiring time as time since that file path added
> to the cache, because there may be some hot paths accessed frequently,
> if expiring time is count from last accessed, they may stay in cache too
> long and have no chance to update from file system.
>
>>
>>> It's a initial implementation, I believe there are still many places can
>>> be improved, any comments and suggestions are welcome.
>>
>> My initial thought is that it would be preferable to have the OS tell
>> you when a file path changes so you can invalidate the cache, but there
>> may not be enough info to be able to do that...
Now that M10 is out the way, it may be time to revisit this idea.
WDYT?
Tim
Re: [classlib] add cache for file's canonical path
Posted by Regis <xu...@gmail.com>.
Tim Ellison wrote:
> Regis wrote:
>> Recently I'm trying to optimize implementation of java.File,
>
> Cool.
>
>> and find File.getCanonicalPath is very time cost and is heavily used
>> by checking file permission. But in the most of cases, the canonical
>> path is never change, so I think it's better to cache them to avoid
>> calculate every time. And I found RI also has cache:
>> File.getCanonicalPath doesn't reflect change in file system
>> real-time.
>
> Hmm, sounds risky to me -- at least liable to create unpredictable behavior.
The file's canonical path changed rarely, on Linux, only symbol link in the path
changed, can cause canonical path change; and on Windows, only when path
contains short name, the canonical path can be changed.
Maybe we can choose to not cache paths which are relatively "volatile", such as
path contained short name on Windows?
>
>> I have created a JIRA HARMONY-6200 for this, and attach a simple cache
>> implementation for file's canonical path, someone interested in this may
>> can help to review it.
>>
>> The cache has fixed size, when it's full oldest element would be remove,
>> and elements would be expired in 10 minutes, that mean if the element
>> stay in cache more than 10 minutes, it's not validate and will be removed.
>
> Ten minutes seems like a very long time. I would have thought that
> having a cache that expires in a relatively short time (since that file
> path was last accessed) would be sufficient.
So how about one minute?
And I prefer to define expiring time as time since that file path added to the
cache, because there may be some hot paths accessed frequently, if expiring time
is count from last accessed, they may stay in cache too long and have no chance
to update from file system.
>
>> It's a initial implementation, I believe there are still many places can
>> be improved, any comments and suggestions are welcome.
>
> My initial thought is that it would be preferable to have the OS tell
> you when a file path changes so you can invalidate the cache, but there
> may not be enough info to be able to do that...
>
> Regards,
> Tim
>
>
--
Best Regards,
Regis.
Re: [classlib] add cache for file's canonical path
Posted by Leo Li <li...@gmail.com>.
On Thu, Jul 2, 2009 at 4:53 PM, Yang Paulex <pa...@gmail.com> wrote:
> 2009/5/8 Tim Ellison <t....@gmail.com>
>
> >
> > [snip]
> >
>
>
> >
> > My initial thought is that it would be preferable to have the OS tell
> > you when a file path changes so you can invalidate the cache, but there
> > may not be enough info to be able to do that...
>
>
> It may be overkill, but Linux has a system call - inotify to send file
> change notification to registered directory.
>
I have a straightforward approach. Maybe just kidding, hehe.
If we can assume that most of file pathes put to check access has been
cannonicalized, we can first directly use them but the cannonical one as a
failover .
>
>
> Regards,
> Tim
>
>
--
Good luck!
Leo Li
Re: [classlib] add cache for file's canonical path
Posted by Yang Paulex <pa...@gmail.com>.
2009/5/8 Tim Ellison <t....@gmail.com>
>
> [snip]
>
>
> My initial thought is that it would be preferable to have the OS tell
> you when a file path changes so you can invalidate the cache, but there
> may not be enough info to be able to do that...
It may be overkill, but Linux has a system call - inotify to send file
change notification to registered directory.
>
>
> Regards,
> Tim
>
>
Re: [classlib] add cache for file's canonical path
Posted by Tim Ellison <t....@gmail.com>.
Regis wrote:
> Recently I'm trying to optimize implementation of java.File,
Cool.
> and find File.getCanonicalPath is very time cost and is heavily used
> by checking file permission. But in the most of cases, the canonical
> path is never change, so I think it's better to cache them to avoid
> calculate every time. And I found RI also has cache:
> File.getCanonicalPath doesn't reflect change in file system
> real-time.
Hmm, sounds risky to me -- at least liable to create unpredictable behavior.
> I have created a JIRA HARMONY-6200 for this, and attach a simple cache
> implementation for file's canonical path, someone interested in this may
> can help to review it.
>
> The cache has fixed size, when it's full oldest element would be remove,
> and elements would be expired in 10 minutes, that mean if the element
> stay in cache more than 10 minutes, it's not validate and will be removed.
Ten minutes seems like a very long time. I would have thought that
having a cache that expires in a relatively short time (since that file
path was last accessed) would be sufficient.
> It's a initial implementation, I believe there are still many places can
> be improved, any comments and suggestions are welcome.
My initial thought is that it would be preferable to have the OS tell
you when a file path changes so you can invalidate the cache, but there
may not be enough info to be able to do that...
Regards,
Tim