You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@harmony.apache.org by Regis <xu...@gmail.com> on 2009/05/07 10:38:07 UTC

[classlib] add cache for file's canonical path

Hi,

Recently I'm trying to optimize implementation of java.File, and find 
File.getCanonicalPath is very time cost and is heavily used by checking file 
permission. But in the most of cases, the canonical path is never change, so I 
think it's better to cache them to avoid calculate every time. And I found RI 
also has cache: File.getCanonicalPath doesn't reflect change in file system 
real-time.

I have created a JIRA HARMONY-6200 for this, and attach a simple cache 
implementation for file's canonical path, someone interested in this may can 
help to review it.

The cache has fixed size, when it's full oldest element would be remove, and 
elements would be expired in 10 minutes, that mean if the element stay in cache 
more than 10 minutes, it's not validate and will be removed.

It's a initial implementation, I believe there are still many places can be 
improved, any comments and suggestions are welcome.

-- 
Best Regards,
Regis.

Re: [classlib] add cache for file's canonical path

Posted by Regis <xu...@gmail.com>.
Regis wrote:
> Tim Ellison wrote:
>> Sorry for the delay in responding ...
>>
>> On 06/Jul/2009 06:42, Regis wrote:
>>> MapMaker used a j.u.Timer, when put element to map, schedule a new
>>> j.u.TimerTask
>>> according to expiration time. For me, using timer can save the extra
>>> LinkedList, but a little complicated, since it need to start a timer
>>> thread and management of TimerTask is more complicated than LinkedList.
>>
>> I agree that using a j.u.Timer is too heavyweight for this case.  Since
>> we don't care about being too exact on the expiry of items from the
>> cache we can simply charge a 'tax' on each look-up to see if an item is
>> ready to be removed.
>>
>> Anyway, it would be good to put this in and get the speed up.
>>
>> Regards,
>> Tim
>>
>>
> 
> Agree.
> 
> I'd like to commit the patch in JIRA if no one objects.
> 

Committed at r794945.

-- 
Best Regards,
Regis.

Re: [classlib] add cache for file's canonical path

Posted by Regis <xu...@gmail.com>.
Tim Ellison wrote:
> Sorry for the delay in responding ...
> 
> On 06/Jul/2009 06:42, Regis wrote:
>> MapMaker used a j.u.Timer, when put element to map, schedule a new
>> j.u.TimerTask
>> according to expiration time. For me, using timer can save the extra
>> LinkedList, but a little complicated, since it need to start a timer
>> thread and management of TimerTask is more complicated than LinkedList.
> 
> I agree that using a j.u.Timer is too heavyweight for this case.  Since
> we don't care about being too exact on the expiry of items from the
> cache we can simply charge a 'tax' on each look-up to see if an item is
> ready to be removed.
> 
> Anyway, it would be good to put this in and get the speed up.
> 
> Regards,
> Tim
> 
> 

Agree.

I'd like to commit the patch in JIRA if no one objects.

-- 
Best Regards,
Regis.

Re: [classlib] add cache for file's canonical path

Posted by Tim Ellison <t....@gmail.com>.
Sorry for the delay in responding ...

On 06/Jul/2009 06:42, Regis wrote:
> MapMaker used a j.u.Timer, when put element to map, schedule a new
> j.u.TimerTask
> according to expiration time. For me, using timer can save the extra
> LinkedList, but a little complicated, since it need to start a timer
> thread and management of TimerTask is more complicated than LinkedList.

I agree that using a j.u.Timer is too heavyweight for this case.  Since
we don't care about being too exact on the expiry of items from the
cache we can simply charge a 'tax' on each look-up to see if an item is
ready to be removed.

Anyway, it would be good to put this in and get the speed up.

Regards,
Tim


Re: [classlib] add cache for file's canonical path

Posted by Regis <xu...@gmail.com>.
Regis wrote:
> Tim Ellison wrote:
>> Regis wrote:
>>> Tim Ellison wrote:
>>>> Regis wrote:
>>>>> Recently I'm trying to optimize implementation of java.File,
>>>> Cool.
>>>>
>>>>> and find File.getCanonicalPath is very time cost and is heavily used
>>>>> by checking file permission. But in the most of cases, the canonical
>>>>> path is never change, so I think it's better to cache them to avoid
>>>>> calculate every time. And I found RI also has cache:
>>>>> File.getCanonicalPath doesn't reflect change in file system
>>>>> real-time.
>>>> Hmm, sounds risky to me -- at least liable to create unpredictable
>>>> behavior.
>>> The file's canonical path changed rarely, on Linux, only symbol link in
>>> the path changed, can cause canonical path change; and on Windows, only
>>> when path contains short name, the canonical path can be changed.
>>> Maybe we can choose to not cache paths which are relatively "volatile",
>>> such as path contained short name on Windows?
>>>
>>>>> I have created a JIRA HARMONY-6200 for this, and attach a simple cache
>>>>> implementation for file's canonical path, someone interested in 
>>>>> this may
>>>>> can help to review it.
>>>>>
>>>>> The cache has fixed size, when it's full oldest element would be 
>>>>> remove,
>>>>> and elements would be expired in 10 minutes, that mean if the element
>>>>> stay in cache more than 10 minutes, it's not validate and will be
>>>>> removed.
>>>> Ten minutes seems like a very long time.  I would have thought that
>>>> having a cache that expires in a relatively short time (since that file
>>>> path was last accessed) would be sufficient.
>>> So how about one minute?
>>> And I prefer to define expiring time as time since that file path added
>>> to the cache, because there may be some hot paths accessed frequently,
>>> if expiring time is count from last accessed, they may stay in cache too
>>> long and have no chance to update from file system.
>>>
>>>>> It's a initial implementation, I believe there are still many 
>>>>> places can
>>>>> be improved, any comments and suggestions are welcome.
>>>> My initial thought is that it would be preferable to have the OS tell
>>>> you when a file path changes so you can invalidate the cache, but there
>>>> may not be enough info to be able to do that...
>>
>> Now that M10 is out the way, it may be time to revisit this idea.
>>
>> WDYT?
>>
>> Tim
>>
> 
> Hi Tim,
> 
> I see your comment on JIRA, it's a good suggestion, I'll take a look at 
> MapMaker  of Google-collections to find if there are anything we can 
> learn from it.
> 
> A file's canonical path is changed rarely, so in the almost time, it's 
> safe to cache them. Now, the cache implementation used a map to cache 
> path and a extra LinkList to keep all cached element in time order. I'm 
> thinking are there any ways to do this in one data structure?
> 

MapMaker used a j.u.Timer, when put element to map, schedule a new j.u.TimerTask
according to expiration time. For me, using timer can save the extra LinkedList, 
but a little complicated, since it need to start a timer thread and management 
of TimerTask is more complicated than LinkedList.

-- 
Best Regards,
Regis.

Re: [classlib] add cache for file's canonical path

Posted by Regis <xu...@gmail.com>.
Tim Ellison wrote:
> Regis wrote:
>> Tim Ellison wrote:
>>> Regis wrote:
>>>> Recently I'm trying to optimize implementation of java.File,
>>> Cool.
>>>
>>>> and find File.getCanonicalPath is very time cost and is heavily used
>>>> by checking file permission. But in the most of cases, the canonical
>>>> path is never change, so I think it's better to cache them to avoid
>>>> calculate every time. And I found RI also has cache:
>>>> File.getCanonicalPath doesn't reflect change in file system
>>>> real-time.
>>> Hmm, sounds risky to me -- at least liable to create unpredictable
>>> behavior.
>> The file's canonical path changed rarely, on Linux, only symbol link in
>> the path changed, can cause canonical path change; and on Windows, only
>> when path contains short name, the canonical path can be changed.
>> Maybe we can choose to not cache paths which are relatively "volatile",
>> such as path contained short name on Windows?
>>
>>>> I have created a JIRA HARMONY-6200 for this, and attach a simple cache
>>>> implementation for file's canonical path, someone interested in this may
>>>> can help to review it.
>>>>
>>>> The cache has fixed size, when it's full oldest element would be remove,
>>>> and elements would be expired in 10 minutes, that mean if the element
>>>> stay in cache more than 10 minutes, it's not validate and will be
>>>> removed.
>>> Ten minutes seems like a very long time.  I would have thought that
>>> having a cache that expires in a relatively short time (since that file
>>> path was last accessed) would be sufficient.
>> So how about one minute?
>> And I prefer to define expiring time as time since that file path added
>> to the cache, because there may be some hot paths accessed frequently,
>> if expiring time is count from last accessed, they may stay in cache too
>> long and have no chance to update from file system.
>>
>>>> It's a initial implementation, I believe there are still many places can
>>>> be improved, any comments and suggestions are welcome.
>>> My initial thought is that it would be preferable to have the OS tell
>>> you when a file path changes so you can invalidate the cache, but there
>>> may not be enough info to be able to do that...
> 
> Now that M10 is out the way, it may be time to revisit this idea.
> 
> WDYT?
> 
> Tim
> 

Hi Tim,

I see your comment on JIRA, it's a good suggestion, I'll take a look at MapMaker 
  of Google-collections to find if there are anything we can learn from it.

A file's canonical path is changed rarely, so in the almost time, it's safe to 
cache them. Now, the cache implementation used a map to cache path and a extra 
LinkList to keep all cached element in time order. I'm thinking are there any 
ways to do this in one data structure?

-- 
Best Regards,
Regis.

Re: [classlib] add cache for file's canonical path

Posted by Tim Ellison <t....@gmail.com>.
Regis wrote:
> Tim Ellison wrote:
>> Regis wrote:
>>> Recently I'm trying to optimize implementation of java.File,
>>
>> Cool.
>>
>>> and find File.getCanonicalPath is very time cost and is heavily used
>>> by checking file permission. But in the most of cases, the canonical
>>> path is never change, so I think it's better to cache them to avoid
>>> calculate every time. And I found RI also has cache:
>>> File.getCanonicalPath doesn't reflect change in file system
>>> real-time.
>>
>> Hmm, sounds risky to me -- at least liable to create unpredictable
>> behavior.
> 
> The file's canonical path changed rarely, on Linux, only symbol link in
> the path changed, can cause canonical path change; and on Windows, only
> when path contains short name, the canonical path can be changed.
> Maybe we can choose to not cache paths which are relatively "volatile",
> such as path contained short name on Windows?
> 
>>
>>> I have created a JIRA HARMONY-6200 for this, and attach a simple cache
>>> implementation for file's canonical path, someone interested in this may
>>> can help to review it.
>>>
>>> The cache has fixed size, when it's full oldest element would be remove,
>>> and elements would be expired in 10 minutes, that mean if the element
>>> stay in cache more than 10 minutes, it's not validate and will be
>>> removed.
>>
>> Ten minutes seems like a very long time.  I would have thought that
>> having a cache that expires in a relatively short time (since that file
>> path was last accessed) would be sufficient.
> 
> So how about one minute?
> And I prefer to define expiring time as time since that file path added
> to the cache, because there may be some hot paths accessed frequently,
> if expiring time is count from last accessed, they may stay in cache too
> long and have no chance to update from file system.
> 
>>
>>> It's a initial implementation, I believe there are still many places can
>>> be improved, any comments and suggestions are welcome.
>>
>> My initial thought is that it would be preferable to have the OS tell
>> you when a file path changes so you can invalidate the cache, but there
>> may not be enough info to be able to do that...

Now that M10 is out the way, it may be time to revisit this idea.

WDYT?

Tim

Re: [classlib] add cache for file's canonical path

Posted by Regis <xu...@gmail.com>.
Tim Ellison wrote:
> Regis wrote:
>> Recently I'm trying to optimize implementation of java.File,
> 
> Cool.
> 
>> and find File.getCanonicalPath is very time cost and is heavily used
>> by checking file permission. But in the most of cases, the canonical
>> path is never change, so I think it's better to cache them to avoid
>> calculate every time. And I found RI also has cache:
>> File.getCanonicalPath doesn't reflect change in file system
>> real-time.
> 
> Hmm, sounds risky to me -- at least liable to create unpredictable behavior.

The file's canonical path changed rarely, on Linux, only symbol link in the path 
changed, can cause canonical path change; and on Windows, only when path 
contains short name, the canonical path can be changed.
Maybe we can choose to not cache paths which are relatively "volatile", such as 
path contained short name on Windows?

> 
>> I have created a JIRA HARMONY-6200 for this, and attach a simple cache
>> implementation for file's canonical path, someone interested in this may
>> can help to review it.
>>
>> The cache has fixed size, when it's full oldest element would be remove,
>> and elements would be expired in 10 minutes, that mean if the element
>> stay in cache more than 10 minutes, it's not validate and will be removed.
> 
> Ten minutes seems like a very long time.  I would have thought that
> having a cache that expires in a relatively short time (since that file
> path was last accessed) would be sufficient.

So how about one minute?
And I prefer to define expiring time as time since that file path added to the 
cache, because there may be some hot paths accessed frequently, if expiring time 
is count from last accessed, they may stay in cache too long and have no chance 
to update from file system.

> 
>> It's a initial implementation, I believe there are still many places can
>> be improved, any comments and suggestions are welcome.
> 
> My initial thought is that it would be preferable to have the OS tell
> you when a file path changes so you can invalidate the cache, but there
> may not be enough info to be able to do that...
> 
> Regards,
> Tim
> 
> 


-- 
Best Regards,
Regis.

Re: [classlib] add cache for file's canonical path

Posted by Leo Li <li...@gmail.com>.
On Thu, Jul 2, 2009 at 4:53 PM, Yang Paulex <pa...@gmail.com> wrote:

> 2009/5/8 Tim Ellison <t....@gmail.com>
>
> >
> > [snip]
> >
>
>
> >
> > My initial thought is that it would be preferable to have the OS tell
> > you when a file path changes so you can invalidate the cache, but there
> > may not be enough info to be able to do that...
>
>
> It may be overkill, but Linux has a system call - inotify to send file
> change notification to registered directory.
>


   I have a straightforward approach. Maybe just kidding, hehe.
   If we can assume that most of file pathes put to check access has been
cannonicalized, we can first directly use them but the cannonical one as a
failover .

>
>
> Regards,
> Tim
>
>



-- 
Good luck!

Leo Li

Re: [classlib] add cache for file's canonical path

Posted by Yang Paulex <pa...@gmail.com>.
2009/5/8 Tim Ellison <t....@gmail.com>

>
> [snip]
>


>
> My initial thought is that it would be preferable to have the OS tell
> you when a file path changes so you can invalidate the cache, but there
> may not be enough info to be able to do that...


It may be overkill, but Linux has a system call - inotify to send file
change notification to registered directory.


>
>
> Regards,
> Tim
>
>

Re: [classlib] add cache for file's canonical path

Posted by Tim Ellison <t....@gmail.com>.
Regis wrote:
> Recently I'm trying to optimize implementation of java.File,

Cool.

> and find File.getCanonicalPath is very time cost and is heavily used
> by checking file permission. But in the most of cases, the canonical
> path is never change, so I think it's better to cache them to avoid
> calculate every time. And I found RI also has cache:
> File.getCanonicalPath doesn't reflect change in file system
> real-time.

Hmm, sounds risky to me -- at least liable to create unpredictable behavior.

> I have created a JIRA HARMONY-6200 for this, and attach a simple cache
> implementation for file's canonical path, someone interested in this may
> can help to review it.
> 
> The cache has fixed size, when it's full oldest element would be remove,
> and elements would be expired in 10 minutes, that mean if the element
> stay in cache more than 10 minutes, it's not validate and will be removed.

Ten minutes seems like a very long time.  I would have thought that
having a cache that expires in a relatively short time (since that file
path was last accessed) would be sufficient.

> It's a initial implementation, I believe there are still many places can
> be improved, any comments and suggestions are welcome.

My initial thought is that it would be preferable to have the OS tell
you when a file path changes so you can invalidate the cache, but there
may not be enough info to be able to do that...

Regards,
Tim