Posted to dev@jackrabbit.apache.org by Paco Avila <mo...@gmail.com> on 2011/10/12 10:02:16 UTC

About FileDataStore mod-time update

Hi there.

I'm curious about the behavior of the
FileDataStore.getRecordIfStored(DataIdentifier identifier) method: when a
DataRecord is accessed, it updates the modification time of the file. This
means that every time a file is accessed it looks as if it has been modified,
and this is not true.

The main reason for this question is the backup of the Jackrabbit DataStore:
incremental backups do not make sense, because this kind of application
checks the file mod-time to see whether a file has been modified. In my case
every file always appears modified, because I run a DataStoreGarbageCollector
every night and this process updates the mod-time of every file in the
DataStore.

Thanks in advance.

-- 
OpenKM
http://www.openkm.com
http://www.guia-ubuntu.org

Re: About FileDataStore mod-time update

Posted by Paco Avila <pa...@openkm.com>.
You are right, lastAccessTime() isn't included until Java 7 :(

http://openjdk.java.net/projects/nio/javadoc/java/nio/file/attribute/BasicFileAttributes.html#lastAccessTime%28%29
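For illustration, a minimal sketch of reading the access time through the
Java 7 NIO API linked above. The class name AccessTimeSketch and the method
name lastAccess are hypothetical. As noted in the quoted reply below, many
file systems do not update this value (for example when mounted with
noatime), and the Javadoc says that a file system without access-time support
returns an implementation-specific default, typically the last-modified time
or the epoch, so the result should not be trusted blindly.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileTime;

// Hypothetical helper, not part of Jackrabbit.
class AccessTimeSketch {

    /** Returns the last access time reported by the file system (Java 7+). */
    static FileTime lastAccess(Path file) throws IOException {
        BasicFileAttributes attrs =
                Files.readAttributes(file, BasicFileAttributes.class);
        // May be a default value if the file system does not track access times.
        return attrs.lastAccessTime();
    }
}
```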

On Thu, Oct 13, 2011 at 8:22 AM, Thomas Mueller <mu...@adobe.com> wrote:

> Hi,
>
> > Why not check the "access time" instead of using the "modification
> time" for this? At least in UNIX / Linux you can check when a file was
> last accessed.
>
> First of all, in Java you can't. Second, many file systems don't update
> that value.
>
> Regards,
> Thomas
>
>


-- 
OpenKM
http://www.openkm.com

Re: About FileDataStore mod-time update

Posted by Thomas Mueller <mu...@adobe.com>.
Hi,

> Why not check the "access time" instead of using the "modification time" for this? At least in UNIX / Linux you can check when a file was last accessed.

First of all, in Java you can't. Second, many file systems don't update that value.

Regards,
Thomas


Re: About FileDataStore mod-time update

Posted by Paco Avila <pa...@openkm.com>.
Why not check the "access time" instead of using the "modification time"
for this? At least in UNIX / Linux you can check when a file was last
accessed.

On Wed, Oct 12, 2011 at 1:42 PM, Thomas Mueller <mu...@adobe.com> wrote:

> Hi,
>
> > I think we should drop support for multiple distinct repositories
>
>
> Removing features from existing classes is problematic... What about a
> config option or a new class where this isn't supported?
>
> >The upside would be that with
> >something like that we'd be able to avoid the troublesome need to
> >update the last access timestamp whenever a record is accessed.
>
> We don't do that currently. It's only done when garbage collection is run.
>
> Regards,
> Thomas
>
>


-- 
OpenKM
http://www.openkm.com

Re: About FileDataStore mod-time update

Posted by Thomas Mueller <mu...@adobe.com>.
Hi,

> I think we should drop support for multiple distinct repositories


Removing features from existing classes is problematic... What about a
config option or a new class where this isn't supported?

>The upside would be that with
>something like that we'd be able to avoid the troublesome need to
>update the last access timestamp whenever a record is accessed.

We don't do that currently. It's only done when garbage collection is run.

Regards,
Thomas


Re: About FileDataStore mod-time update

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Wed, Oct 12, 2011 at 11:34 AM, Thomas Mueller <mu...@adobe.com> wrote:
> Please note the data store can be shared by multiple *processes* (multiple
> distinct repositories and multiple cluster nodes).

I think we should drop support for multiple distinct repositories
using the same data store. For multiple cluster nodes we should be
able to come up with a mechanism by which the nodes can coordinate on
the garbage collection task.

> Keeping a separate log is problematic, as accessing the separate log would
> need to be synchronized somehow. Keeping the list in memory is even more
> problematic (not to mention memory usage).

Yep, it's obviously a tradeoff. The upside would be that with
something like that we'd be able to avoid the troublesome need to
update the last access timestamp whenever a record is accessed.
Ideally the datastore would only be written to when a new record is
added or when the garbage collector decides to remove an unused
record.

BR,

Jukka Zitting

Re: About FileDataStore mod-time update

Posted by Thomas Mueller <mu...@adobe.com>.
Hi,

Please note the data store can be shared by multiple *processes* (multiple
distinct repositories and multiple cluster nodes).

Keeping a separate log is problematic, as accessing the separate log would
need to be synchronized somehow. Keeping the list in memory is even more
problematic (not to mention memory usage).

Regards,
Thomas


Re: About FileDataStore mod-time update

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Wed, Oct 12, 2011 at 10:22 AM, Thomas Mueller <mu...@adobe.com> wrote:
> However I'm afraid I don't know a better way to solve the problem (mark the
> file as still being used). Suggestions are always welcome of course.

We could keep a separate log of the last access times of records.

Or better yet, since the last access time is only really used to
prevent garbage collection from removing items still in use, we could
make the garbage collector keep a temporary list of all the record ids
it's considering deleting, and have the data store remove from that
list all the entries still referenced in content or currently
accessed. Any remaining entries can then be safely removed. This way
we wouldn't need to persist any access time information.
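For illustration, a minimal single-process sketch of the candidate-list idea
above. The class and method names (CandidateListSketch, begin, keep, finish)
are hypothetical and not part of Jackrabbit; record ids are plain strings
here. It deliberately ignores the case of several processes sharing one data
store.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory candidate list kept only while a collection runs.
class CandidateListSketch {
    private final Set<String> candidates = new HashSet<>();
    private boolean collecting = false;

    /** Collector: start a run with every stored record id as a candidate. */
    synchronized void begin(Collection<String> allRecordIds) {
        candidates.clear();
        candidates.addAll(allRecordIds);
        collecting = true;
    }

    /** Data store: called when a record is referenced in content or accessed. */
    synchronized void keep(String recordId) {
        if (collecting) {
            candidates.remove(recordId);
        }
    }

    /** Collector: end the run; whatever is left was never seen in use. */
    synchronized Set<String> finish() {
        collecting = false;
        Set<String> garbage = new HashSet<>(candidates);
        candidates.clear();
        return garbage;
    }
}
```

Outside a collection run keep() does nothing, so normal reads would cause no
writes to the data store at all in this sketch.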

BR,

Jukka Zitting

Re: About FileDataStore mod-time update

Posted by Paco Avila <pa...@openkm.com>.
On Wed, Oct 12, 2011 at 10:22 AM, Thomas Mueller <mu...@adobe.com> wrote:

> Hi,
>
> > This means that every time a file is accessed it looks as if it has been
> modified, and this is not true.
>
> When garbage collection is running, yes.
>

I don't understand why the file mod-time needs to be updated while the
DataStoreGarbageCollector is running. What is the purpose of this?


> > incremental backups do not make sense
>
> If incremental backup only checks the last modified time, then it's a
> problem, I agree. However I'm afraid I don't know a better way to solve the
> problem (mark the file as still being used). Suggestions are always welcome
> of course.
>
> Is it possible that the incremental backup checks the file creation time
> instead of the last modification time?
>

Well, the modification time tells when the file was last modified (of
course), and that is what the backup application needs to know. The file
creation time should not change during the lifetime of the file, so it cannot
tell the backup application whether the file has been modified.

rsync, for example, can be configured to skip the mod-time comparison and
use a checksum-based comparison algorithm. But not all backup applications
are so configurable :(
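For illustration, a minimal sketch of a checksum-based comparison, the same
idea as rsync's --checksum option but written in Java. The class and method
names (ChecksumCompareSketch, sha256, needsBackup) are hypothetical, and
SHA-256 is just one possible digest. The decision depends only on file
content, so a changed mod-time alone does not trigger a copy.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical backup helper, not part of Jackrabbit or rsync.
class ChecksumCompareSketch {

    /** Streams the file through SHA-256 and returns the digest bytes. */
    static byte[] sha256(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        return digest.digest();
    }

    /** True if the backup copy is missing or its content differs. */
    static boolean needsBackup(Path source, Path backupCopy)
            throws IOException, NoSuchAlgorithmException {
        if (!Files.exists(backupCopy)) {
            return true;
        }
        return !Arrays.equals(sha256(source), sha256(backupCopy));
    }
}
```

The cost is that every file has to be read in full on each backup run, which
is the tradeoff against the cheaper mod-time check.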


>
> Regards,
> Thomas
>
>
-- 
OpenKM
http://www.openkm.com

Re: About FileDataStore mod-time update

Posted by Thomas Mueller <mu...@adobe.com>.
Hi,

> This means that every time a file is accessed it looks as if it has been modified, and this is not true.

When garbage collection is running, yes.

> incremental backups do not make sense

If incremental backup only checks the last modified time, then it's a problem, I agree. However I'm afraid I don't know a better way to solve the problem (mark the file as still being used). Suggestions are always welcome of course.
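For illustration, a minimal sketch of marking a file as still in use through
its modification time, assuming the store touches a file only while a
collection scan is in progress and only when the file looks older than the
scan start. The class and method names (TouchOnAccessSketch,
markPhaseStarted, markPhaseFinished, recordAccessed, isGarbage) are
hypothetical and are not the Jackrabbit API. Coarse timestamp granularity and
coordination between processes are ignored.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical stand-in for a file-backed data store.
class TouchOnAccessSketch {
    // 0 means that no garbage collection scan is in progress.
    private volatile long minModifiedDate = 0;

    /** Collector: the mark phase starts at the given time (millis). */
    void markPhaseStarted(long startTime) {
        minModifiedDate = startTime;
    }

    /** Collector: the scan is over, stop touching files. */
    void markPhaseFinished() {
        minModifiedDate = 0;
    }

    /** Store: called on every record access; returns true if the file was touched. */
    boolean recordAccessed(File file) throws IOException {
        long threshold = minModifiedDate;
        if (threshold != 0 && file.canWrite() && file.lastModified() < threshold) {
            if (!file.setLastModified(System.currentTimeMillis())) {
                throw new IOException("Could not touch " + file);
            }
            return true;
        }
        return false;
    }

    /** Collector: a file not touched since the mark phase began is unused. */
    static boolean isGarbage(File file, long markStart) {
        return file.lastModified() < markStart;
    }
}
```

Because the marker and the backup tool's change indicator are the same
timestamp, every record that survives a scan looks modified to a backup that
compares mod-times only.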

Is it possible that the incremental backup checks the file creation time instead of the last modification time?

Regards,
Thomas