Posted to dev@jackrabbit.apache.org by "Ulrich Cech (JIRA)" <ji...@apache.org> on 2010/07/16 08:57:50 UTC

[jira] Created: (JCR-2677) Extend the FileDataStore implementation to support read-only media (eg. WORMs)

Extend the FileDataStore implementation to support read-only media (eg. WORMs)
------------------------------------------------------------------------------

                 Key: JCR-2677
                 URL: https://issues.apache.org/jira/browse/JCR-2677
             Project: Jackrabbit Content Repository
          Issue Type: Improvement
          Components: jackrabbit-core
    Affects Versions: 2.2.0
            Reporter: Ulrich Cech


Currently, the FileDataStore does not support read-only media. In a professional environment where data consistency and immutability are important (such as archiving systems), this functionality is essential.

I would like to try to implement this and contribute it...


I have attached the conversation from the Jackrabbit users mailing list:

-----Original Message-----
From: Thomas Müller [mailto:thomas.mueller@day.com]
Sent: Wednesday, 14 July 2010 11:52
To: users@jackrabbit.apache.org
Subject: Re: Jackrabbit and WORM

Hi,

> written to read-only media

Do you mean written to write-once media? The DataStore implementation
does not support this feature currently, however you could probably
change the FileDataStore to support it. Instead of writing the
temporary file to the datastore directory, it would have to be written
to a different place (the temp directory for example). If you don't
have a temp directory then it's a bit more complicated (binaries would
need to be split into smaller blocks that fit in memory).

Regards,
Thomas


On Wed, Jul 14, 2010 at 11:42 AM, Cech. Ulrich <Ul...@aeb.de> wrote:
> I have problems using Jackrabbit with a storage system where files can only be added, not changed or deleted.
> I found out that BinaryImpl.class creates a TransientFileFactory, which writes the stream to a temporary file that is deleted later. If this deletion fails, I get an exception:
> ...
> Caused by: java.io.IOException: Can not rename c:\temp\cr20fs\repository\datastore\tmp21866.tmp to c:\temp\cr20fs\repository\datastore\8d\54\82\8d548201d39d7594d182c2a3901fa38dfeebc6b3 (media read only?)
> ...
>
> I tried setting the DataStore parameter "minRecordLength" to a very high value so that the stream is handled in memory, but this is limited by the available heap space and is therefore not practical.
>
> Does anyone have experience with Jackrabbit and read-only media? Can it be configured so that only the repository and the versions are written to read-only media, while other files (like the Lucene index, which can easily be configured to use another directory, so that is no problem) are written to a "normal" storage system?
>
> Many thanks in advance,
> Ulrich


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (JCR-2677) Extend the FileDataStore implementation to support read-only media (eg. WORMs)

Posted by "Ulrich Cech (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890211#action_12890211 ] 

Ulrich Cech commented on JCR-2677:
----------------------------------

> How do you suggest to do that? renameTo might not work.

Ah, I get the problem... copy-methods() via streams cannot be used



[jira] Commented: (JCR-2677) Extend the FileDataStore implementation to support read-only media (eg. WORMs)

Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889788#action_12889788 ] 

Thomas Mueller commented on JCR-2677:
-------------------------------------

That would be a nice feature.

With WORM media you might want to create the temp file in the temp directory, and not in the datastore directory. This would also allow distributing the data store (using soft links to other disks within the datastore directory). In this case, you also can't simply "rename" a temp file because the target directory is on a different drive than the source directory.

But that means creating the file in the "final" place is no longer an atomic "rename" (you have to copy the file block by block). This is a problem for concurrent writes (currently the FileDataStore would throw an exception saying the size doesn't match). But it's also a problem for concurrent reads: How can you detect that the non-atomic write operation is still running? Are files locked while they are created, or will reading from the input stream automatically wait for the writer? If this is not the case, can we just retry a few times (until the file doesn't change for 10 seconds, for example)?
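
The retry idea in the previous paragraph could be sketched roughly as follows. This is only an illustration, not Jackrabbit code: the class StableFileCheck and its parameters are made up for this example, it uses the java.nio.file API (which is newer than this discussion), and an unchanged file size is only a heuristic that cannot prove the writer has finished.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration; not part of Jackrabbit. */
public final class StableFileCheck {

    private StableFileCheck() {
    }

    /**
     * Polls the file size until it has stayed unchanged for at least
     * quietMillis, or gives up after maxAttempts polls.
     *
     * @return true if the size stopped changing, false if the file is
     *         missing or still changing when the attempts run out
     */
    public static boolean waitUntilStable(Path file, long quietMillis,
            long pollMillis, int maxAttempts)
            throws IOException, InterruptedException {
        long lastSize = -1;
        long unchangedSince = System.currentTimeMillis();
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            long size = Files.exists(file) ? Files.size(file) : -1;
            long now = System.currentTimeMillis();
            if (size != lastSize) {
                // The size moved (or was seen for the first time): restart the clock.
                lastSize = size;
                unchangedSince = now;
            } else if (size >= 0 && now - unchangedSince >= quietMillis) {
                return true;
            }
            Thread.sleep(pollMillis);
        }
        return false;
    }
}
```

A reader would call something like waitUntilStable(file, 10000, 500, 60) before opening the record. Comparing the file content against the digest encoded in the record name would be a stronger check than the size alone.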

For WORM media, what if an exception occurs while the file is being created (so the file is broken)?

For such cases, what about creating multiple versions of the same file if we detect the current one is broken, for example:

1d26ee96b6b5b886a3ac2b68df0636c97db5fbfd (broken)
1d26ee96b6b5b886a3ac2b68df0636c97db5fbfd-1 (correct)

Broken files would be very rare of course, but I guess we still need a way to solve this problem. The algorithm could support multiple broken files of course, and just use the suffix -2, -3,..., until it works (up to a limit of maybe 100).
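
A minimal sketch of this suffix scheme, under a few assumptions: the class SuffixedRecordLocator and the isIntact check are hypothetical (the check could, for example, recompute the content hash and compare it with the record name), and the limit of 100 is taken from the paragraph above.

```java
import java.io.File;
import java.util.function.Predicate;

/** Hypothetical helper for illustration; not part of Jackrabbit. */
public final class SuffixedRecordLocator {

    /** Upper bound on suffixes, following the "maybe 100" suggested above. */
    public static final int MAX_SUFFIX = 100;

    private SuffixedRecordLocator() {
    }

    /** Builds the candidate name: "id" for attempt 0, otherwise "id-1", "id-2", ... */
    private static File candidate(File directory, String id, int attempt) {
        return new File(directory, attempt == 0 ? id : id + "-" + attempt);
    }

    /**
     * Returns the first existing candidate that passes the caller-supplied
     * integrity check, or null if none is found within the limit.
     */
    public static File findIntact(File directory, String id, Predicate<File> isIntact) {
        for (int i = 0; i <= MAX_SUFFIX; i++) {
            File file = candidate(directory, id, i);
            if (!file.exists()) {
                return null; // no further versions were written
            }
            if (isIntact.test(file)) {
                return file;
            }
        }
        return null;
    }

    /**
     * Returns the first unused candidate name, i.e. where a writer would
     * place its next attempt, or null if the limit is exhausted.
     */
    public static File nextFree(File directory, String id) {
        for (int i = 0; i <= MAX_SUFFIX; i++) {
            File file = candidate(directory, id, i);
            if (!file.exists()) {
                return file;
            }
        }
        return null;
    }
}
```

With the two files from the example above on disk, findIntact would skip the broken file and return the one ending in "-1", and nextFree would return the name ending in "-2".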





[jira] Commented: (JCR-2677) Extend the FileDataStore implementation to support read-only media (eg. WORMs)

Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889806#action_12889806 ] 

Thomas Mueller commented on JCR-2677:
-------------------------------------

> 2. The temp file is moved to the permanent target directory 

How do you suggest to do that? renameTo might not work.

> I think when this method (addRecord() from FileDataStore) returns, the record is unavailable to other "processes"

What exactly do you mean by this? How/why/when exactly is it unavailable?



[jira] Commented: (JCR-2677) Extend the FileDataStore implementation to support read-only media (eg. WORMs)

Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890217#action_12890217 ] 

Thomas Mueller commented on JCR-2677:
-------------------------------------

> Ah, I get the problem... copy-methods() via streams cannot be used 

Well, I don't know what features your WORM storage supports, and what kind of failures are possible. For example, if you write a block of data (append to a file), is it guaranteed that the write is atomic (the block is either written as is or not written at all)?



[jira] Commented: (JCR-2677) Extend the FileDataStore implementation to support read-only media (eg. WORMs)

Posted by "Ulrich Cech (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889795#action_12889795 ] 

Ulrich Cech commented on JCR-2677:
----------------------------------

That's right, we have to implement some more checks to ensure that the target file is transferred correctly from the temp directory.
But actually, there is also a potential problem when relying on the renameTo() method, because this method is not really atomic.

I think we can proceed like this:
1. The temporary file is written to a temp directory (which could be configurable and/or default to some standard directory)
2. The temp file is moved to the permanent target directory
3. Checks are made to be sure the file is moved correctly
4. Remove the "setLastModified()" call, since it would also fail on WORM media
5. Continue with the existing code...
I think when this method (addRecord() from FileDataStore) returns, the record is unavailable to other "processes", so we do not need to handle the cases mentioned with special effort. Or am I overlooking side effects here?
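
Steps 1 to 4 could look roughly like the sketch below. It is an illustration under stated assumptions, not the FileDataStore implementation: WormRecordWriter and its addRecord signature are hypothetical, records are assumed to be named by the SHA-1 hash of their content (which the 40-character hex names in the stack trace suggest), the three-level directory layout mirrors the path in that stack trace, and java.nio.file is used although it is newer than this discussion.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for illustration; not part of Jackrabbit. */
public final class WormRecordWriter {

    private WormRecordWriter() {
    }

    /** Stores the stream as a write-once record and returns its final path. */
    public static Path addRecord(InputStream input, Path tempDir, Path storeDir)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-1");
        // Step 1: spool the stream to a temp file outside the store, hashing as we go.
        Path temp = Files.createTempFile(tempDir, "tmp", ".tmp");
        try {
            try (InputStream in = new DigestInputStream(input, digest)) {
                Files.copy(in, temp, StandardCopyOption.REPLACE_EXISTING);
            }
            String id = toHex(digest.digest());
            Path target = storeDir.resolve(id.substring(0, 2))
                    .resolve(id.substring(2, 4))
                    .resolve(id.substring(4, 6))
                    .resolve(id);
            Files.createDirectories(target.getParent());
            // Step 2: copy instead of rename; an existing record is never overwritten.
            if (!Files.exists(target)) {
                Files.copy(temp, target);
            }
            // Step 3: re-read the target and compare its hash with the expected id.
            if (!id.equals(sha1Hex(target))) {
                throw new IOException("Record " + id + " is incomplete or corrupt");
            }
            // Step 4: deliberately no setLastModified() call on the target.
            return target;
        } finally {
            // Only the temp file, which lives on normal storage, is deleted.
            Files.deleteIfExists(temp);
        }
    }

    private static String sha1Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-1");
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), digest)) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // Reading through the stream is what updates the digest.
            }
        }
        return toHex(digest.digest());
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

The sketch leaves open what to do when the check in step 3 fails on media where the broken file cannot be removed, and it does nothing to coordinate with concurrent readers or writers.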

