
Posted to users@jackrabbit.apache.org by polofan123 <po...@web.de> on 2012/10/04 16:02:51 UTC

Avoid binary streaming

My Jackrabbit datastore holds large binary files. I can browse
the datastore filesystem and open these files without any problems.

How can I use these files from within my application? I could of course
use the getStream() method of javax.jcr.Binary, but then I would just stream
the entire content of the already existing file into a new temporary file,
right? Since my binaries are very large, I don't want that. I'm looking for a
way to get the full filesystem path of a binary. The getPath() method of
javax.jcr.Property only returns the path within the repository, using the
mapped node names rather than the names that are actually stored on my
filesystem. In short, I need to turn a Binary object into a java.io.File
object, and I want to avoid streaming.




--
View this message in context: http://jackrabbit.510166.n4.nabble.com/Avoid-binary-streaming-tp4656705.html
Sent from the Jackrabbit - Users mailing list archive at Nabble.com.

Re: Avoid binary streaming

Posted by Alexander Klimetschek <ak...@adobe.com>.

On 28.11.2012, at 18:01, Seidel. Robert <Ro...@aeb.de> wrote:

> For this specific file system (http://www.fast-lta.de/en/), it is necessary to set the lastmodified date of the file to a date in the future, indicating how long the file should be protected from deletion,

That seems like an awkward requirement - it obviously breaks the lastmodified semantics for files. Isn't there another option?

For example, the data store garbage collection relies on the lastmodified of files in the FileDataStore.
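To make the conflict concrete, here is a minimal sketch of a sweep check in a mark-and-sweep style collector. It is a simplified model, not Jackrabbit's actual implementation: the class and method names are hypothetical, and the only assumption carried over from above is that the sweep decides by comparing a file's last-modified time against the time the collection started.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

class SweepSketch {

    /**
     * Hypothetical sweep rule: a record is a deletion candidate when it was
     * last touched before the collection run started.
     */
    static boolean isSweepCandidate(long lastModifiedMillis, long markStartMillis) {
        return lastModifiedMillis < markStartMillis;
    }

    public static void main(String[] args) throws IOException {
        File record = Files.createTempFile("record", ".bin").toFile();
        record.deleteOnExit();

        long now = System.currentTimeMillis();
        long oneDay = 24L * 60 * 60 * 1000;

        // An unreferenced record that was last touched yesterday.
        record.setLastModified(now - oneDay);
        System.out.println("old record swept: "
                + isSweepCandidate(record.lastModified(), now));

        // The same record with its timestamp abused as a retention date.
        record.setLastModified(now + 365 * oneDay);
        System.out.println("future-dated record swept: "
                + isSweepCandidate(record.lastModified(), now));
    }
}
```

Under this rule a future-dated file never looks old, so an unreferenced binary would stay in the store until the date has passed.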

> in our case this depends dynamically on the data (it can't be fixed at 6 years or so).

You know that the datastore only applies to binaries, and that all other JCR data (node hierarchy and properties) is stored via the persistence manager, usually (depending on the implementation) in a database where all the data sits together in a few files on the file system. If those db files are lost (because the protection ran out or something), then you won't be able to get to the binaries in the datastore anymore (or at least will no longer know their filenames and other metadata).

Cheers,
Alex

Re: Avoid binary streaming

Posted by "Seidel. Robert" <Ro...@aeb.de>.

Hello Alex,

For this specific file system (http://www.fast-lta.de/en/), it is necessary to set the lastmodified date of the file to a date in the future, indicating how long the file should be protected from deletion. In our case this depends dynamically on the data (it can't be fixed at 6 years or so).

The data store interface provides just two methods (which is a very good thing for integrating newer subsystems): one to store a stream and return an id, and another to retrieve a stream for a given id. So the data store implementation simply lacks the information it would need to handle this problem itself.

Modifying the stream (writing a header, for example) would cause parsing problems during full-text indexing. Besides, every access to and from the data store would have to handle this header, so this was no solution at all.
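The two-method contract described above can be sketched as a tiny in-memory store. This is an illustration only, not Jackrabbit's DataStore interface: the class name, the method names and the choice of SHA-1 for the identifier are assumptions. The point it shows is that the store sees nothing but bytes, so a per-document retention period has no way in.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a content-addressed data store. */
class TinyDataStore {

    private final Map<String, byte[]> records = new HashMap<String, byte[]>();

    /** Stores the stream and returns an id derived only from its bytes. */
    String store(InputStream in) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            byte[] data = buffer.toByteArray();
            String id = sha1Hex(data);
            records.put(id, data);
            return id;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns a stream over the record with the given id. */
    InputStream retrieve(String id) {
        byte[] data = records.get(id);
        if (data == null) {
            throw new IllegalArgumentException("Unknown record: " + id);
        }
        return new ByteArrayInputStream(data);
    }

    private static String sha1Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the id depends only on the content, storing the same bytes twice yields the same record, and neither call carries any metadata about the owning node.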

Regards Robert

-----Original Message-----
From: Alexander Klimetschek [mailto:aklimets@adobe.com]
Sent: Wednesday, 28 November 2012 17:34
To: users@jackrabbit.apache.org
Subject: Re: Avoid binary streaming

On 28.11.2012, at 12:10, Cech. Ulrich <Ul...@aeb.de> wrote:

> That's basically correct, but sometimes, it is important to have the file directly. In combination with some long-life hardware storage systems, it is necessary to change the file attribute and/or the "lastmodified"...

I don't know exactly what you mean... generally speaking, managing its files is the task of the persistence layer in Jackrabbit, i.e. the datastore implementation, and not the application level above the JCR API.

Cheers,
Alex

Re: Avoid binary streaming

Posted by Alexander Klimetschek <ak...@adobe.com>.

On 28.11.2012, at 12:10, Cech. Ulrich <Ul...@aeb.de> wrote:

> That's basically correct, but sometimes, it is important to have the file directly. In combination with some long-life hardware storage systems, it is necessary to change the file attribute and/or the "lastmodified"...

I don't know exactly what you mean... generally speaking, managing its files is the task of the persistence layer in Jackrabbit, i.e. the datastore implementation, and not the application level above the JCR API.

Cheers,
Alex

Re: Avoid binary streaming

Posted by "Cech. Ulrich" <Ul...@aeb.de>.

Hi Alex,

> Breaking the JCR API abstraction by getting to the datastore file directly is not a good idea.

That's basically correct, but sometimes, it is important to have the file directly. In combination with some long-life hardware storage systems, it is necessary to change the file attribute and/or the "lastmodified"...


Bye,
Ulrich

Re: Avoid binary streaming

Posted by Alexander Klimetschek <ak...@adobe.com>.

On 04.10.2012, at 16:02, polofan123 <po...@web.de> wrote:

> My Jackrabbit datastore holds large binary files. I can browse
> the datastore filesystem and open these files without any problems.
> 
> How can I use these files from within my application? I could of course
> use the getStream() method of javax.jcr.Binary, but then I would just stream
> the entire content of the already existing file into a new temporary file,
> right?

Use Binary.read() [0] [1] to get random access to the file. It was explicitly designed for that. 
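As a rough sketch of that random-access pattern: the BinaryLike interface below is a hypothetical stand-in that mirrors read(byte[], long) and getSize() from javax.jcr.Binary, so the example compiles without the JCR jar. The real methods additionally declare IOException and RepositoryException. The helper name readRange is also an assumption.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the two javax.jcr.Binary methods used here. */
interface BinaryLike {
    /** Reads up to b.length bytes starting at position; returns -1 at the end. */
    int read(byte[] b, long position);

    long getSize();
}

/** In-memory implementation, only for trying out the sketch. */
class InMemoryBinary implements BinaryLike {
    private final byte[] data;

    InMemoryBinary(byte[] data) {
        this.data = data;
    }

    public int read(byte[] b, long position) {
        if (position >= data.length) {
            return -1;
        }
        int n = (int) Math.min(b.length, data.length - position);
        System.arraycopy(data, (int) position, b, 0, n);
        return n;
    }

    public long getSize() {
        return data.length;
    }
}

class RandomAccessSketch {

    /**
     * Reads up to length bytes starting at offset. Loops because a single
     * read() call may return fewer bytes than the buffer can hold.
     */
    static byte[] readRange(BinaryLike binary, long offset, int length) {
        byte[] result = new byte[length];
        byte[] chunk = new byte[Math.min(length, 8192)];
        int filled = 0;
        while (filled < length) {
            int n = binary.read(chunk, offset + filled);
            if (n <= 0) {
                break; // end of the binary reached
            }
            int copy = Math.min(n, length - filled);
            System.arraycopy(chunk, 0, result, filled, copy);
            filled += copy;
        }
        return Arrays.copyOf(result, filled);
    }

    public static void main(String[] args) {
        BinaryLike binary = new InMemoryBinary("0123456789".getBytes());
        System.out.println(new String(readRange(binary, 3, 4)));
    }
}
```

Only the requested range is pulled into memory, and no temporary copy of the whole file is created on the application side.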

> Since my binaries are very large, I don't want that. I'm looking for a
> way to get the full filesystem path of a binary.

Breaking the JCR API abstraction by getting to the datastore file directly is not a good idea.

[0] http://www.day.com/maven/javax.jcr/javadocs/jcr-2.0/javax/jcr/Binary.html#read(byte[],%20long)
[1] http://www.day.com/specs/jcr/2.0/5_Reading.html#5.10.5%20Binary%20Object

Cheers,
Alex

Re: Avoid binary streaming

Posted by "Cech. Ulrich" <Ul...@aeb.de>.

You can get the physical path to a file in a FileDataStore as follows (note that this is Jackrabbit-specific).

(The PhysicalFileInDataStore class is just a simple POJO.)
    public static class PhysicalFileInDataStore {

        private String nodeIdentifier;
        private File file;

        ... (getters and setters)
    }


...
// Collect the jcr:content (nt:resource) nodes of all nt:file children.
List<Node> fileNodes = getResourceNodes(node);
// Jackrabbit-specific: reach past the JCR API to the configured data store.
DataStore ds = ((RepositoryImpl) node.getSession().getRepository()).getConfig().getDataStore();
if (ds instanceof FileDataStore) {
    // Root directory of the FileDataStore on disk.
    dsPath = new File(((FileDataStore) ds).getPath());
}
for (Node fn : fileNodes) {
    Property prop = fn.getProperty(JcrConstants.JCR_DATA);
    Binary b = prop.getBinary();
    // Only binaries held in a FileDataStore correspond to a file on disk.
    if (b instanceof BLOBInDataStore && ds instanceof FileDataStore) {
        PhysicalFileInDataStore pf = new PhysicalFileInDataStore();
        pf.setFile(getFile(getIdentifier(b), dsPath));
        pf.setNodeIdentifier(fn.getIdentifier());
        files.add(pf);
    }
}




    /** Returns the jcr:content (nt:resource) node of every nt:file child of the given node. */
    public List<Node> getResourceNodes(final Node node) throws
            RepositoryException {
        List<Node> nodes = new ArrayList<Node>();
        NodeIterator nitg = node.getNodes();
        while (nitg.hasNext()) {
            Node n = nitg.nextNode();
            if (n.isNodeType(JcrConstants.NT_FILE)) {
                NodeIterator nitFile = n.getNodes();
                while (nitFile.hasNext()) {
                    Node nodeContent = nitFile.nextNode();
                    if ((JcrConstants.JCR_CONTENT.equals(nodeContent.getName()))
                            && (nodeContent.isNodeType(JcrConstants.NT_RESOURCE))) {
                        nodes.add(nodeContent);
                    }
                }
            }
        }
        return nodes;
    }


    /**
     * Maps an identifier to its file below the data store directory:
     * the first three character pairs of the identifier name three nested
     * directories, and the file itself is named after the full identifier.
     */
    public File getFile(DataIdentifier identifier, File directory) {
        String string = identifier.toString();
        File file = directory;
        file = new File(file, string.substring(0, 2));
        file = new File(file, string.substring(2, 4));
        file = new File(file, string.substring(4, 6));
        return new File(file, string);
    }


    /** Returns the data store identifier of the binary, or null if it has none. */
    public DataIdentifier getIdentifier(Binary binary) {
        if (binary instanceof BLOBFileValue) {
            return ((BLOBFileValue) binary).getDataIdentifier();
        } else {
            return null;
        }
    }
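The directory layout used by getFile() above can be tried out without a repository. The following is a self-contained sketch of the same mapping using a plain string in place of DataIdentifier; the class name, the root path and the 40-character identifier are made up for illustration.

```java
import java.io.File;

class DataStorePathSketch {

    /**
     * Resolves an identifier to root/aa/bb/cc/aabbcc..., the same nesting
     * that getFile() builds from the first six characters.
     */
    static File resolve(File root, String id) {
        if (id.length() < 6) {
            throw new IllegalArgumentException("Identifier too short: " + id);
        }
        File dir = new File(root, id.substring(0, 2));
        dir = new File(dir, id.substring(2, 4));
        dir = new File(dir, id.substring(4, 6));
        return new File(dir, id);
    }

    public static void main(String[] args) {
        // Hypothetical root directory and identifier.
        File root = new File("repository/datastore");
        String id = "a1b2c3d4e5f60718293a4b5c6d7e8f9012345678";
        System.out.println(resolve(root, id).getPath());
    }
}
```

Spreading records over nested directories keeps any single directory from growing too large, which is presumably why the layout exists. A file obtained this way should be treated as read-only, since the repository assumes it owns the contents of the data store.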
