You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jackrabbit.apache.org by "Julian Reschke (JIRA)" <ji...@apache.org> on 2007/04/18 15:43:16 UTC

[jira] Created: (JCR-851) Handling of binary properties (streams) in QValue interface

Handling of binary properties (streams) in QValue interface
-----------------------------------------------------------

                 Key: JCR-851
                 URL: https://issues.apache.org/jira/browse/JCR-851
             Project: Jackrabbit
          Issue Type: Improvement
          Components: SPI
            Reporter: Julian Reschke


The current SPI requires QValue to return new streams upon each call to getStream(). As far as I can tell, this essentially requires a QValue implementation to preserve the whole content of a stream, be it in memory or on disk.

In particular (and unless I'm missing something), when importing large content into a repository, this causes the whole data stream to be written twice. We really should try to avoid that.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (JCR-851) Handling of binary properties (streams) in QValue interface

Posted by "Julian Reschke (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489995 ] 

Julian Reschke commented on JCR-851:
------------------------------------

Maybe I'm a bit slow, but I don't see how to do that.

In the write/read case, the QValue is instantiated without any knowledge where the InputStream finally will be written to (or more precisely: who is going to read it).

In the general case (I don't know the consumer), I will have to make a copy of every byte read from it, and also have to take into account the case where the consumer actually doesn't consume the whole stream. In this case, I can preserve the contents of the stream either in memory or on disk.

The write to repository case theoretically can be optimized: the InputStream held by the QValue would be replaced with the information required to get the contents from the backing store later on. However, this assumes that the write operation will succeed. It also doesn't take care of the case where the binary property in the repository later on gets replaced by a new value, in which case the information in the QValue object would become invalid.

Hint: if you feel the current interface is good enough, please propose a change to SPI2JCR that implements QValue.getStream without making a copy of the contents...

Best regards, Julian


> Handling of binary properties (streams) in QValue interface
> -----------------------------------------------------------
>
>                 Key: JCR-851
>                 URL: https://issues.apache.org/jira/browse/JCR-851
>             Project: Jackrabbit
>          Issue Type: Improvement
>          Components: SPI
>            Reporter: Julian Reschke
>
> The current SPI requires QValue to return new streams upon each call to getStream(). As far as I can tell, this essentially requires a QValue implementation to preserve the whole content of a stream, be it in memory or on disk.
> In particular (and unless I'm missing something), when importing large content into a repository, this causes the whole data stream to be written twice. We really should try to avoid that.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (JCR-851) Handling of binary properties (streams) in QValue interface

Posted by "Julian Reschke (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489966 ] 

Julian Reschke commented on JCR-851:
------------------------------------

Agreed for the case where the client just gets the property from the store.

However, in the "set" case, JCR2SPI currently caches the QValue, and re-uses it. Thus, a test case that first sets a binary property and then reads it fails, unless the QValue preserves the stream contents. This is a problem.


> Handling of binary properties (streams) in QValue interface
> -----------------------------------------------------------
>
>                 Key: JCR-851
>                 URL: https://issues.apache.org/jira/browse/JCR-851
>             Project: Jackrabbit
>          Issue Type: Improvement
>          Components: SPI
>            Reporter: Julian Reschke
>
> The current SPI requires QValue to return new streams upon each call to getStream(). As far as I can tell, this essentially requires a QValue implementation to preserve the whole content of a stream, be it in memory or on disk.
> In particular (and unless I'm missing something), when importing large content into a repository, this causes the whole data stream to be written twice. We really should try to avoid that.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (JCR-851) Handling of binary properties (streams) in QValue interface

Posted by "angela (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489939 ] 

angela commented on JCR-851:
----------------------------

jukka, maybe i didn't get your post. from what i understood, i think you are mistaken about the current 
usage of QValue in the SPI:
they are used both for reading from the SPI impl and upon creation/modification of existing properties.

therefore:

> The most usual (only?) reason for a client to create QValues is when the client wants to 
> store content in the repository

is probably not totally correct, isn't it? i was told recently that almost everything was reading :)

since in contrast to the jackrabbit core QValue objects are always created by a factory and both QValueFactory and QValue itself are interfaces defined by the SPI api,  i would argue that it's an implementation detail (which might or might not be crucial) if a large binary is stored on the server upon creation or not.

> Handling of binary properties (streams) in QValue interface
> -----------------------------------------------------------
>
>                 Key: JCR-851
>                 URL: https://issues.apache.org/jira/browse/JCR-851
>             Project: Jackrabbit
>          Issue Type: Improvement
>          Components: SPI
>            Reporter: Julian Reschke
>
> The current SPI requires QValue to return new streams upon each call to getStream(). As far as I can tell, this essentially requires a QValue implementation to preserve the whole content of a stream, be it in memory or on disk.
> In particular (and unless I'm missing something), when importing large content into a repository, this causes the whole data stream to be written twice. We really should try to avoid that.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (JCR-851) Handling of binary properties (streams) in QValue interface

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489845 ] 

Jukka Zitting commented on JCR-851:
-----------------------------------

In most cases, especially when working with large binaries, the client will not try to read the stream from transient space when storing the binary in the repository. Thus a reasonable SPI implementation would stream (large) binaries to the backend server already during QValueFactory.create(InputStream) and use internal of identifiers to keep track of the values.

Just like with the normal JCR ValueFactory interface, the most reasonable implementation is to consume the entire InputStream when the create() method is called instead of lazily keeping just the reference.

> Handling of binary properties (streams) in QValue interface
> -----------------------------------------------------------
>
>                 Key: JCR-851
>                 URL: https://issues.apache.org/jira/browse/JCR-851
>             Project: Jackrabbit
>          Issue Type: Improvement
>          Components: SPI
>            Reporter: Julian Reschke
>
> The current SPI requires QValue to return new streams upon each call to getStream(). As far as I can tell, this essentially requires a QValue implementation to preserve the whole content of a stream, be it in memory or on disk.
> In particular (and unless I'm missing something), when importing large content into a repository, this causes the whole data stream to be written twice. We really should try to avoid that.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (JCR-851) Handling of binary properties (streams) in QValue interface

Posted by "Julian Reschke (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489978 ] 

Julian Reschke commented on JCR-851:
------------------------------------

I really meant "SPI": the SPI contract needs to be sufficiently clear, and JCR2SPI has to work against that.

That being said, the problem seems to be that JCR2SPI likes to keep the QValue, thus SPI requires QValue to allow multiple calls to getStream() (requiring new stream instances to be returned). Supporting multiple calls to getStream() is simple for multiple reads, but not for the case where a property is set and then re-read (such as in some of the TCK tests).

One solution to that would be to:

- require getStream() to *either* return a new stream instance, or to throw an IllegalStateException, and
- modify JCR2SPI to obtain a new QValue if getStream() on the cached value yielded that exception.

That would allow JCR2SPI to keep QValue instances in most cases (read/read), but would make it much simpler to implement the write/read sequence.


> Handling of binary properties (streams) in QValue interface
> -----------------------------------------------------------
>
>                 Key: JCR-851
>                 URL: https://issues.apache.org/jira/browse/JCR-851
>             Project: Jackrabbit
>          Issue Type: Improvement
>          Components: SPI
>            Reporter: Julian Reschke
>
> The current SPI requires QValue to return new streams upon each call to getStream(). As far as I can tell, this essentially requires a QValue implementation to preserve the whole content of a stream, be it in memory or on disk.
> In particular (and unless I'm missing something), when importing large content into a repository, this causes the whole data stream to be written twice. We really should try to avoid that.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (JCR-851) Handling of binary properties (streams) in QValue interface

Posted by "angela (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489968 ] 

angela commented on JCR-851:
----------------------------

> The current SPI requires QValue to return new streams upon each call to getStream(). 

oh i see.... you meant jcr2spi... not spi...
so, what are the modifications you propose?

kind regards
angela





> Handling of binary properties (streams) in QValue interface
> -----------------------------------------------------------
>
>                 Key: JCR-851
>                 URL: https://issues.apache.org/jira/browse/JCR-851
>             Project: Jackrabbit
>          Issue Type: Improvement
>          Components: SPI
>            Reporter: Julian Reschke
>
> The current SPI requires QValue to return new streams upon each call to getStream(). As far as I can tell, this essentially requires a QValue implementation to preserve the whole content of a stream, be it in memory or on disk.
> In particular (and unless I'm missing something), when importing large content into a repository, this causes the whole data stream to be written twice. We really should try to avoid that.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (JCR-851) Handling of binary properties (streams) in QValue interface

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509517 ] 

Jukka Zitting commented on JCR-851:
-----------------------------------

Some background on the patch above. We discussed this issue with Marcel, Angela, and Julian, and it seems like the best way to manage binary streams is to use some sort of a binary handle (a temporary file or a reference to the backend store) instead of the live InputStream reference.

> Handling of binary properties (streams) in QValue interface
> -----------------------------------------------------------
>
>                 Key: JCR-851
>                 URL: https://issues.apache.org/jira/browse/JCR-851
>             Project: Jackrabbit
>          Issue Type: Improvement
>          Components: SPI
>            Reporter: Julian Reschke
>         Attachments: QValue.patch
>
>
> The current SPI requires QValue to return new streams upon each call to getStream(). As far as I can tell, this essentially requires a QValue implementation to preserve the whole content of a stream, be it in memory or on disk.
> In particular (and unless I'm missing something), when importing large content into a repository, this causes the whole data stream to be written twice. We really should try to avoid that.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (JCR-851) Handling of binary properties (streams) in QValue interface

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489979 ] 

Jukka Zitting commented on JCR-851:
-----------------------------------

You'll need to have the re-read feature somewhere in any case, so why not put it inside the QValue implementation? You don't need to keep the entire value inside the QValue instance, just enough information that you can retrieve the value when requested.

Leaving the contract as-is allows an implementation for example to decide that it'll cache small binaries locally but keep large binaries only on the backend server, accessing them only as needed.

> Handling of binary properties (streams) in QValue interface
> -----------------------------------------------------------
>
>                 Key: JCR-851
>                 URL: https://issues.apache.org/jira/browse/JCR-851
>             Project: Jackrabbit
>          Issue Type: Improvement
>          Components: SPI
>            Reporter: Julian Reschke
>
> The current SPI requires QValue to return new streams upon each call to getStream(). As far as I can tell, this essentially requires a QValue implementation to preserve the whole content of a stream, be it in memory or on disk.
> In particular (and unless I'm missing something), when importing large content into a repository, this causes the whole data stream to be written twice. We really should try to avoid that.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (JCR-851) Handling of binary properties (streams) in QValue interface

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489878 ] 

Jukka Zitting commented on JCR-851:
-----------------------------------

I'm not sure I follow you. What kind of an alternative contract do you have in mind?

AFAIUI it never makes sense to have (large) binaries associated directly with the QValue instances. They should always be stored on the server-side and the QValue instance should only hold a reference or identifier to the backend storage. Calling getStream() will request the binary to be streamed from the backend. This solution should cover both reading and writing of values pretty well.

> Handling of binary properties (streams) in QValue interface
> -----------------------------------------------------------
>
>                 Key: JCR-851
>                 URL: https://issues.apache.org/jira/browse/JCR-851
>             Project: Jackrabbit
>          Issue Type: Improvement
>          Components: SPI
>            Reporter: Julian Reschke
>
> The current SPI requires QValue to return new streams upon each call to getStream(). As far as I can tell, this essentially requires a QValue implementation to preserve the whole content of a stream, be it in memory or on disk.
> In particular (and unless I'm missing something), when importing large content into a repository, this causes the whole data stream to be written twice. We really should try to avoid that.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (JCR-851) Handling of binary properties (streams) in QValue interface

Posted by "Julian Reschke (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489866 ] 

Julian Reschke commented on JCR-851:
------------------------------------

I'm not sure I understand.

The code implementing QValue has no way to predict who is going to call getStream(), and in which order this will occur. If it doesn't copy the data upon creation, it will have to when it's read first. How exactly does that help, except by delaying the issue, and by potentially optimizing the case that getStream() never get's called?

> Handling of binary properties (streams) in QValue interface
> -----------------------------------------------------------
>
>                 Key: JCR-851
>                 URL: https://issues.apache.org/jira/browse/JCR-851
>             Project: Jackrabbit
>          Issue Type: Improvement
>          Components: SPI
>            Reporter: Julian Reschke
>
> The current SPI requires QValue to return new streams upon each call to getStream(). As far as I can tell, this essentially requires a QValue implementation to preserve the whole content of a stream, be it in memory or on disk.
> In particular (and unless I'm missing something), when importing large content into a repository, this causes the whole data stream to be written twice. We really should try to avoid that.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (JCR-851) Handling of binary properties (streams) in QValue interface

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489942 ] 

Jukka Zitting commented on JCR-851:
-----------------------------------

> > The most usual (only?) reason for a client to create QValues is when the client wants to
> > store content in the repository
>
> is probably not totally correct, isn't it? i was told recently that almost everything was reading :) 

Note the part "for a client". The SPI implementation obviously creates QValues whenever content is read, but I don't believe that is a problem in this case.

> Handling of binary properties (streams) in QValue interface
> -----------------------------------------------------------
>
>                 Key: JCR-851
>                 URL: https://issues.apache.org/jira/browse/JCR-851
>             Project: Jackrabbit
>          Issue Type: Improvement
>          Components: SPI
>            Reporter: Julian Reschke
>
> The current SPI requires QValue to return new streams upon each call to getStream(). As far as I can tell, this essentially requires a QValue implementation to preserve the whole content of a stream, be it in memory or on disk.
> In particular (and unless I'm missing something), when importing large content into a repository, this causes the whole data stream to be written twice. We really should try to avoid that.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (JCR-851) Handling of binary properties (streams) in QValue interface

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting updated JCR-851:
------------------------------

    Fix Version/s: 1.4

> Handling of binary properties (streams) in QValue interface
> -----------------------------------------------------------
>
>                 Key: JCR-851
>                 URL: https://issues.apache.org/jira/browse/JCR-851
>             Project: Jackrabbit
>          Issue Type: Improvement
>          Components: SPI
>            Reporter: Julian Reschke
>             Fix For: 1.4
>
>         Attachments: QValue.patch
>
>
> The current SPI requires QValue to return new streams upon each call to getStream(). As far as I can tell, this essentially requires a QValue implementation to preserve the whole content of a stream, be it in memory or on disk.
> In particular (and unless I'm missing something), when importing large content into a repository, this causes the whole data stream to be written twice. We really should try to avoid that.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (JCR-851) Handling of binary properties (streams) in QValue interface

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting updated JCR-851:
------------------------------

    Attachment: QValue.patch

Suggested patch

> Handling of binary properties (streams) in QValue interface
> -----------------------------------------------------------
>
>                 Key: JCR-851
>                 URL: https://issues.apache.org/jira/browse/JCR-851
>             Project: Jackrabbit
>          Issue Type: Improvement
>          Components: SPI
>            Reporter: Julian Reschke
>         Attachments: QValue.patch
>
>
> The current SPI requires QValue to return new streams upon each call to getStream(). As far as I can tell, this essentially requires a QValue implementation to preserve the whole content of a stream, be it in memory or on disk.
> In particular (and unless I'm missing something), when importing large content into a repository, this causes the whole data stream to be written twice. We really should try to avoid that.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (JCR-851) Handling of binary properties (streams) in QValue interface

Posted by "Julian Reschke (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489872 ] 

Julian Reschke commented on JCR-851:
------------------------------------

In which case we should change the contract, right? Right now, making the 99% use case work both correctly and efficiently is just harder than it needs to be...


> Handling of binary properties (streams) in QValue interface
> -----------------------------------------------------------
>
>                 Key: JCR-851
>                 URL: https://issues.apache.org/jira/browse/JCR-851
>             Project: Jackrabbit
>          Issue Type: Improvement
>          Components: SPI
>            Reporter: Julian Reschke
>
> The current SPI requires QValue to return new streams upon each call to getStream(). As far as I can tell, this essentially requires a QValue implementation to preserve the whole content of a stream, be it in memory or on disk.
> In particular (and unless I'm missing something), when importing large content into a repository, this causes the whole data stream to be written twice. We really should try to avoid that.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (JCR-851) Handling of binary properties (streams) in QValue interface

Posted by "angela (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

angela resolved JCR-851.
------------------------

    Resolution: Fixed

applied patch with rev. 552909 and modified the QValueTest class accordingly


> Handling of binary properties (streams) in QValue interface
> -----------------------------------------------------------
>
>                 Key: JCR-851
>                 URL: https://issues.apache.org/jira/browse/JCR-851
>             Project: Jackrabbit
>          Issue Type: Improvement
>          Components: SPI
>            Reporter: Julian Reschke
>         Attachments: QValue.patch
>
>
> The current SPI requires QValue to return new streams upon each call to getStream(). As far as I can tell, this essentially requires a QValue implementation to preserve the whole content of a stream, be it in memory or on disk.
> In particular (and unless I'm missing something), when importing large content into a repository, this causes the whole data stream to be written twice. We really should try to avoid that.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (JCR-851) Handling of binary properties (streams) in QValue interface

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489870 ] 

Jukka Zitting commented on JCR-851:
-----------------------------------

The most usual (only?) reason for a client to create QValues is when the client wants to store content in the repository. I would argue that in 99% of those cases the created QValue is never read by the client.

> Handling of binary properties (streams) in QValue interface
> -----------------------------------------------------------
>
>                 Key: JCR-851
>                 URL: https://issues.apache.org/jira/browse/JCR-851
>             Project: Jackrabbit
>          Issue Type: Improvement
>          Components: SPI
>            Reporter: Julian Reschke
>
> The current SPI requires QValue to return new streams upon each call to getStream(). As far as I can tell, this essentially requires a QValue implementation to preserve the whole content of a stream, be it in memory or on disk.
> In particular (and unless I'm missing something), when importing large content into a repository, this causes the whole data stream to be written twice. We really should try to avoid that.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.