Posted to dev@jackrabbit.apache.org by "Thomas Mueller (JIRA)" <ji...@apache.org> on 2008/12/02 13:24:44 UTC

[jira] Created: (JCR-1892) Unique ID for org.apache.jackrabbit.value.BinaryValue

Unique ID for org.apache.jackrabbit.value.BinaryValue
-----------------------------------------------------

                 Key: JCR-1892
                 URL: https://issues.apache.org/jira/browse/JCR-1892
             Project: Jackrabbit
          Issue Type: New Feature
          Components: jackrabbit-jcr-commons
            Reporter: Thomas Mueller
            Assignee: Thomas Mueller


BinaryValue should have a method to get the unique identifier (if one is available). That way, an application does not have to read the stream if the value has already been processed.

When the DataStore is used, a unique identifier is available, so this feature is probably quite simple to implement.

See also http://www.nabble.com/Workspace.copy()-Question-...-td20435164.html (but please don't reply to this thread from now on - instead add comments to this issue).

Another possible feature is getFileName(), which would return the file name if the value is stored in the file system. This method may need a security mechanism, for example getFileName(Session s), so that the system can check whether the caller is allowed to see it. In any case the file should not be modified, but even knowing the file name may be too dangerous in some cases.
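
A minimal sketch of how a client could use such an identifier to avoid re-reading a stream. The interface and classes below (IdentifiableBinary, InMemoryBinary, OncePerBinaryProcessor) are hypothetical stand-ins written for this illustration, not the attached patch; only the method name getContentIdentity() is taken from later comments on this issue.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a binary value that may expose an identifier.
interface IdentifiableBinary {
    /** Returns an opaque identifier, or null if none is available. */
    String getContentIdentity();

    InputStream getStream();
}

// Simple in-memory implementation, used only to exercise the sketch.
final class InMemoryBinary implements IdentifiableBinary {
    private final String id;
    private final byte[] data;

    InMemoryBinary(String id, byte[] data) {
        this.id = id;
        this.data = data.clone();
    }

    @Override
    public String getContentIdentity() {
        return id;
    }

    @Override
    public InputStream getStream() {
        return new ByteArrayInputStream(data);
    }
}

// Reads each identified binary at most once; values without an
// identifier are always read, because they cannot be recognized.
final class OncePerBinaryProcessor {
    private final Set<String> seen = new HashSet<>();
    private int streamsRead = 0;

    void process(IdentifiableBinary value) {
        String id = value.getContentIdentity();
        if (id != null && !seen.add(id)) {
            return; // identifier already processed: skip the stream
        }
        try (InputStream in = value.getStream()) {
            while (in.read() != -1) {
                // consume the stream; real processing would go here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        streamsRead++;
    }

    int streamsRead() {
        return streamsRead;
    }
}
```

The null check matters: the identifier is only available "if one is available", so a client has to fall back to reading the stream.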

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (JCR-1892) Unique ID for org.apache.jackrabbit.value.BinaryValue

Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Mueller updated JCR-1892:
--------------------------------

    Attachment: JackrabbitValue-api.patch

Proposed JackrabbitValue interface



[jira] Resolved: (JCR-1892) Unique ID for org.apache.jackrabbit.value.BinaryValue

Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Mueller resolved JCR-1892.
---------------------------------

       Resolution: Fixed
    Fix Version/s: 1.6.0

Committed in revision 738119 (jackrabbit-core) and revision 738121 (jackrabbit-api).



[jira] Commented: (JCR-1892) Unique ID for org.apache.jackrabbit.value.BinaryValue

Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661972#action_12661972 ] 

Thomas Mueller commented on JCR-1892:
-------------------------------------

> expectations about the string length

Sure, what about "The string is normally about 50 characters long."

> With the current definition the implementation could simply return the base64 encoding of the entire binary value!

For very short binaries it may make sense to do that.

> Do we need the DataStore.hasRecord() method?

No. Currently, DbDataStore.getRecord() throws an exception if the record doesn't exist, and I wanted to avoid that.

> race conditions with the garbage collection

Good point! What about getRecordIfStored()? getRecord() would then just call that method and throw an exception if it returns null.

> BinaryValueImpl.getContentIdentity() I'd use blob.getDataIdentifier().toString()

Oops, that was my plan...

> not call getContentIdentity() internally

You are right of course.
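
The getRecordIfStored() suggestion above could look roughly like this. It is only a sketch against a hypothetical in-memory store: SketchRecordStore and MissingRecordException are names invented for the illustration, and the real DataStore API is not reproduced here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical exception type for the sketch.
final class MissingRecordException extends Exception {
    MissingRecordException(String id) {
        super("Record not found: " + id);
    }
}

// Hypothetical in-memory store keyed by identifier string.
final class SketchRecordStore {
    private final Map<String, byte[]> records = new ConcurrentHashMap<>();

    void put(String id, byte[] data) {
        records.put(id, data.clone());
    }

    /**
     * Returns the record, or null if it is not stored.
     * A single lookup, so there is no gap between an existence
     * check and the actual read in which the record could vanish.
     */
    byte[] getRecordIfStored(String id) {
        return records.get(id);
    }

    /** Same lookup, but a missing record is reported as an exception. */
    byte[] getRecord(String id) throws MissingRecordException {
        byte[] record = getRecordIfStored(id);
        if (record == null) {
            throw new MissingRecordException(id);
        }
        return record;
    }
}
```

Compared with a separate hasRecord() followed by getRecord(), the caller gets the answer and the record from one call.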




[jira] Updated: (JCR-1892) Unique ID for org.apache.jackrabbit.value.BinaryValue

Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Mueller updated JCR-1892:
--------------------------------

    Attachment: JackrabbitValue-core.patch

Implementation and tests for the JackrabbitValue.
Additionally, values already stored in the repository
are not spooled again if the value is stored again.



[jira] Commented: (JCR-1892) Unique ID for org.apache.jackrabbit.value.BinaryValue

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661638#action_12661638 ] 

Jukka Zitting commented on JCR-1892:
------------------------------------

Good stuff! Some comments:

* Should we set some expectations (not necessarily fixed limits) about the size of the content identity string? With the current definition the implementation could simply return the base64 encoding of the entire binary value!

* Do we need the DataStore.hasRecord() method? If we use getRecord() directly we wouldn't need to worry about race conditions with the garbage collection.

* In BinaryValueImpl.getContentIdentity() I'd use blob.getDataIdentifier().toString() (with appropriate null guards) to avoid surprises if we ever decide to make blob.toString() more expressive for debug purposes.

* I wouldn't call getContentIdentity() internally in jackrabbit-core as we already have access to the DataIdentifier in the source value. Also, a malicious user could have passed in their own subclass that overrides getContentIdentity().
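
As an illustration of the third point, here is a sketch of a null-guarded getContentIdentity(). SketchDataId, SketchBlob and SketchBinaryValue are hypothetical stand-ins for the jackrabbit-core classes, reduced to what the example needs.

```java
// Hypothetical stand-in for a data identifier.
final class SketchDataId {
    private final String hex;

    SketchDataId(String hex) {
        this.hex = hex;
    }

    @Override
    public String toString() {
        return hex;
    }
}

// Hypothetical stand-in for the internal blob of a binary value.
final class SketchBlob {
    private final SketchDataId id; // null when not backed by the data store

    SketchBlob(SketchDataId id) {
        this.id = id;
    }

    SketchDataId getDataIdentifier() {
        return id;
    }

    // A debug-friendly toString(); the identity must not depend on it.
    @Override
    public String toString() {
        return "SketchBlob[id=" + id + "]";
    }
}

final class SketchBinaryValue {
    private final SketchBlob blob;

    SketchBinaryValue(SketchBlob blob) {
        this.blob = blob;
    }

    /** Returns the identifier string, or null if none is available. */
    String getContentIdentity() {
        if (blob == null) {
            return null;
        }
        SketchDataId id = blob.getDataIdentifier();
        return id == null ? null : id.toString();
    }
}
```

Because the identity is taken from the identifier rather than from blob.toString(), the debug output of the blob can change without affecting callers.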





[jira] Commented: (JCR-1892) Unique ID for org.apache.jackrabbit.value.BinaryValue

Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662918#action_12662918 ] 

Thomas Mueller commented on JCR-1892:
-------------------------------------

What about 

"The identifier is opaque, meaning it can have any format and size; however,
it is normally about 50 characters and at most 255 characters long.
The string only contains Unicode code points from 32 to 127 (inclusive)."

For some databases, the VARCHAR limit is 255. Unicode code points 32 to 127 are the same in ASCII. I would rather not exclude ':' and '/', so that URLs are supported, and I would rather not exclude base64 encoding either.
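
A small sketch of what a client-side check for the wording proposed above might look like. The class name ProposedIdentityFormat is hypothetical, and the check encodes only the two hard limits from the quoted text (at most 255 characters, code points 32 to 127), not the "normally about 50 characters" expectation.

```java
final class ProposedIdentityFormat {
    static final int MAX_LENGTH = 255;

    /**
     * Checks the two hard limits of the proposed wording.
     * Iterating over chars is enough here: any char above 127,
     * including surrogates, is rejected, so every accepted char
     * is a complete code point in the range 32..127.
     */
    static boolean isValid(String id) {
        if (id == null || id.length() > MAX_LENGTH) {
            return false;
        }
        for (int i = 0; i < id.length(); i++) {
            char c = id.charAt(i);
            if (c < 32 || c > 127) {
                return false;
            }
        }
        return true;
    }
}
```

With these limits, URL-like strings containing ':' and '/' and base64 strings containing '+', '/' and '=' all pass, which is the stated intent.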

> If you already have a DataIdentifier for something, then it's highly likely that the corresponding record also exists in the DataStore

One use case is fast data migration (from one repository to another). In that case the record usually doesn't exist yet.

I don't see any harm in getRecordIfStored().



[jira] Commented: (JCR-1892) Unique ID for org.apache.jackrabbit.value.BinaryValue

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662388#action_12662388 ] 

Jukka Zitting commented on JCR-1892:
------------------------------------

> Sure, what about "The string is normally about 50 characters long."

Or: "The returned string contains only ASCII letters and digits and is at most 128 characters long." Both limitations are generic enough for most identifier formats (even a hex representation of a huge 512-bit hash), and they allow client applications to easily (no quoting) and efficiently (bounded size) use the returned strings.

> What about getRecordIfStored()?

I think the current getRecord() is fine. If you already have a DataIdentifier for something, then it's highly likely that the corresponding record also exists in the DataStore, and an exception would truly be exceptional.
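
To make the arithmetic behind the 128-character limit concrete: a 512-bit hash is 64 bytes, and each byte becomes two hex digits, so the hex form is exactly 128 characters of ASCII letters and digits. The sketch below uses SHA-512 purely as an example of a 512-bit hash; the thread does not name a specific algorithm, and HexIdentitySketch is a hypothetical class name.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class HexIdentitySketch {
    static final int MAX_LENGTH = 128;

    /** Lower-case hex of the SHA-512 digest (64 bytes, so 128 hex digits). */
    static String sha512Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-512").digest(data);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(Character.forDigit((b >> 4) & 0xF, 16));
                sb.append(Character.forDigit(b & 0xF, 16));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 not available", e);
        }
    }

    /** True if the string has only ASCII letters and digits and fits the limit. */
    static boolean isAlphanumericAscii(String id) {
        if (id == null || id.length() > MAX_LENGTH) {
            return false;
        }
        for (int i = 0; i < id.length(); i++) {
            char c = id.charAt(i);
            boolean letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            boolean digit = c >= '0' && c <= '9';
            if (!letter && !digit) {
                return false;
            }
        }
        return true;
    }
}
```

Under this stricter wording, identifiers containing ':' or '/' would be rejected, which is the trade-off against URL-style or base64 identifiers.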

