Posted to dev@jackrabbit.apache.org by "Thomas Mueller (JIRA)" <ji...@apache.org> on 2008/04/30 09:25:55 UTC

[jira] Created: (JCR-1563) Data Store: UTFDataFormatException when using large minRecordLength

Data Store: UTFDataFormatException when using large minRecordLength
-------------------------------------------------------------------

                 Key: JCR-1563
                 URL: https://issues.apache.org/jira/browse/JCR-1563
             Project: Jackrabbit
          Issue Type: Bug
            Reporter: Thomas Mueller
            Priority: Minor


If minRecordLength is set to a value larger than 33000 and a value of 33000 bytes is then stored, a UTFDataFormatException is thrown. The reason is that values are serialized using DataOutputStream.writeUTF, which is limited to 65535 encoded bytes (about 65 K). Small entries are hex encoded (two characters per byte) and carry a prefix, so the limit for minRecordLength should be 32000.

This is a problem for both FileDataStore and DbDataStore.
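
The arithmetic behind the limit can be reproduced outside Jackrabbit. The following is a minimal sketch, not the data store code: it hex encodes a value of a given size, adds a "0x" prefix as described above, and passes the result to DataOutputStream.writeUTF. The class and helper names (WriteUtfLimitDemo, toPrefixedHex, fitsInWriteUtf) are made up for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;

public class WriteUtfLimitDemo {

    /** Hex encodes the value and adds a "0x" prefix (hypothetical helper). */
    static String toPrefixedHex(byte[] value) {
        StringBuilder sb = new StringBuilder(2 + value.length * 2);
        sb.append("0x");
        for (byte b : value) {
            sb.append(Character.forDigit((b >> 4) & 0xf, 16));
            sb.append(Character.forDigit(b & 0xf, 16));
        }
        return sb.toString();
    }

    /** Returns true if writeUTF accepts the hex form of a value of the given size. */
    static boolean fitsInWriteUtf(int sizeInBytes) {
        String encoded = toPrefixedHex(new byte[sizeInBytes]);
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        try {
            // writeUTF rejects strings whose encoded form exceeds 65535 bytes
            out.writeUTF(encoded);
            return true;
        } catch (UTFDataFormatException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("32000 bytes fit: " + fitsInWriteUtf(32000));
        System.out.println("33000 bytes fit: " + fitsInWriteUtf(33000));
    }
}
```

A 32000-byte value becomes 2 + 2 * 32000 = 64002 characters, which is below 65535. A 33000-byte value becomes 66002 characters, which is above it.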


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (JCR-1563) Data Store: UTFDataFormatException when using large minRecordLength

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593733#action_12593733 ] 

Jukka Zitting commented on JCR-1563:
------------------------------------

Since we already have a procedure for detecting the type of the data and parsing it accordingly, couldn't we push that down so that it would operate on byte streams instead of strings? This way we wouldn't need to waste twice the amount of bytes when persisting small binary values. Also, this issue with writeUTF would simply not exist.



[jira] Commented: (JCR-1563) Data Store: UTFDataFormatException when using large minRecordLength

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12594198#action_12594198 ] 

Jukka Zitting commented on JCR-1563:
------------------------------------

I'd rather not muddy the waters by having different fixes in different releases. The issue is documented here and I'm referencing this as a known issue in the 1.4.4 release notes. IMHO we don't need to hand-hold people who can just as well manually avoid configuring too large minRecordLength values until the proper fix gets released.



[jira] Commented: (JCR-1563) Data Store: UTFDataFormatException when using large minRecordLength

Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593271#action_12593271 ] 

Thomas Mueller commented on JCR-1563:
-------------------------------------

Because of the prefix, and because the id is string based (DataIdentifier, BLOB id).

Large values that are stored in the data store are persisted as: 
dataStore:<sha-1>

Small values (that are kept in memory) are persisted as: 
0x<hex encoded value>

Values that are stored in a FileSystemResource (resource-based BLOB store) are persisted as: 
fsResource:<blob id>

Values that are stored in a temporary file are persisted as: 
file:<file name> (actually those should never be written)
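
The prefixes listed above could be told apart with a simple check on the start of the persisted string. The following is only a sketch of that idea, not the Jackrabbit implementation; the class name, the Kind enum, and both helper methods are hypothetical.

```java
public class BinaryValuePrefixDemo {

    /** The storage kinds named in this comment; the enum itself is hypothetical. */
    enum Kind { DATA_STORE, IN_MEMORY, FS_RESOURCE, TEMP_FILE, UNKNOWN }

    /** Classifies a persisted string by its prefix. */
    static Kind classify(String persisted) {
        if (persisted.startsWith("dataStore:")) {
            return Kind.DATA_STORE;
        }
        if (persisted.startsWith("0x")) {
            return Kind.IN_MEMORY;
        }
        if (persisted.startsWith("fsResource:")) {
            return Kind.FS_RESOURCE;
        }
        if (persisted.startsWith("file:")) {
            return Kind.TEMP_FILE;
        }
        return Kind.UNKNOWN;
    }

    /** Decodes the payload of an in-memory value: "0x" followed by hex digits. */
    static byte[] decodeInMemory(String persisted) {
        String hex = persisted.substring(2);
        byte[] result = new byte[hex.length() / 2];
        for (int i = 0; i < result.length; i++) {
            result[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(classify("0x0aff"));
        System.out.println(decodeInMemory("0x0aff").length);
    }
}
```

Because every in-memory byte takes two hex characters, the string form is about twice the size of the value, which is what pushes it over the writeUTF limit.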





[jira] Commented: (JCR-1563) Data Store: UTFDataFormatException when using large minRecordLength

Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12594195#action_12594195 ] 

Thomas Mueller commented on JCR-1563:
-------------------------------------

Well, yes, the correct fix is non-trivial.

That's why I have proposed the simple solution maximumMinRecordLength.txt for 1.4.4. That one is trivial :-)

Do you agree now to commit maximumMinRecordLength.txt in 1.4.4?




[jira] Commented: (JCR-1563) Data Store: UTFDataFormatException when using large minRecordLength

Posted by "Stefan Guggisberg (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593257#action_12593257 ] 

Stefan Guggisberg commented on JCR-1563:
----------------------------------------

i've stumbled over the same issue a couple of years ago;
i've chosen to serialize the byte[] (as jukka suggests),
see o.a.j.c.persistence.util.Serializer, line 226:

    /**
     * because writeUTF(String) has a size limit of 65k,
     * Strings are serialized as <length><byte[]>
     */
    //out.writeUTF(val.toString());   // value
    byte[] bytes = val.toString().getBytes(ENCODING);
    out.writeInt(bytes.length); // length of byte[]
    out.write(bytes);   // byte[]
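
For reference, a self-contained round trip of the <length><byte[]> approach might look like the sketch below. It is not the Serializer code: the class and method names are hypothetical, and UTF-8 is assumed for ENCODING because the quoted snippet does not show the constant's value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class LengthPrefixedStringDemo {

    /** Serializes a string as a 4-byte length followed by its encoded bytes. */
    static byte[] write(String value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            // Assumption: UTF-8 stands in for the ENCODING constant
            byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
            out.writeInt(bytes.length); // length of byte[]
            out.write(bytes);           // byte[]
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads back a string written by write(String). */
    static String read(byte[] serialized) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(serialized))) {
            int length = in.readInt();
            byte[] bytes = new byte[length];
            in.readFully(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100000; i++) {
            sb.append('a');
        }
        String big = sb.toString();
        // 100000 characters is well past the writeUTF limit, but round-trips here
        System.out.println(read(write(big)).equals(big));
    }
}
```

The int length prefix allows values up to the maximum array size, so the 65535-byte ceiling of writeUTF does not apply.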





[jira] Updated: (JCR-1563) Data Store: UTFDataFormatException when using large minRecordLength

Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Mueller updated JCR-1563:
--------------------------------

          Component/s: jackrabbit-core
        Fix Version/s: core 1.4.4
             Assignee: Thomas Mueller
    Affects Version/s: 1.4



[jira] Updated: (JCR-1563) Data Store: UTFDataFormatException when using large minRecordLength

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting updated JCR-1563:
-------------------------------

    Fix Version/s:     (was: core 1.4.4)
                   1.5

I'd rather not have this in 1.4.4 as the fix is non-trivial and there's a reasonably simple workaround for 1.4.

If there's demand we could include this in a 1.4.5 once people have had a chance to try this out in trunk.



[jira] Commented: (JCR-1563) Data Store: UTFDataFormatException when using large minRecordLength

Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593260#action_12593260 ] 

Thomas Mueller commented on JCR-1563:
-------------------------------------

That's true, it was a mistake to use writeUTF. I can try to change it to write the raw bytes in BundleBinding. Data using the new format will have the prefix '-3' (so far we have '-1' and '-2'). I will introduce constants for those magic numbers. However this change is more complex than simply capping minRecordLength to 32000.
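
As a rough illustration of the idea (not the actual BundleBinding change), a format marker with a named constant followed by the raw bytes could look like the sketch below. The class name, the constant name, and the exact layout (marker, then length, then bytes) are assumptions; how the existing '-1' and '-2' formats are read is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class RawBinaryMarkerDemo {

    /** Hypothetical constant name for the new '-3' format marker. */
    static final int MARKER_RAW_BYTES = -3;

    /** Writes the marker, the length, and then the raw bytes (assumed layout). */
    static byte[] write(byte[] value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(MARKER_RAW_BYTES);
            out.writeInt(value.length);
            out.write(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads a value written by write(byte[]); other markers are out of scope here. */
    static byte[] read(byte[] serialized) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(serialized))) {
            int marker = in.readInt();
            if (marker != MARKER_RAW_BYTES) {
                throw new IllegalStateException("Marker not handled in this sketch: " + marker);
            }
            byte[] value = new byte[in.readInt()];
            in.readFully(value);
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] value = new byte[33000];
        // 33000 bytes need 8 + 33000 bytes here, with no hex doubling and no 65535 limit
        System.out.println(read(write(value)).length);
    }
}
```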



[jira] Commented: (JCR-1563) Data Store: UTFDataFormatException when using large minRecordLength

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12594155#action_12594155 ] 

Jukka Zitting commented on JCR-1563:
------------------------------------

> I'm not sure if you read my comment above [...]

I did, my last comment was targeted at your comment with the prefix stuff (by type detection I meant the parsing of that prefix).

I agree that the best solution would be to write the raw bytes directly. And I wouldn't worry about extra complexity, it's better to solve this right instead of adding layers of extra hacks like the arbitrary limit on minRecordLength.



[jira] Commented: (JCR-1563) Data Store: UTFDataFormatException when using large minRecordLength

Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12594185#action_12594185 ] 

Thomas Mueller commented on JCR-1563:
-------------------------------------

Fixed in revision 653367 (trunk)



[jira] Commented: (JCR-1563) Data Store: UTFDataFormatException when using large minRecordLength

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593269#action_12593269 ] 

Jukka Zitting commented on JCR-1563:
------------------------------------

I mean, why do we need to convert the value to a string for serialization?



[jira] Updated: (JCR-1563) Data Store: UTFDataFormatException when using large minRecordLength

Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Mueller updated JCR-1563:
--------------------------------

    Attachment: maximumMinRecordLength.txt

> the best solution would be to write the raw bytes directly.
You are right. I am working on that right now. I have implemented it and need to test it now. I hope that I can commit it later today.

Here is the patch for the workaround (limit the minRecordLength). Hopefully this patch will not be required :-)



[jira] Commented: (JCR-1563) Data Store: UTFDataFormatException when using large minRecordLength

Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12594152#action_12594152 ] 

Thomas Mueller commented on JCR-1563:
-------------------------------------

> we already have a procedure for detecting the type of the data 
Do you mean the property type?
The listed types are not property types; the property type is always PropertyType.BINARY.

I'm not sure if you read my comment above: "That's true, it was a mistake to use writeUTF. I can try to change it to write the raw bytes in BundleBinding. Data using the new format will have the prefix '-3' (so far we have '-1' and '-2'). I will introduce constants for those magic numbers. However this change is more complex than simply capping minRecordLength to 32000."




[jira] Resolved: (JCR-1563) Data Store: UTFDataFormatException when using large minRecordLength

Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Mueller resolved JCR-1563.
---------------------------------

    Resolution: Fixed

OK, let's not fix it in 1.4 then.



[jira] Commented: (JCR-1563) Data Store: UTFDataFormatException when using large minRecordLength

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593250#action_12593250 ] 

Jukka Zitting commented on JCR-1563:
------------------------------------

Why is writeUTF() used in the first place? Couldn't the data just be written as a byte array?
