Posted to oak-issues@jackrabbit.apache.org by "Andrei Dulceanu (JIRA)" <ji...@apache.org> on 2017/08/08 15:34:00 UTC

[jira] [Commented] (OAK-5902) Cold standby should allow syncing of blobs bigger than 2.2 GB

    [ https://issues.apache.org/jira/browse/OAK-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16118474#comment-16118474 ] 

Andrei Dulceanu commented on OAK-5902:
--------------------------------------

In the end I chose to implement custom chunking logic on both client and server because:

* {{LengthFieldBasedFrameDecoder}} can still be used on the client, so there is no need to change the encoding/decoding of the rest of the packets sent between client and server (head state, segments, references, etc.)
* Using {{ChunkedStream}} and {{ChunkedWriteHandler}} on the server would have required altering the whole decoding on the client (removing {{LengthFieldBasedFrameDecoder}}, see above), implementing a state machine, etc. Basically, before this change every packet had a header and a payload. After the change, all packets except blobs stay as they are, while a blob has a single header for the whole blob, followed by {{NO_CHUNKS}} chunks of data without any per-chunk header (see the sketch after this list).
* Computing the hash of the {{InputStream}} on the fly with {{HashingInputStream}} (from Guava) means that, under the hood, this class buffers all the bytes in an internal stream, i.e. a lot of memory consumed on the server side, which goes against the whole chunking concept (on both client and server).
* The custom logic also avoids transferring chunks that are too small: the chunk size in {{ChunkedStream}} is computed as {{Math.min(this.chunkSize, in.available())}}, which in practice means at most ~30K per chunk.

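To make the framing change concrete, the following is a minimal sketch, not the code in [0] — the class, method, chunk size and hash algorithm are made up for illustration. It shows a server-side writer that emits a single header for the whole blob, then {{NO_CHUNKS}} headerless chunks, hashing the bytes incrementally instead of buffering them the way a {{HashingInputStream}}-based approach would:

{code:java}
// Sketch only, hypothetical names; not the implementation in [0].
import com.google.common.hash.Hasher;
import com.google.common.hash.Hashing;
import com.google.common.io.ByteStreams;

import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BlobChunkWriter {

    private static final int CHUNK_SIZE = 8 * 1024 * 1024; // assumed chunk size

    /** Streams one blob: a single header, then NO_CHUNKS chunks without per-chunk headers. */
    public static long sendBlobChunked(String blobId, long length, InputStream in, OutputStream out)
            throws IOException {
        long chunks = (length + CHUNK_SIZE - 1) / CHUNK_SIZE; // NO_CHUNKS
        Hasher hasher = Hashing.murmur3_128().newHasher();    // hash algorithm is an assumption

        DataOutputStream data = new DataOutputStream(out);
        // One header for the entire blob: id, total length, number of chunks.
        data.writeUTF(blobId);
        data.writeLong(length);
        data.writeLong(chunks);

        byte[] buffer = new byte[CHUNK_SIZE];
        long remaining = length;
        while (remaining > 0) {
            int toRead = (int) Math.min(buffer.length, remaining);
            int read = ByteStreams.read(in, buffer, 0, toRead); // fills the chunk unless EOF
            if (read == 0) {
                throw new IOException("Unexpected end of stream, " + remaining + " bytes missing");
            }
            hasher.putBytes(buffer, 0, read); // incremental hashing, no whole-blob buffering
            data.write(buffer, 0, read);
            remaining -= read;
        }
        data.flush();

        // Returned so the caller can send/verify it; how the hash travels is out of scope here.
        return hasher.hash().padToLong();
    }
}
{code}

A matching client would read the single header and then consume exactly {{length}} bytes spread over {{NO_CHUNKS}} reads, which is what allows the decoding of all other packet types to stay as it is.
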
All changes can be tracked in [0]. I'm a bit worried about the additional {{-Xmx3584m}} heap size parameter added in {{pom.xml}} for {{DataStoreTestBase.testSyncBigBlob}}. This test verifies the sync of a 2.5 GB blob, and without the increased heap size it fails. As a side note, having this additional test increases the duration of {{ExternalPrivateStoreIT}} from ~30 seconds to ~180 seconds on my machine, but I guess we can decide later whether we want to keep it.

[~frm], could you take a look at the proposed changes?

[0] https://github.com/dulceanu/jackrabbit-oak/tree/cold-standby/chunking

> Cold standby should allow syncing of blobs bigger than 2.2 GB
> -------------------------------------------------------------
>
>                 Key: OAK-5902
>                 URL: https://issues.apache.org/jira/browse/OAK-5902
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segment-tar
>    Affects Versions: 1.6.1
>            Reporter: Andrei Dulceanu
>            Assignee: Andrei Dulceanu
>            Priority: Minor
>             Fix For: 1.8, 1.7.6
>
>
> Currently there is a limitation on the maximum binary size (in bytes) that can be synced between primary and standby instances. The limit equals {{Integer.MAX_VALUE}} (2,147,483,647) bytes, and no binaries bigger than this can be synced between the instances.
> Per the comment at [1], the current protocol needs to be changed to allow sending binaries in chunks, in order to overcome this limitation.
> [1] https://github.com/apache/jackrabbit-oak/blob/1.6/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/standby/client/StandbyClient.java#L125



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)