Posted to oak-issues@jackrabbit.apache.org by "Michael Dürig (JIRA)" <ji...@apache.org> on 2017/01/17 09:08:26 UTC

[jira] [Updated] (OAK-4104) Refactor reading records from segments

     [ https://issues.apache.org/jira/browse/OAK-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Dürig updated OAK-4104:
-------------------------------
    Description: 
We should refactor how records (e.g. node states) are read from segments. Currently this logic is scattered and duplicated across various places, all of which hard-code certain indexes into a byte buffer (see the calls to {{Record.getOffset}} for how bad this is). 
The current implementation makes it very hard to maintain the code and evolve the segment format. Ideally we should have one place per segment version that defines the format as a single source of truth, which is then reused by the various parts of the SegmentMK, tooling and tests. 

We should also evaluate 3rd party data serialisation libraries, which could make our lives easier. Focus should be on ease of use, separation of concerns (schema vs. implementation), compactness of format, efficient en/decoding, support for schema evolution. Possible candidates include [protocol buffers|https://developers.google.com/protocol-buffers/] and [Apache Avro|http://avro.apache.org/]. 

  was:
We should refactor how records (e.g. node states) are read from segments. Currently this logic is scattered and duplicated across various places, all of which hard-code certain indexes into a byte buffer (see the calls to {{Record.getOffset}} for how bad this is). 
The current implementation makes it very hard to maintain the code and evolve the segment format. Ideally we should have one place per segment version that defines the format as a single source of truth, which is then reused by the various parts of the SegmentMK, tooling and tests. 


> Refactor reading records from segments
> --------------------------------------
>
>                 Key: OAK-4104
>                 URL: https://issues.apache.org/jira/browse/OAK-4104
>             Project: Jackrabbit Oak
>          Issue Type: Technical task
>          Components: segment-tar
>            Reporter: Michael Dürig
>              Labels: technical_debt
>             Fix For: 1.8
>
>
> We should refactor how records (e.g. node states) are read from segments. Currently this logic is scattered and duplicated across various places, all of which hard-code certain indexes into a byte buffer (see the calls to {{Record.getOffset}} for how bad this is). 
> The current implementation makes it very hard to maintain the code and evolve the segment format. Ideally we should have one place per segment version that defines the format as a single source of truth, which is then reused by the various parts of the SegmentMK, tooling and tests. 
> We should also evaluate 3rd party data serialisation libraries, which could make our lives easier. Focus should be on ease of use, separation of concerns (schema vs. implementation), compactness of format, efficient en/decoding, support for schema evolution. Possible candidates include [protocol buffers|https://developers.google.com/protocol-buffers/] and [Apache Avro|http://avro.apache.org/]. 
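
To illustrate the "one place per segment version" idea from the description, here is a minimal Java sketch. The class ExampleRecordLayoutV1, its three fields and their sizes are hypothetical and chosen only for illustration; they do not reflect the actual Oak segment format or the existing Record API. The point is only that every offset is derived in a single class, so callers never hard-code indexes into the byte buffer.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical layout of a fixed-size record for one format version.
 * NOT the real Oak segment format; field names and sizes are assumptions.
 */
final class ExampleRecordLayoutV1 {

    // Offsets are relative to the start of the record and are derived
    // from each other, so adding a field only requires a change here.
    static final int TYPE_OFFSET = 0;                                   // 1 byte
    static final int LENGTH_OFFSET = TYPE_OFFSET + Byte.BYTES;          // 4 bytes
    static final int CHILD_ID_OFFSET = LENGTH_OFFSET + Integer.BYTES;   // 8 bytes
    static final int SIZE = CHILD_ID_OFFSET + Long.BYTES;

    private ExampleRecordLayoutV1() {
        // static helpers only
    }

    /** Writes one record starting at the absolute position recordStart. */
    static void write(ByteBuffer buffer, int recordStart,
                      byte type, int length, long childId) {
        buffer.put(recordStart + TYPE_OFFSET, type);
        buffer.putInt(recordStart + LENGTH_OFFSET, length);
        buffer.putLong(recordStart + CHILD_ID_OFFSET, childId);
    }

    /** Reads the type field of the record starting at recordStart. */
    static byte readType(ByteBuffer buffer, int recordStart) {
        return buffer.get(recordStart + TYPE_OFFSET);
    }

    /** Reads the length field of the record starting at recordStart. */
    static int readLength(ByteBuffer buffer, int recordStart) {
        return buffer.getInt(recordStart + LENGTH_OFFSET);
    }

    /** Reads the child id field of the record starting at recordStart. */
    static long readChildId(ByteBuffer buffer, int recordStart) {
        return buffer.getLong(recordStart + CHILD_ID_OFFSET);
    }
}
```

Production code, tooling and tests would all go through such a class (for example ExampleRecordLayoutV1.readLength(buffer, recordStart)) instead of computing positions themselves. A later format version could then be introduced as a sibling class without touching the callers of the old one.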



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)