Posted to dev@phoenix.apache.org by "James Taylor (JIRA)" <ji...@apache.org> on 2017/01/04 01:23:58 UTC

[jira] [Commented] (PHOENIX-2565) Store data for immutable tables in single KeyValue

    [ https://issues.apache.org/jira/browse/PHOENIX-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796808#comment-15796808 ] 

James Taylor commented on PHOENIX-2565:
---------------------------------------

Also, this format is optimized for dense data (see PHOENIX-3559). I'm not sure we'll find a serialization format that's good for both dense and sparse storage; IMHO it's OK to optimize for dense storage, provided we support plugging in other storage formats optimized for other cases.
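A rough way to see the dense/sparse tradeoff, assuming the offset-based format keeps one fixed-width offset per column position with null columns included. The 4-byte widths and the alternative "sparse" layout (a column index plus offset for populated columns only, found by binary search) are hypothetical and only illustrate the tradeoff; they are not Phoenix's actual encoding.

```python
import bisect
from typing import List, Optional

# Hypothetical widths for illustration only; not Phoenix's actual encoding.
OFFSET_WIDTH = 4
INDEX_WIDTH = 4


def dense_offset_bytes(num_columns: int) -> int:
    """Dense layout: one offset per column position, null columns included."""
    return num_columns * OFFSET_WIDTH


def sparse_offset_bytes(num_non_null: int) -> int:
    """Sparse layout: a (column index, offset) pair per populated column only."""
    return num_non_null * (INDEX_WIDTH + OFFSET_WIDTH)


def sparse_slot(populated: List[int], column: int) -> Optional[int]:
    """Find a column's slot in the sorted list of populated column indexes.

    Costs O(log k) via binary search instead of the dense layout's O(1)
    array index; returns None if the column is null (not present).
    """
    pos = bisect.bisect_left(populated, column)
    if pos < len(populated) and populated[pos] == column:
        return pos
    return None


if __name__ == "__main__":
    # A wide table with 1000 declared columns but only 2 populated in this row.
    print(dense_offset_bytes(1000))    # 4000
    print(sparse_offset_bytes(2))      # 16
    print(sparse_slot([3, 700], 700))  # 1
    print(sparse_slot([3, 700], 5))    # None
```

Under these assumptions the dense layout pays for every declared column even when most are null, while the sparse one trades that space for a slower lookup, which is why a pluggable storage format makes sense.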

bq.  I'm not sure why we can't just concatenate the bytes with a delimiter (including special encoding for null and tracking of a length of fixed width datatype by schema).
This is more or less the format we use for the bytes that make up the row key. It has limitations: a VARBINARY or an ARRAY may only appear at the end of the row key, since there's no delimiter byte that we can count on not appearing in the data. You'd also need to walk through the bytes to find the start of a given column's data, which gets slower as the number of columns increases. The new format lets you find a column's byte offset with a single array lookup, so access is fast regardless of the column's position. We also don't need to store any separator bytes.
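A minimal sketch of the difference, assuming a simplified layout of concatenated column bytes, followed by a table of 4-byte big-endian offsets (one per column) and an 8-byte trailer holding the table's start and the column count. This layout and the function names are hypothetical, not Phoenix's real serialization. It contrasts a constant-time offset lookup with a delimiter-based encoding that must scan and that breaks when a VARBINARY-like value contains the delimiter byte.

```python
import struct
from typing import List, Optional

# Hypothetical layout for illustration only (not Phoenix's real format):
#   [column bytes...][offset table: 4-byte big-endian int per column]
#   [4-byte start of offset table][4-byte column count]
# A zero-length column is read back as None (null).


def encode_row(values: List[Optional[bytes]]) -> bytes:
    data = bytearray()
    offsets = []
    for value in values:
        offsets.append(len(data))  # where this column's bytes begin
        if value is not None:
            data += value
    table_start = len(data)
    for offset in offsets:
        data += struct.pack(">i", offset)
    data += struct.pack(">ii", table_start, len(values))
    return bytes(data)


def get_column(cell: bytes, i: int) -> Optional[bytes]:
    """O(1): read two entries of the offset table, no scan over column data."""
    table_start, count = struct.unpack_from(">ii", cell, len(cell) - 8)
    if not 0 <= i < count:
        raise IndexError(i)
    start = struct.unpack_from(">i", cell, table_start + 4 * i)[0]
    if i + 1 < count:
        end = struct.unpack_from(">i", cell, table_start + 4 * (i + 1))[0]
    else:
        end = table_start  # last column ends where the offset table begins
    return cell[start:end] or None


# Delimiter-based alternative for comparison.
SEP = b"\x00"


def encode_delimited(values: List[Optional[bytes]]) -> bytes:
    return SEP.join(v or b"" for v in values)


def get_delimited(row: bytes, i: int) -> Optional[bytes]:
    """O(n): must scan for separators, and breaks if a value contains SEP."""
    return row.split(SEP)[i] or None


if __name__ == "__main__":
    # The second value is VARBINARY-like data that happens to contain SEP.
    row = [b"abc", b"\x00\x01", None, b"xyz"]
    print(get_column(encode_row(row), 3))           # b'xyz'
    print(get_delimited(encode_delimited(row), 3))  # None, not b'xyz'
```

In this sketch the price of the fast lookup is four bytes of offset table per column, but no separator bytes and no restriction on which bytes a value may contain.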


> Store data for immutable tables in single KeyValue
> --------------------------------------------------
>
>                 Key: PHOENIX-2565
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2565
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>            Assignee: Thomas D'Silva
>         Attachments: PHOENIX-2565-v2.patch, PHOENIX-2565-wip.patch, PHOENIX-2565.patch
>
>
> Since an immutable table (i.e. declared with IMMUTABLE_ROWS=true) will never update a column value, it'd be more efficient to store all column values for a row in a single KeyValue. We could use the existing format we have for variable length arrays.
> For backward compatibility, we'd need to support the current mechanism. Also, you'd no longer be allowed to transition an existing table to/from being immutable. I think the best approach would be to introduce a new IMMUTABLE keyword and use it like this:
> {code}
> CREATE IMMUTABLE TABLE ...
> {code}
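To put rough numbers on the single-KeyValue idea in the description above: every HBase KeyValue repeats the row key, column family, qualifier, timestamp and type alongside its value. The sketch below uses HBase's KeyValue serialization layout (ignoring tags) to compare one KeyValue per column with one packed KeyValue per row. The packed payload layout (values, a 4-byte offset per column, an 8-byte trailer) and all sizes in the example are assumptions chosen for illustration.

```python
from typing import List, Tuple


# Approximate serialized size of one HBase KeyValue (no tags):
#   <key length:4><value length:4>
#   <row length:2><row><family length:1><family><qualifier>
#   <timestamp:8><key type:1><value>
def keyvalue_size(row_len: int, family_len: int, qualifier_len: int, value_len: int) -> int:
    return 4 + 4 + 2 + row_len + 1 + family_len + qualifier_len + 8 + 1 + value_len


def per_column_size(row_len: int, family_len: int, columns: List[Tuple[int, int]]) -> int:
    """Current scheme: one KeyValue per (qualifier_len, value_len) column."""
    return sum(keyvalue_size(row_len, family_len, q, v) for q, v in columns)


def single_cell_size(row_len: int, family_len: int, qualifier_len: int,
                     columns: List[Tuple[int, int]]) -> int:
    """Proposed scheme: all values packed into one KeyValue.

    Hypothetical payload: the values, a 4-byte offset per column,
    and an 8-byte trailer (offset-table start + column count).
    """
    payload = sum(v for _, v in columns) + 4 * len(columns) + 8
    return keyvalue_size(row_len, family_len, qualifier_len, payload)


if __name__ == "__main__":
    # 20-byte row key, 1-byte family, 10 columns with 8-byte qualifiers and values.
    cols = [(8, 8)] * 10
    print(per_column_size(20, 1, cols))      # 570
    print(single_cell_size(20, 1, 1, cols))  # 170
```

With these illustrative sizes most of the per-column cost is the repeated key, which a single packed KeyValue pays only once.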



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)