Posted to dev@gora.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/24 01:47:37 UTC
[jira] [Updated] (GORA-267) Cassandra composite primary key support

     [ https://issues.apache.org/jira/browse/GORA-267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated GORA-267:
--------------------------------------

    Description: 
The extension makes it possible to define primary keys that are represented by Avro classes. A mapping specifies how the fields of the key class map to the components of composite partition keys and composite column names. This gives users more control over how data is distributed across Cassandra database structures. It is now possible to store data in wide rows with custom indexes that allow fast range scans on a single node. There is also no longer any need for an order-preserving partitioner, which is likely to compromise data distribution in the Cassandra cluster.

In essence, composite primary keys with identical partition parts are written to the same Cassandra row (which is essentially a partition). Within a row, entities are stored in lexical order by their cluster key components. Avro field names are appended as the last component of the composite column name. The current implementation does not replace super columns, so complex Avro fields are still mapped to super columns. Super column families use the same composite primary keys as simple column families. As Gora always fully loads nested complex types, the use of super column families is not really a problem. Still, future work could replace super columns with another level of column name components below the field qualifiers. It would also be possible to rethink the decomposition of complex nested types beyond the first level.
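
To make the layout described above more concrete, here is a minimal in-memory sketch. It is not taken from the patch and does not use any Gora or Cassandra API. The class WideRowLayoutSketch, its methods, and the sensor/time-series sample data are all hypothetical, and a sorted map merely stands in for a Cassandra wide row. It only illustrates that keys with the same partition part share one row, that columns within a row are ordered by their cluster key components, and that the Avro field name is the last component of the column name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy in-memory model of the wide-row layout. Hypothetical; not Gora or Cassandra code. */
final class WideRowLayoutSketch {

    /** Orders composite column names component by component, lexically. */
    static final Comparator<List<String>> BY_COMPONENT = (a, b) -> {
        int shared = Math.min(a.size(), b.size());
        for (int i = 0; i < shared; i++) {
            int cmp = a.get(i).compareTo(b.get(i));
            if (cmp != 0) {
                return cmp;
            }
        }
        // A name that is a strict prefix of another sorts first.
        return Integer.compare(a.size(), b.size());
    };

    // partition key -> (composite column name -> value); assumed stand-in for wide rows
    private final Map<String, TreeMap<List<String>, String>> rows = new TreeMap<>();

    /** Builds a column name: cluster key components first, Avro field name last. */
    static List<String> columnName(List<String> clusterParts, String fieldName) {
        List<String> name = new ArrayList<>(clusterParts);
        name.add(fieldName);
        return name;
    }

    /** Stores one field value; identical partition keys end up in the same row. */
    void put(String partitionKey, List<String> clusterParts, String fieldName, String value) {
        rows.computeIfAbsent(partitionKey, k -> new TreeMap<List<String>, String>(BY_COMPONENT))
            .put(columnName(clusterParts, fieldName), value);
    }

    /** Returns the column names of one row in their stored order. */
    List<List<String>> columnsOf(String partitionKey) {
        TreeMap<List<String>, String> row = rows.get(partitionKey);
        if (row == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(row.keySet());
    }

    int rowCount() {
        return rows.size();
    }

    public static void main(String[] args) {
        WideRowLayoutSketch store = new WideRowLayoutSketch();
        store.put("sensor-1", Arrays.asList("2014-01-24", "01:47"), "value", "21.5");
        store.put("sensor-1", Arrays.asList("2014-01-23", "23:10"), "value", "20.9");
        store.put("sensor-2", Arrays.asList("2014-01-24", "01:47"), "value", "18.2");
        System.out.println(store.rowCount() + " rows; sensor-1 columns: "
            + store.columnsOf("sensor-1"));
    }
}
```

In this toy model a range scan over one sensor is just an ordered walk over a single map, which is the property the wide-row layout is meant to provide on a single node.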

The implementation uses the concept of Gora partitionQueries to decompose row-scanning queries into sets of queries that each operate on a single row. However, such a decomposition is not always possible, and real range scans are limited to wide rows (partitions).
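
The decomposition idea can be sketched as follows. This is an illustration only and not the Gora PartitionQuery API or the code in the patch. It assumes a hypothetical time series key whose partition part is (sensor, day number) and whose cluster key component is the second of the day. All names (PartitionQuerySketch, SingleRowQuery, decompose) are made up for this example. The decomposition works here because the day buckets between the start and end key can be enumerated. With a partition component that cannot be enumerated, no such split would be possible.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: splitting a time range scan into single-row queries. */
final class PartitionQuerySketch {

    static final int LAST_SECOND_OF_DAY = 86_399;

    /** One sub-query: a single partition plus an inclusive cluster key range inside it. */
    static final class SingleRowQuery {
        final String sensor;
        final long day;
        final int fromSecond;
        final int toSecond;

        SingleRowQuery(String sensor, long day, int fromSecond, int toSecond) {
            this.sensor = sensor;
            this.day = day;
            this.fromSecond = fromSecond;
            this.toSecond = toSecond;
        }

        @Override
        public String toString() {
            return sensor + "/" + day + "[" + fromSecond + ".." + toSecond + "]";
        }
    }

    /**
     * Splits the inclusive range (startDay, startSecond)..(endDay, endSecond)
     * for one sensor into one query per day bucket, i.e. per partition.
     */
    static List<SingleRowQuery> decompose(String sensor,
                                          long startDay, int startSecond,
                                          long endDay, int endSecond) {
        List<SingleRowQuery> queries = new ArrayList<>();
        if (startDay == endDay && startSecond > endSecond) {
            return queries; // empty range within a single day
        }
        // If startDay > endDay the loop body never runs and the result stays empty.
        for (long day = startDay; day <= endDay; day++) {
            // Only the first and last partition are cut; those in between are read fully.
            int from = (day == startDay) ? startSecond : 0;
            int to = (day == endDay) ? endSecond : LAST_SECOND_OF_DAY;
            queries.add(new SingleRowQuery(sensor, day, from, to));
        }
        return queries;
    }

    public static void main(String[] args) {
        for (SingleRowQuery q : decompose("sensor-1", 10, 3600, 12, 7200)) {
            System.out.println(q);
        }
    }
}
```

Each resulting sub-query touches exactly one partition, so it can be served as a real range scan on the node that owns that wide row.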

The implementation is fully backward compatible. Simple key classes can still be used, and row scans are still possible with an order-preserving partitioner. All current JUnit tests pass. Furthermore, I have added an example and some unit tests to demonstrate the use of composite primary keys for time series data.

As mentioned earlier, we are happy to share this extension. I've created a JIRA issue for it (GORA-267) and will provide the implementation on GitHub (https://github.com/zirpins/gora/tree/GORA-267).

Regards,
Christian


  was:
The extension allows to define primary keys that are represented by avro classes. A mapping specifies how fields of the key class are mapped to the components of composite partition keys and composite column names. This gives users more control with respect to the distribution of data into Cassandra database structures. It is now possible to store data in wide rows with custom indexes that allow for fast range scans on a single node. Also there is no more need for an order-preserving partitioner that is likely to compromise data distribution in the Cassandra cluster.



> Cassandra composite primary key support
> ---------------------------------------
>
>                 Key: GORA-267
>                 URL: https://issues.apache.org/jira/browse/GORA-267
>             Project: Apache Gora
>          Issue Type: Improvement
>          Components: gora-cassandra
>            Reporter: c.zirpins@seeburger.de
>              Labels: features
>             Fix For: 0.4
>
>         Attachments: gora-267.diff
>
>



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)