Posted to commits@cassandra.apache.org by "Louay Kamel (JIRA)" <ji...@apache.org> on 2019/04/22 21:39:00 UTC
[jira] [Created] (CASSANDRA-15096) [RFC CQL v4+] cql_extension: wide range of unset_values.
Louay Kamel created CASSANDRA-15096:
---------------------------------------
Summary: [RFC CQL v4+] cql_extension: wide range of unset_values.
Key: CASSANDRA-15096
URL: https://issues.apache.org/jira/browse/CASSANDRA-15096
Project: Cassandra
Issue Type: Improvement
Components: CQL/Interpreter, CQL/Semantics
Reporter: Louay Kamel
*# Problem*
The current implementation of unset_value is inefficient and ambiguous (see Issues below).
We need a new unset_value(s) mechanism that is robust and works well for v4+ protocols.
*# Issues*
+1- A client has to encode unset_value for every column+
+in an INSERT prepared statement's bind values.+
example: INSERT INTO table(pkey,ckey,col1,col2,col3,col4) VALUES(?,?,?,?,?,?);
An EXECUTE request must unset the columns one by one, encoding each unset_value as int(-2):
a- bound values = (pkey_value, ckey_value, col1_value, unset_value, unset_value, unset_value) or
b- bound values = (pkey_value, ckey_value, unset_value, unset_value, unset_value, col4_value), etc.
This inflates the EXECUTE request's binary buffer, which in turn increases bandwidth and latency for both the request and the response.
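Under the current v4 wire format, each bound [value] is prefixed with a signed 32-bit length, where -1 means null and -2 means unset, so every unset column still costs 4 bytes on the wire. A minimal sketch of this framing (the helper and sentinel names are illustrative, not taken from any driver):

```python
import struct

UNSET = object()   # sentinel for "not set"
NULL_LEN = -1      # [value] length meaning null (v4 native protocol)
UNSET_LEN = -2     # [value] length meaning not set

def encode_values(values):
    """Encode bound values as v4 [value] items: a signed 32-bit
    big-endian length, followed by the payload bytes (if any)."""
    buf = bytearray()
    for v in values:
        if v is UNSET:
            buf += struct.pack(">i", UNSET_LEN)      # 4 bytes, no payload
        elif v is None:
            buf += struct.pack(">i", NULL_LEN)       # 4 bytes, no payload
        else:
            buf += struct.pack(">i", len(v)) + v     # length + payload
    return bytes(buf)

# Case (a) from the example: three trailing columns unset one by one,
# costing 3 * 4 = 12 bytes of pure overhead.
wire = encode_values([b"pk", b"ck", b"v1", UNSET, UNSET, UNSET])
```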
+2- The buffer returned by SELECT queries does not differentiate between null and unset_value for a subset of the returned rows.+
example:
Imagine a dataset where each row of the SELECT response has different
unset/null columns, and consider the following query:
SELECT * FROM table WHERE pkey = pkey_value;
With a page_size of 3 rows:
row1 -> pkey_value, ckey_value, col1_value, null/unset_value, null/unset_value, null/unset_value.
row2 -> pkey_value, ckey_value, null/unset_value, null/unset_value, null/unset_value, col4_value.
row3 -> pkey_value, ckey_value, null/unset_value, null/unset_value, col3_value, null/unset_value.
*# Proposed solution*
Instead of having only null(-1) and unset_value(-2), extend the unset_value(s)
to a range from unset_(-2) down to unset_(-2,147,483,648), where
unset_value = unset_(-2),
unset_rest = unset_(-2,147,483,648),
and anything in between is unset_(neg_integer).
+Solution for issue_1:+
a- bound values = (pkey_value, ckey_value, col1_value, unset_rest)
b- bound values = (pkey_value, ckey_value, unset_(-4), col4_value)
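The bound values above could be produced by a run-length style encoder that collapses a run of m consecutive unset columns into a single marker unset_(-(m+1)), and a run that reaches the last column into unset_rest. A sketch under those assumptions (the function and sentinel names are mine, not part of the proposal):

```python
import struct

UNSET = object()             # sentinel for "not set"
UNSET_REST = -2_147_483_648  # unset_rest: skip all remaining columns

def encode_values_rle(values):
    """Proposed encoding: a run of m consecutive unset columns becomes
    one 4-byte marker unset_(-(m+1)); a run that reaches the last
    column becomes unset_rest. Sketch only."""
    buf = bytearray()
    i = 0
    while i < len(values):
        v = values[i]
        if v is UNSET:
            run = 0
            while i < len(values) and values[i] is UNSET:
                run += 1
                i += 1
            if i == len(values):
                buf += struct.pack(">i", UNSET_REST)     # unset_rest
            else:
                buf += struct.pack(">i", -(run + 1))     # unset_(-(run+1))
        elif v is None:
            buf += struct.pack(">i", -1)                 # null
            i += 1
        else:
            buf += struct.pack(">i", len(v)) + v         # length + payload
            i += 1
    return bytes(buf)

# Case (b): three consecutive unset columns collapse into one unset_(-4) marker.
wire_b = encode_values_rle([b"pk", b"ck", UNSET, UNSET, UNSET, b"v4"])
```

With this sketch, case (a) shrinks from 30 bytes (one -2 marker per unset column) to 22 bytes (a single unset_rest marker).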
+Solution for issue_2:+
This works with all SELECT responses, prepared or unprepared.
row1 buffer -> pkey_value, ckey_value, col1_value, unset_rest.
This lets the decoder jump directly to the next row.
row2 buffer -> pkey_value, ckey_value, unset_(-4), col4_value.
This lets the decoder skip the metadata of |-4+1| = 3 columns and resume decoding at col4, the next cell_value in the row.
row3 buffer -> pkey_value, ckey_value, unset_(-3), col3_value, unset_rest.
This buffer is a mix of row1 and row2.
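On the decoding side, a driver could walk a row buffer and expand each marker back into the skipped columns. One possible sketch (the "unset" placeholder and the function name are assumptions for illustration):

```python
import struct

UNSET_REST = -2_147_483_648  # unset_rest marker

def decode_row(buf, num_columns):
    """Decode one row under the proposed scheme. Regular cells come
    back as bytes, null as None, and unset columns as "unset"."""
    row, off, col = [], 0, 0
    while col < num_columns:
        (length,) = struct.unpack_from(">i", buf, off)
        off += 4
        if length == UNSET_REST:          # unset_rest: rest of the row
            row += ["unset"] * (num_columns - col)
            col = num_columns
        elif length <= -2:                # unset_(length): skip |length + 1| columns
            skip = -(length + 1)
            row += ["unset"] * skip
            col += skip
        elif length == -1:                # null
            row.append(None)
            col += 1
        else:                             # regular cell: length + payload
            row.append(bytes(buf[off:off + length]))
            off += length
            col += 1
    return row

# row2 buffer from the example: pkey, ckey, unset_(-4), col4.
row2 = (struct.pack(">i", 2) + b"pk" + struct.pack(">i", 2) + b"ck"
        + struct.pack(">i", -4) + struct.pack(">i", 2) + b"v4")
```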
This solution is not limited to unset_(neg_int); it can also be applied to null cells in responses to reduce the bandwidth between Cassandra and the client.
To stay compatible with current v4+ CQL drivers, the client should be required to send a flag with the SELECT request (either in the frame header or somewhere in the CQL statement),
and the returned buffer could use the rows flags (e.g., has_unset_values?: boolean) to let the driver know whether any exist in the page.
*# Benefits*
-Enables apps to design complex data models with up to 2 billion columns without trading anything off.
-Greatly reduces the number of prepared write statements needed for data models with millions of columns.
-Significantly reduces bandwidth and CPU cycles.
-Easy to implement on the client side.
# Record of votes
+1 Louay Kamel
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)