Posted to user@cassandra.apache.org by Mehdi Bada <me...@dbi-services.com> on 2016/09/30 16:24:20 UTC

Cassandra data model right definition

Hi all, 

I have a theoretical question:
- Is Apache Cassandra really a column store?
A column store means storing the data by column rather than by row.

In fact, C* stores the data as rows, and the data is partitioned by row key.

So, for me, Cassandra is a row-oriented, schema-less DBMS. Is that true for you also?

Many thanks in advance for your reply 

Best Regards 
Mehdi Bada 
---- 

Mehdi Bada | Consultant 
Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15 
dbi services, Rue de la Jeunesse 2, CH-2800 Delémont 
mehdi.bada@dbi-services.com 
www.dbi-services.com 




Re: Cassandra data model right definition

Posted by Edward Capriolo <ed...@gmail.com>.
Then:
Physically: a data store structured as a log-structured merge of SSTables
(see https://cloud.google.com/bigtable/).
Now:
One of the changes made in Apache Cassandra 3.0 is a relatively important
refactor of the storage engine
<https://issues.apache.org/jira/browse/CASSANDRA-8099>.
I say refactor because the basics have not changed: data is still inserted
in a memtable which gets flushed over time to an sstable, with compaction
baby-sitting the set of sstables on disk, and reads use both memtable and
sstables to retrieve results. But the internal structure of the objects
manipulated in those phases has changed, and that entails a significant
amount of refactoring in the code. The principal motivation is that the new
storage engine more directly manipulates the structure that is exposed
through CQL, and knowing that structure at the storage engine level has
many advantages: some features are easier to add and the engine has more
information to optimize.

http://www.datastax.com/2015/12/storage-engine-30
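
To make the write/read path in that quote concrete, here is a minimal toy sketch in Python. It is not Cassandra's code: the class name ToyLSMStore, the memtable_limit parameter and the dict-based "sstables" are all made up for illustration. It only shows the shape of the flow: writes land in a memtable, a flush freezes it into an immutable sorted table, reads check the memtable and then the tables from newest to oldest, and compaction merges the tables.

```python
# Toy log-structured merge store. Hypothetical names throughout; this only
# mirrors the memtable -> sstable -> compaction flow, not any real engine.

class ToyLSMStore:
    def __init__(self, memtable_limit=2):
        self.memtable = {}      # in-memory writes (newest data)
        self.sstables = []      # immutable flushed tables, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the memtable into a sorted, immutable "sstable".
        if self.memtable:
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        # Reads use the memtable first, then sstables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            if key in table:
                return table[key]
        return None

    def compact(self):
        # Merge all sstables into one, keeping the newest value per key.
        merged = {}
        for table in self.sstables:   # oldest first, so newer tables overwrite
            merged.update(table)
        self.sstables = [dict(sorted(merged.items()))] if merged else []


store = ToyLSMStore(memtable_limit=2)
store.put("a", 1)
store.put("b", 2)   # second write fills the memtable and triggers a flush
store.put("a", 3)   # the newer value for "a" now lives in the memtable
```

After these three writes, a read of "a" is answered from the memtable and a read of "b" from the flushed table, which is the "reads use both" part of the quote.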

Then:
An RPC abstraction over the data, with methods like get_slice which selected
columns from a single 'row key'.
Now:
A query-based abstraction over the data, with queries like SELECT * FROM
table WHERE x=y, in which most language features work over single
'partitions'.

And 3(?) implementations of secondary-index-like things:
Secondary Indexes
Materialized Views
SASI indexes

These add to query functionality, typically by storing an index (or
secondary form) in a way optimized for a given kind of query.
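
As a rough illustration of "storing a secondary form optimized for a query", here is a small Python sketch. The names ToyIndexedTable, upsert and find_by_indexed are hypothetical and this is not how any of the three Cassandra features are implemented. It only shows the general idea: alongside the rows keyed by primary key, keep a second map from a column value back to the primary keys, and maintain it on every write.

```python
from collections import defaultdict

# Toy table with one secondary index. Hypothetical names; a sketch of the
# general idea only, not of Cassandra's index implementations.

class ToyIndexedTable:
    def __init__(self, indexed_column):
        self.rows = {}                    # primary key -> row dict
        self.indexed_column = indexed_column
        self.index = defaultdict(set)     # column value -> set of primary keys

    def upsert(self, key, row):
        old = self.rows.get(key)
        if old is not None and self.indexed_column in old:
            # Drop the stale index entry before overwriting the row.
            self.index[old[self.indexed_column]].discard(key)
        self.rows[key] = row
        if self.indexed_column in row:
            self.index[row[self.indexed_column]].add(key)

    def find_by_indexed(self, value):
        # Answered from the index, without scanning every row.
        return sorted(self.index.get(value, ()))


table = ToyIndexedTable("city")
table.upsert(1, {"city": "Austin"})
table.upsert(2, {"city": "Paris"})
table.upsert(1, {"city": "Paris"})   # the overwrite moves key 1 in the index
```

The cost is visible in upsert: every write has to keep the second structure in step with the first, which is the trade each of the listed features makes in its own way.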






On Fri, Sep 30, 2016 at 1:52 PM, DuyHai Doan <do...@gmail.com> wrote:

> I used to present Cassandra as a NoSQL datastore with "distributed" table.
> This definition is closer to CQL and has some academic background
> (distributed hash table).
>
>
> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
> benedict@apache.org> wrote:
>
>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>> thrift users no longer think they have a schema (though they do), and
>> thrift is being deprecated.
>>
>> I really wish everyone would kill the term "wide column store" with
>> fire.  It seems to have never meant anything beyond "schema-less,
>> row-oriented", and a "column store" means literally the opposite of this.
>>
>> Not only that, but people don't even seem to realise the term "column
>> store" existed long before "wide column store" and the latter is often
>> abbreviated to the former, as here:
>> http://www.planetcassandra.org/what-is-nosql/
>>
>> Since it no longer applies, let's all agree as a community to forget this
>> awful nomenclature ever existed.
>>
>>
>>
>> On 30 September 2016 at 18:09, Joaquin Casares
>> <joaquin@thelastpickle.com> wrote:
>>
>>> Hi Mehdi,
>>>
>>> I can help clarify a few things.
>>>
>>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row
>>> can have 2 billion columns, but in practice it shouldn't have more than 100
>>> million columns.
>>>
>>> Cassandra partitions data to certain nodes based on the partition
>>> key(s), but does provide the option of setting zero or more clustering
>>> keys. Together, the partition key(s) and clustering key(s) form the primary
>>> key.
>>>
>>> When writing to Cassandra, you will need to provide the full primary
>>> key, however, when reading from Cassandra, you only need to provide the
>>> full partition key.
>>>
>>> When you only provide the partition key for a read operation, you're
>>> able to return all columns that exist on that partition with low latency.
>>> These columns are displayed as "CQL rows" to make it easier to reason about.
>>>
>>> Consider the schema:
>>>
>>> CREATE TABLE foo (
>>>   bar uuid,
>>>   boz uuid,
>>>   baz timeuuid,
>>>   data1 text,
>>>   data2 text,
>>>   PRIMARY KEY ((bar, boz), baz)
>>> );
>>>
>>>
>>> When you write to Cassandra you will need to send bar, boz, and baz and
>>> optionally data*, if it's relevant for that CQL row. If you choose not to
>>> set a data* field for a particular CQL row, then nothing is stored or
>>> allocated on disk. But I wouldn't consider that caveat to be "schema-less".
>>>
>>> However, all writes to the same bar/boz will end up on the same
>>> Cassandra replica set (a configurable number of nodes) and be stored on the
>>> same place(s) on disk within the SSTable(s). And on disk, each field that's
>>> not a partition key is stored as a column, including clustering keys (this
>>> is optimized in Cassandra 3+, but now we're getting deep into internals).
>>>
>>> In this way you can get fast responses for all activity for bar/boz
>>> either over time, or for a specific time, with roughly the same number of
>>> disk seeks, with varying lengths on the disk scans.
>>>
>>> Hope that helps!
>>>
>>> Joaquin Casares
>>> Consultant
>>> Austin, TX
>>>
>>> Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <in...@mrcalonso.com>
>>> wrote:
>>>
>>>> Cassandra is a Wide Column Store
>>>> http://db-engines.com/en/system/Cassandra
>>>>
>>>> Carlos Alonso | Software Engineer | @calonso
>>>> <https://twitter.com/calonso>
>>>>
>>>>
>>>>
>>>
>>
>
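
For anyone skimming the quoted chain above, Joaquin's description of the foo table (partition key (bar, boz), clustering key baz) can be sketched as a toy Python model. This is an illustration only: ToyPartitionedTable and its methods are made-up names, the MD5-based node_for is a stand-in for a real partitioner, and plain integers stand in for timeuuid values.

```python
import hashlib

# Toy model of PRIMARY KEY ((bar, boz), baz). Hypothetical names; the hash
# used for node placement is a stand-in, not Cassandra's partitioner.

class ToyPartitionedTable:
    def __init__(self, node_count=3):
        self.node_count = node_count
        self.partitions = {}   # (bar, boz) -> {baz: {column: value}}

    def node_for(self, bar, boz):
        # Placement depends only on the partition key, never on baz.
        digest = hashlib.md5(f"{bar}:{boz}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.node_count

    def write(self, bar, boz, baz, **data):
        # A write needs the full primary key; data columns are optional,
        # and columns that are not supplied are simply not stored.
        rows = self.partitions.setdefault((bar, boz), {})
        rows.setdefault(baz, {}).update(data)

    def read_partition(self, bar, boz):
        # A read needs only the partition key; rows come back ordered
        # by the clustering key.
        rows = self.partitions.get((bar, boz), {})
        return [(baz, rows[baz]) for baz in sorted(rows)]


table = ToyPartitionedTable(node_count=3)
table.write("b1", "z1", 2, data1="x")
table.write("b1", "z1", 1, data2="y")
table.write("b2", "z1", 1)
```

Reading partition ("b1", "z1") returns both of its rows in baz order, each holding only the data columns that were actually written, which is the behaviour Joaquin describes for "CQL rows".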

Re: Cassandra data model right definition

Posted by Benedict Elliott Smith <be...@apache.org>.
The equivalent statement would be:  "Like a bike, a scooter has wheels."

This is a really important linguistic distinction you seem to be glossing
over.  It is not saying "A is like X," it is saying "A has specific traits
in common with X."

For example: "Like cancer, heart disease is a leading cause of mortality."
Cancer is really very unlike heart disease, but the two are similar in that
both cause death.  This kind of phraseology is tremendously common, and I see
nothing wrong with it.

This conversation suggests people like yourself are indeed confused by
these constructs, so we should perhaps avoid them where that confusion can
confound further understanding.  So, as already suggested, file a
ticket/pull request to update the phrasing.

----

To respond to your second email:  That is simply not how C* stores its data
on disk.  It does, as of 3.x, store it almost exactly (in general terms;
the minutiae obviously differ from system to system) like an RDBMS.  But
even before this, the inefficiency of the storage format doesn't change the
fact it is a "row store" - the literature makes no prescriptions on the
data format besides the spatial locality of rows vs columns.
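
To illustrate the spatial-locality point with something concrete, here is a small Python sketch. The table contents and the function names row_layout and column_layout are made up for illustration. It takes the same logical table and flattens it two ways: a row store keeps the values of one row adjacent, a column store keeps the values of one column adjacent.

```python
# The same logical table flattened two ways. Sample data and function names
# are hypothetical; only the ordering of values matters here.

rows = [
    {"id": 1, "name": "a", "age": 30},
    {"id": 2, "name": "b", "age": 40},
]
columns = ["id", "name", "age"]

def row_layout(rows, columns):
    # Row store: all values of one row sit next to each other.
    return [row[c] for row in rows for c in columns]

def column_layout(rows, columns):
    # Column store: all values of one column sit next to each other.
    return [row[c] for c in columns for row in rows]

row_major = row_layout(rows, columns)
column_major = column_layout(rows, columns)
```

Here row_major comes out as [1, "a", 30, 2, "b", 40] and column_major as [1, 2, "a", "b", 30, 40]. Nothing else about the encoding enters into the definition.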

This all ignores the LSMT confounder, which I am unsure is what you were
referring to.  That is largely orthogonal to this discussion AFAICT, though
if you wanted to call C* a "partitioned LSMT row store" I certainly
wouldn't object.  Of course the more qualifiers you add, the more it starts
to become a description rather than a named category / shorthand.  It's
also not clear C* will remain exclusively LSMT based indefinitely.

It seems like this conversation is a bit of a dead end to me, so I will try
really hard not to respond to further follow ups.  Regrettably,
https://xkcd.com/386/




On 3 October 2016 at 14:25, Edward Capriolo <ed...@gmail.com> wrote:

> The phrase is defensible, but that is the root of the problem. Take for
> example a skateboard.
>
> "A skateboard is like a bike because it has wheels and you ride on it."
>
> That is true and defensibly true. :) However with not much more text you
> can accurately describe what it is, as opposed to something it is almost
> like.
>
> "A skateboard is a thin piece of wood on top of four small wheels that you
> stand on and ride"
>
> The old Cassandra statement was something to the effect of "with
> the storage model of Bigtable and the consistency model of Dynamo". This
> accurately described the system and referred to specific known
> quantities (Bigtable/Dynamo) for which white papers existed for further
> reading.
>
> On Mon, Oct 3, 2016 at 6:24 AM, Benedict Elliott Smith <
> benedict@apache.org> wrote:
>
>> While that sentence leaves a lot to be desired (for me because it confers
>> a different meaning on row store), it doesn't say "Cassandra is like a
>> RDBMS" - it says "like an RDBMS, it organises data by rows and columns" -
>> i.e., in this regard only it is like an RDBMS, not more generally.
>>
>> I believe it was meant to help people, especially those afraid of the
>> NoSQL thrift world, understand that it still uses the basic concepts of
>> rows and columns they are used to.  I agree it could be improved to
>> minimise the chance of misreading it, and I'm certain contributions would
>> be welcome here.
>>
>> I don't personally want to get bogged down in analysing every piece of
>> text anyone has ever written, so I'll bow out of further discussion on
>> this.  These phrases may all be suboptimal, but they are certainly
>> defensible.  Column store is not, that's all I wanted to contribute here.
>>
>>
>>
>>
>>
>> On 1 October 2016 at 19:35, Peter Lin <wo...@gmail.com> wrote:
>>
>>> I'll second Ed's comment.
>>>
>>> The documentation should be more careful when using phrases such as "like
>>> relational databases". Given the history of relational databases, people
>>> expect certain things: ACID transactions, primary/foreign key
>>> constraints, query planners, joins and relational algebra. Clearly
>>> Cassandra's storage engine does not follow most of those principles, and
>>> for good reason.
>>>
>>> The term "row-oriented storage" would be more descriptive and appropriate.
>>> It avoids conflating Cassandra's storage engine with "traditional"
>>> relational storage engines. Those of us who have spent over a decade using
>>> IBM DB2, Oracle, SQL Server and Sybase tend to think of relational
>>> databases in a certain way. If we go back to 1998, most RDBMS storage
>>> engines had a max row size limit. Databases like Sybase before version 9
>>> preferred raw disk for optimal performance. I can go on and on, but
>>> there's no point really.
>>>
>>> Cassandra's storage engine is "row oriented", but it's not relational in
>>> the RDBMS sense. We do everyone a huge disservice by using confusing
>>> terminology and then making fun of those who get confused. No one wins when
>>> that happens. At the end of the day, what differentiates Cassandra's
>>> storage engine is that it supports static and dynamic columns, which
>>> traditional RDBMSs don't support today. Calling Cassandra storage
>>> "distributed tables" doesn't really help, in my biased opinion.
>>>
>>> For example, if you tell a SQL Server or Oracle RAC admin "Cassandra uses
>>> distributed tables" they might answer "so what, SQL Server and Oracle can
>>> do that too." The difference is that with an RDBMS the partitioning is
>>> optional and requires more work to configure. With Cassandra you can have
>>> everything on 1 node, which means there is only 1 partition and it is no
>>> different from 1 instance of SQL Server. Where you win is when you need to
>>> add 2 more nodes: Cassandra makes this easier, whereas with SQL Server and
>>> Oracle you have to do a little bit more work. I've lost count of how many
>>> times I've explained NoSQL databases to RDBMS admins and had to explain
>>> that the official docs are stupid.
>>>
>>>
>>>
>>> On Sat, Oct 1, 2016 at 11:31 AM, Edward Capriolo <ed...@gmail.com>
>>> wrote:
>>>
>>>> https://github.com/apache/cassandra
>>>>
>>>> Row store <http://wiki.apache.org/cassandra/DataModel> means that like
>>>> relational databases, Cassandra organizes data by rows and columns. The
>>>> Cassandra Query Language (CQL) is a close relative of SQL.
>>>>
>>>> I generally do not know what to say about these high level
>>>> "oversimplifications" like "firewalls block hackers". Are there "firewalls"
>>>> or do they mean IP routers with layer 4 packet inspections and layer 3
>>>> Access Control Lists?
>>>>
>>>> We say (and I catch myself doing it all the time) "like relational
>>>> databases" often, as if all relational databases work alike. A columnar
>>>> store like HP Vertica is a relational database. MySQL has different
>>>> storage engines: does MyISAM work like InnoDB?
>>>>
>>>> Google Docs organizes data by rows and columns as well. You can wrap
>>>> any storage system in an API that makes it look like rows and columns.
>>>> Microsoft LINQ can enumerate your network cards and query them
>>>> (https://msdn.microsoft.com/en-us/library/bb308959.aspx); that really
>>>> does not make your network cards a "row store".
>>>>
>>>> "Theoretically a row can have 2 billion columns, but in practice it
>>>> shouldn't have more than 100 million columns."
>>>> In practice (in my experience) the number is much lower than 100
>>>> million, and if the data is actually deleted and re-added frequently, the
>>>> number of live columns (rows, whatever) you can use happily is even lower.
>>>>
>>>>
>>>> I believe on twitter (I am unable to find the tweet) someone was trying
>>>> to convince me Cassandra was a "columnar analytic database".  ROFL
>>>>
>>>> I believe telling someone it is a "row store" "like a database" is not a
>>>> good idea. They might walk away content with that explanation. You are
>>>> setting them up to walk into an anti-pattern, like a case where the user
>>>> is attempting to write and delete 1 row and 1 column 6 billion times a
>>>> day. Then you end up explaining to them
>>>> http://stackoverflow.com/questions/21755286/what-exactly-happens-when-tombstone-limit-is-reached
>>>> and how the Cassandra storage model is not "like a relational
>>>> database".
>>>>
>>>> On Fri, Sep 30, 2016 at 9:22 PM, Edward Capriolo <edlinuxguru@gmail.com
>>>> > wrote:
>>>>
>>>>> I can iterate over JSON data stored in mongo and present it as a table
>>>>> with rows and columns. It does not make mongo a rowstore.
>>>>>
>>>>> On Fri, Sep 30, 2016 at 9:16 PM, Edward Capriolo <
>>>>> edlinuxguru@gmail.com> wrote:
>>>>>
>>>>>> The problem with calling it a row store:
>>>>>>
>>>>>> https://en.wikipedia.org/wiki/Row_(database)
>>>>>>
>>>>>> In the context of a relational database
>>>>>> <https://en.wikipedia.org/wiki/Relational_database>, a *row* (also
>>>>>> called a record
>>>>>> <https://en.wikipedia.org/wiki/Record_(computer_science)> or tuple
>>>>>> <https://en.wikipedia.org/wiki/Tuple>) represents a single,
>>>>>> implicitly structured data <https://en.wikipedia.org/wiki/Data> item
>>>>>> in a table <https://en.wikipedia.org/wiki/Table_(database)>. In
>>>>>> simple terms, a database table can be thought of as consisting of
>>>>>> *rows* and columns <https://en.wikipedia.org/wiki/Column_(database)>
>>>>>> or fields <https://en.wikipedia.org/wiki/Field_(computer_science)>.[1]
>>>>>> <https://en.wikipedia.org/wiki/Row_(database)#cite_note-1> Each
>>>>>> row in a table represents a set of related data, and every row in the
>>>>>> table has the same structure.
>>>>>>
>>>>>> When you have static columns and rows with maps and lists, it is
>>>>>> hard to argue that every row has the same structure. Physically, at the
>>>>>> storage layer, they do not have the same structure, and logically, when
>>>>>> accessing the data, they barely have the same structure, since the static
>>>>>> column merely appears in each row without actually being contained in it.
>>>>>>
>>>>>> On Fri, Sep 30, 2016 at 4:47 PM, Jonathan Haddad <jo...@jonhaddad.com>
>>>>>> wrote:
>>>>>>
>>>>>>> +1000 to what Benedict says. I usually call it a "partitioned row
>>>>>>> store" which usually needs some extra explanation but is more accurate than
>>>>>>> "column family" or whatever other thrift era terminology people still use.

Re: Cassandra data model right definition

Posted by selcuk mart <ad...@hostingdevi.com>.
unsubscribe


3.10.2016 16:25 tarihinde Edward Capriolo yazd\u0131:
> The phrase is defensible, but that is the root of the problem. Take 
> for example a skateboard.
>
> "A skateboard is like a bike because it has wheels and you ride on it."
>
> That is true and defensively true. :) However with not much more text 
> you can accurately describe what it is, as opposed to something it is 
> almost like.
>
> "A skateboard is a thin piece of wood on top of four small wheels that 
> you stand on and ride"
>
> The old sentence Cassandra statement was something to the effect of 
> "with the storage model of big table and the consistency model of 
> dynamo". This accurately described the system and gave reference to 
> specific known quantities (bigtable/dynamo) in which white papers 
> existed for further reading.
>
> On Mon, Oct 3, 2016 at 6:24 AM, Benedict Elliott Smith 
> <benedict@apache.org <ma...@apache.org>> wrote:
>
>     While that sentence leaves a lot to be desired (for me because it
>     confers a different meaning on row store), it doesn't say
>     "Cassandra is like a RDBMS" - it says "like an RDBMS, it organises
>     data by rows and columns" - i.e., in this regard only it is like
>     an RDBMS, not more generally.
>
>     I believe it was meant to help people, especially those afraid of
>     the NoSQL thrift world, understand that it still uses the basic
>     concept of a rows and columns they are used to.  I agree it could
>     be improved to minimise the chance of misreading it, and I'm
>     certain contributions would be welcome here.
>
>     I don't personally want to get bogged down in analysing every
>     piece of text anyone has ever written, so I'll bow out of further
>     discussion on this.  These phrases may all be suboptimal, but they
>     are certainly defensible.  Column store is not, that's all I
>     wanted to contribute here.
>
>
>
>
>
>     On 1 October 2016 at 19:35, Peter Lin <woolfel@gmail.com
>     <ma...@gmail.com>> wrote:
>
>         I'll second Ed's comment.
>
>         The documentation should be more careful when using phrases
>         "like relational databases". When we look at the history of
>         relational databases, people expect certain things like ACID
>         transactions, primary/foriegn key constraints, query planners,
>         joins and relational algebra. Clearly Cassandra's storage
>         engine does not follow most of those principals for a good reason.
>
>         The term row oriented storage would be more descriptive and
>         appropriate. It avoids conflating Cassandra storage engine
>         with "traditional" relational storage engines. Those of us
>         that have spent over a decade using IBM DB2, Oracle, Sql
>         Server and Sybase tend to think of relational databases in a
>         certain way. If we go back to 1998, most RDBMS storage engine
>         had a max row size limit. Databases like Sybase before version
>         9 preferred RAW disk for optimal performance. I can go on and
>         on, but there's no point really.
>
>         Cassandra's storage engine is "row oriented", but it's not
>         relational in RDBMS sense. We do everyone a huge disservice by
>         using confusing terminology and then making fun of those who
>         get confused. No one wins when that happens. At the end of the
>         day, what differentiates cassandra's storage engine is it
>         support static and dynamic columns, which traditional RDBMS
>         don't support today. Calling Cassandra storage "distributed
>         tables" doesn't really help in my bias opinion.
>
>         For example, if you tell a SqlServer or Oracle RAC admin
>         "cassandra uses distributed tables" they might answer "so
>         what, sql server and oracle can do that too." The difference
>         is with RDBMS the partitioning is optional and requires more
>         work to configure. Whereas with Cassandra you can have
>         everything in 1 node, which means there is only 1 partition
>         and no different to 1 instance of sql server. Where you win is
>         when you need to add 2 more nodes, Cassandra makes this easier
>         whereas with SqlServer and Oracle you have to do a little bit
>         more work. I've lost count of how many times I've to explained
>         noSql databases to RDBMS admins and had to explain the
>         official docs are stupid.
>
>
>
>         On Sat, Oct 1, 2016 at 11:31 AM, Edward Capriolo
>         <edlinuxguru@gmail.com <ma...@gmail.com>> wrote:
>
>             https://github.com/apache/cassandra
>             <https://github.com/apache/cassandra>
>
>             Row store
>             <http://wiki.apache.org/cassandra/DataModel> means that
>             like relational databases, Cassandra organizes data by
>             rows and columns. The Cassandra Query Language (CQL) is a
>             close relative of SQL.
>
>             I generally do not know what to say about these high level
>             "oversimplifications" like "firewalls block hackers". Are
>             there "firewalls" or do they mean IP routers with layer 4
>             packet inspections and layer 3 Access Control Lists?
>
>             We say (and I catch myself doing it all the time) "like
>             relational databases" often as if all relational databases
>             work alike. A columnar store like HP Vertica is a
>             relational database.MySql has different storage engines
>             does MyIsam work like InnoDB?
>
>             Google docs organizes data by rows and columns as well.
>             You can wrap any storage system into an API that makes
>             them look like rows and columns. Microsoft LINQ can
>             enumerate your network cars and query them
>             https://msdn.microsoft.com/en-us/library/bb308959.aspx
>             <https://msdn.microsoft.com/en-us/library/bb308959.aspx> ,
>             that really does not make your network cards a "row store"
>
>             "Theoretically a row can have 2 billion columns, but in
>             practice it shouldn't have more than 100 million columns."
>             In practice (In my experience) the number is much lower
>             than 100 million, and if the data actually is deleted and
>             readded frequently the number of live columns(rows,
>             whatever) you can use happily is even lower
>
>
>             I believe on twitter (I am unable to find the tweet)
>             someone was trying to convince me Cassandra was a
>             "columnar analytic database".  ROFL
>
>             I believe telling someone it "row store" "like a
>             database", is not a good idea. They might away content
>             with that explanation. You are setting them up to walk
>             into an anti-pattern. Like a case where the user is
>             attempting to write and deleting 1 row and 1 column 6
>             billion times a day. Then you end up explaining to them
>             http://stackoverflow.com/questions/21755286/what-exactly-happens-when-tombstone-limit-is-reached
>             <http://stackoverflow.com/questions/21755286/what-exactly-happens-when-tombstone-limit-is-reached>
>
>
>             and how the cassandra storage model is not "like a
>             relational database".
>
>             On Fri, Sep 30, 2016 at 9:22 PM, Edward Capriolo
>             <edlinuxguru@gmail.com <ma...@gmail.com>> wrote:
>
>                 I can iterate over JSON data stored in mongo and
>                 present it as a table with rows and columns. It does
>                 not make mongo a rowstore.
>
>                 On Fri, Sep 30, 2016 at 9:16 PM, Edward Capriolo
>                 <edlinuxguru@gmail.com <ma...@gmail.com>>
>                 wrote:
>
>                     The problem with calling it a row store:
>
>                     https://en.wikipedia.org/wiki/Row_(database)
>                     <https://en.wikipedia.org/wiki/Row_%28database%29>
>
>                     In the context of a relational database
>                     <https://en.wikipedia.org/wiki/Relational_database>,
>                     a *row*, also called a record
>                     <https://en.wikipedia.org/wiki/Record_%28computer_science%29> or
>                     tuple
>                     <https://en.wikipedia.org/wiki/Tuple>, represents a
>                     single, implicitly structured data
>                     <https://en.wikipedia.org/wiki/Data> item in a
>                     table
>                     <https://en.wikipedia.org/wiki/Table_%28database%29>.
>                     In simple terms, a database table can be thought
>                     of as consisting of *rows* and columns
>                     <https://en.wikipedia.org/wiki/Column_%28database%29> or
>                     fields
>                     <https://en.wikipedia.org/wiki/Field_%28computer_science%29>.[1]
>                     <https://en.wikipedia.org/wiki/Row_%28database%29#cite_note-1>
>                     Each row in a table represents a set of related
>                     data, and every row in the table has the same
>                     structure.
>
>                     When you have static columns and rows with maps,
>                     and lists, it is hard to argue that every row has
>                     the same structure. Physically at the storage
>                     layer they do not have the same structure and
>                     logically when accessing the data they barely have
>                     the same structure, as the static column is just
>                     appearing inside each row it is actually not
>                     contained in.
>
>                     On Fri, Sep 30, 2016 at 4:47 PM, Jonathan Haddad
>                     <jon@jonhaddad.com <ma...@jonhaddad.com>> wrote:
>
>                         +1000 to what Benedict says. I usually call it
>                         a "partitioned row store" which usually needs
>                         some extra explanation but is more accurate
>                         than "column family" or whatever other thrift
>                         era terminology people still use.
>                         On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan
>                         <doanduyhai@gmail.com
>                         <ma...@gmail.com>> wrote:
>
>                             I used to present Cassandra as a NoSQL
>                             datastore with "distributed" table. This
>                             definition is closer to CQL and has some
>                             academic background (distributed hash table).
>
>
>                             On Fri, Sep 30, 2016 at 7:43 PM, Benedict
>                             Elliott Smith <benedict@apache.org
>                             <ma...@apache.org>> wrote:
>
>                                 Cassandra is not a "wide column store"
>                                 anymore. It has a schema. Only Thrift
>                                 users still think they have no
>                                 schema (though they do), and Thrift is
>                                 being deprecated.
>
>                                 I really wish everyone would kill the
>                                 term "wide column store" with fire. 
>                                 It seems to have never meant anything
>                                 beyond "schema-less, row-oriented",
>                                 and a "column store" means literally
>                                 the opposite of this.
>
>                                 Not only that, but people don't even
>                                 seem to realise the term "column
>                                 store" existed long before "wide
>                                 column store" and the latter is often
>                                 abbreviated to the former, as here:
>                                 http://www.planetcassandra.org/what-is-nosql/
>                                 <http://www.planetcassandra.org/what-is-nosql/>
>
>
>                                 Since it no longer applies, let's all
>                                 agree as a community to forget this
>                                 awful nomenclature ever existed.
>
>
>
>                                 On 30 September 2016 at 18:09, Joaquin
>                                 Casares <joaquin@thelastpickle.com
>                                 <ma...@thelastpickle.com>> wrote:
>
>                                     Hi Mehdi,
>
>                                     I can help clarify a few things.
>
>                                     As Carlos said, Cassandra is a
>                                     Wide Column Store. Theoretically a
>                                     row can have 2 billion columns,
>                                     but in practice it shouldn't have
>                                     more than 100 million columns.
>
>                                     Cassandra partitions data to
>                                     certain nodes based on the
>                                     partition key(s), but does provide
>                                     the option of setting zero or more
>                                     clustering keys. Together,
>                                     the partition key(s) and
>                                     clustering key(s) form the primary
>                                     key.
>
>                                     When writing to Cassandra, you
>                                     will need to provide the full
>                                     primary key, however, when reading
>                                     from Cassandra, you only need to
>                                     provide the full partition key.
>
>                                     When you only provide the
>                                     partition key for a read
>                                     operation, you're able to return
>                                     all columns that exist on that
>                                     partition with low latency. These
>                                     columns are displayed as "CQL
>                                     rows" to make it easier to reason
>                                     about.
>
>                                     Consider the schema:
>
>                                         CREATE TABLE foo (
>                                           bar uuid,      -- partition key, part 1
>                                           boz uuid,      -- partition key, part 2
>                                           baz timeuuid,  -- clustering key
>                                           data1 text,
>                                           data2 text,
>                                           PRIMARY KEY ((bar, boz), baz)
>                                         );
>
>
>                                     When you write to Cassandra you
>                                     will need to send bar, boz, and
>                                     baz and optionally data*, if it's
>                                     relevant for that CQL row. If you
>                                     chose not to define a data* field
>                                     for a particular CQL row, then
>                                     nothing is stored nor allocated on
>                                     disk. But I wouldn't consider that
>                                     caveat to be "schema-less".
>
>                                     However, all writes to the same
>                                     bar/boz will end up on the same
>                                     Cassandra replica set (a
>                                     configurable number of nodes) and
>                                     be stored on the same place(s) on
>                                     disk within the SSTable(s). And on
>                                     disk, each field that's not a
>                                     partition key is stored as a
>                                     column, including clustering keys
>                                     (this is optimized in Cassandra
>                                     3+, but now we're getting deep
>                                     into internals).
>
>                                     In this way you can get fast
>                                     responses for all activity for
>                                     bar/boz either over time, or for a
>                                     specific time, with roughly the
>                                     same number of disk seeks, with
>                                     varying lengths on the disk scans.
>
>                                     Hope that helps!
>
>                                     Joaquin Casares
>                                     Consultant
>                                     Austin, TX
>
>                                     Apache Cassandra Consulting
>                                     http://www.thelastpickle.com
>
>                                     On Fri, Sep 30, 2016 at 11:40 AM,
>                                     Carlos Alonso <info@mrcalonso.com
>                                     <ma...@mrcalonso.com>> wrote:
>
>                                         Cassandra is a Wide Column
>                                         Store
>                                         http://db-engines.com/en/system/Cassandra
>                                         <http://db-engines.com/en/system/Cassandra>
>
>                                         Carlos Alonso | Software
>                                         Engineer | @calonso
>                                         <https://twitter.com/calonso>
>
>                                         On 30 September 2016 at 18:24,
>                                         Mehdi Bada
>                                         <mehdi.bada@dbi-services.com
>                                         <ma...@dbi-services.com>>
>                                         wrote:
>
>                                             Hi all,
>
>                                             I have a theoretical
>                                             question:
>                                             - Is Apache Cassandra
>                                             really a column store?
>                                             A column store means storing
>                                             the data as columns rather
>                                             than as rows.
>
>                                             In fact, C* stores the data
>                                             as rows, and data is
>                                             partitioned by row key.
>
>                                             So, for me, Cassandra
>                                             is a row-oriented, schema-less
>                                             DBMS... Is that true
>                                             for you also?
>
>                                             Many thanks in advance for
>                                             your reply
>
>                                             Best Regards
>                                             Mehdi Bada
>                                             ----
>
>                                             *Mehdi Bada* | Consultant
>                                             Phone: +41 32 422 96 00
>                                             | Mobile: +41 79 928 75 48
>                                             | Fax: +41 32 422 96 15
>
>                                             dbi services, Rue de la
>                                             Jeunesse 2, CH-2800 Delémont
>                                             mehdi.bada@dbi-services.com
>
>                                             www.dbi-services.com
>
>                                             *⇒ dbi services is
>                                             recruiting Oracle & SQL
>                                             Server experts ! – Join
>                                             the team
>                                             <http://www.dbi-services.com/fr/dbi-services-et-ses-collaborateurs/offres-emplois-opportunites-carrieres/>*

-- 
Best regards
Selçuk MART
ONLINE KURUM
Hacettepe Üniversitesi Teknokent,
Üniversiteliler Mah. 1596. Sok.
Safir Blokları, E BLOK 802/A,
Beytepe, Çankaya/ANKARA
Tel: +90 (312) 227 000 5


Re: Cassandra data model right definition

Posted by Edward Capriolo <ed...@gmail.com>.
The phrase is defensible, but that is the root of the problem. Take for
example a skateboard.

"A skateboard is like a bike because it has wheels and you ride on it."

That is true, and defensibly true. :) However, with not much more text you
can accurately describe what it is, as opposed to something it is almost
like.

"A skateboard is a thin piece of wood on top of four small wheels that you
stand on and ride"

The old Cassandra description was something to the effect of "with
the storage model of Bigtable and the consistency model of Dynamo". This
accurately described the system and referred to specific known
quantities (Bigtable/Dynamo) for which white papers existed for further
reading.
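
To make the "storage model of Bigtable" half of that sentence concrete, here
is a minimal Python sketch. It is only a toy under stated assumptions: the
class name ToyPartitionedStore, its methods and the sensor data are all made
up for illustration and are not Cassandra or Bigtable code. It models the idea
of a sparse, sorted map: rows are grouped by a partition key, kept in
clustering-key order, and hold only the columns that were actually written.

```python
from bisect import insort
from collections import defaultdict


class ToyPartitionedStore:
    """Hypothetical toy model: partition key -> rows sorted by clustering key."""

    def __init__(self):
        # partition key -> sorted list of clustering keys
        self._order = defaultdict(list)
        # (partition key, clustering key) -> {column name: value}
        self._cells = {}

    def write(self, partition_key, clustering_key, **columns):
        key = (partition_key, clustering_key)
        if key not in self._cells:
            # Keep clustering keys sorted within their partition.
            insort(self._order[partition_key], clustering_key)
            self._cells[key] = {}
        # Only the columns actually supplied are stored (sparse rows).
        self._cells[key].update(columns)

    def read_partition(self, partition_key):
        """Return every row of one partition, in clustering-key order."""
        return [
            (ck, dict(self._cells[(partition_key, ck)]))
            for ck in self._order.get(partition_key, [])
        ]


store = ToyPartitionedStore()
store.write("sensor-1", 20, temp="21.5")
store.write("sensor-1", 10, temp="20.9", note="calibrated")
store.write("sensor-2", 5, temp="18.0")
rows = store.read_partition("sensor-1")
```

Reading "sensor-1" returns its two rows ordered by clustering key (10, then
20), and the row at 20 simply has no note column, which is the part that the
"like a relational database" shorthand tends to hide.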

On Mon, Oct 3, 2016 at 6:24 AM, Benedict Elliott Smith <be...@apache.org>
wrote:

> While that sentence leaves a lot to be desired (for me because it confers
> a different meaning on row store), it doesn't say "Cassandra is like an
> RDBMS" - it says "like an RDBMS, it organises data by rows and columns" -
> i.e., in this regard only it is like an RDBMS, not more generally.
>
> I believe it was meant to help people, especially those afraid of the
> NoSQL thrift world, understand that it still uses the basic concepts of
> rows and columns they are used to.  I agree it could be improved to
> minimise the chance of misreading it, and I'm certain contributions would
> be welcome here.
>
> I don't personally want to get bogged down in analysing every piece of
> text anyone has ever written, so I'll bow out of further discussion on
> this.  These phrases may all be suboptimal, but they are certainly
> defensible.  Column store is not; that's all I wanted to contribute here.
>
>
>
>
>
> On 1 October 2016 at 19:35, Peter Lin <wo...@gmail.com> wrote:
>
>> I'll second Ed's comment.
>>
>> The documentation should be more careful when using phrases such as "like
>> relational databases". When we look at the history of relational databases,
>> people expect certain things like ACID transactions, primary/foreign key
>> constraints, query planners, joins and relational algebra. Clearly,
>> Cassandra's storage engine does not follow most of those principles, for
>> good reason.
>>
>> The term "row-oriented storage" would be more descriptive and appropriate.
>> It avoids conflating Cassandra's storage engine with "traditional"
>> relational storage engines. Those of us who have spent over a decade using
>> IBM DB2, Oracle, SQL Server and Sybase tend to think of relational
>> databases in a certain way. If we go back to 1998, most RDBMS storage
>> engines had a max row size limit. Databases like Sybase before version 9
>> preferred raw disk for optimal performance. I can go on and on, but
>> there's no point really.
>>
>> Cassandra's storage engine is "row oriented", but it's not relational in
>> the RDBMS sense. We do everyone a huge disservice by using confusing
>> terminology and then making fun of those who get confused. No one wins when
>> that happens. At the end of the day, what differentiates Cassandra's
>> storage engine is that it supports static and dynamic columns, which
>> traditional RDBMSs don't support today. Calling Cassandra storage
>> "distributed tables" doesn't really help, in my biased opinion.
>>
>> For example, if you tell a SQL Server or Oracle RAC admin "Cassandra uses
>> distributed tables", they might answer "so what, SQL Server and Oracle can
>> do that too." The difference is that with an RDBMS the partitioning is
>> optional and requires more work to configure, whereas with Cassandra you
>> can have everything on 1 node, which means there is only 1 partition and
>> it is no different to 1 instance of SQL Server. Where you win is when you
>> need to add 2 more nodes: Cassandra makes this easier, whereas with SQL
>> Server and Oracle you have to do a little bit more work. I've lost count
>> of how many times I've explained NoSQL databases to RDBMS admins and had
>> to explain that the official docs are stupid.
>>
>>
>>
>> On Sat, Oct 1, 2016 at 11:31 AM, Edward Capriolo <ed...@gmail.com>
>> wrote:
>>
>>> https://github.com/apache/cassandra
>>>
>>> Row store <http://wiki.apache.org/cassandra/DataModel> means that like
>>> relational databases, Cassandra organizes data by rows and columns. The
>>> Cassandra Query Language (CQL) is a close relative of SQL.
>>>
>>> I generally do not know what to say about these high level
>>> "oversimplifications" like "firewalls block hackers". Are there "firewalls"
>>> or do they mean IP routers with layer 4 packet inspections and layer 3
>>> Access Control Lists?
>>>
>>> We say (and I catch myself doing it all the time) "like relational
>>> databases" often, as if all relational databases work alike. A columnar
>>> store like HP Vertica is a relational database. MySQL has different
>>> storage engines; does MyISAM work like InnoDB?
>>>
>>> Google Docs organizes data by rows and columns as well. You can wrap any
>>> storage system in an API that makes it look like rows and columns.
>>> Microsoft LINQ can enumerate your network cards and query them
>>> (https://msdn.microsoft.com/en-us/library/bb308959.aspx), but that really
>>> does not make your network cards a "row store".
>>>
>>> "Theoretically a row can have 2 billion columns, but in practice it
>>> shouldn't have more than 100 million columns."
>>> In practice (in my experience) the number is much lower than 100
>>> million, and if the data is actually deleted and re-added frequently, the
>>> number of live columns (rows, whatever) you can use happily is even lower.
>>>
>>>
>>> I believe on twitter (I am unable to find the tweet) someone was trying
>>> to convince me Cassandra was a "columnar analytic database".  ROFL
>>>
>>> I believe telling someone it is a "row store" "like a database" is not a
>>> good idea. They might walk away content with that explanation. You are
>>> setting them up to walk into an anti-pattern, like a case where the user is
>>> attempting to write and delete 1 row and 1 column 6 billion times a day.
>>> Then you end up explaining to them
>>> http://stackoverflow.com/questions/21755286/what-exactly-happens-when-tombstone-limit-is-reached
>>> and how the Cassandra storage model is not "like a relational database".
>>>
>>> On Fri, Sep 30, 2016 at 9:22 PM, Edward Capriolo <ed...@gmail.com>
>>> wrote:
>>>
>>>> I can iterate over JSON data stored in mongo and present it as a table
>>>> with rows and columns. It does not make mongo a rowstore.
>>>>
>>>> On Fri, Sep 30, 2016 at 9:16 PM, Edward Capriolo <edlinuxguru@gmail.com
>>>> > wrote:
>>>>
>>>>> The problem with calling it a row store:
>>>>>
>>>>> https://en.wikipedia.org/wiki/Row_(database)
>>>>>
>>>>> In the context of a relational database
>>>>> <https://en.wikipedia.org/wiki/Relational_database>, a *row*—also
>>>>> called a record
>>>>> <https://en.wikipedia.org/wiki/Record_(computer_science)> or tuple
>>>>> <https://en.wikipedia.org/wiki/Tuple>—represents a single, implicitly
>>>>> structured data <https://en.wikipedia.org/wiki/Data> item in a table
>>>>> <https://en.wikipedia.org/wiki/Table_(database)>. In simple terms, a
>>>>> database table can be thought of as consisting of *rows* and columns
>>>>> <https://en.wikipedia.org/wiki/Column_(database)> or fields
>>>>> <https://en.wikipedia.org/wiki/Field_(computer_science)>.[1]
>>>>> <https://en.wikipedia.org/wiki/Row_(database)#cite_note-1> Each row
>>>>> in a table represents a set of related data, and every row in the table has
>>>>> the same structure.
>>>>>
>>>>> When you have static columns and rows with maps, and lists, it is hard
>>>>> to argue that every row has the same structure. Physically at the storage
>>>>> layer they do not have the same structure and logically when accessing the
>>>>> data they barely have the same structure, as the static column is just
>>>>> appearing inside each row it is actually not contained in.
>>>>>
>>>>> On Fri, Sep 30, 2016 at 4:47 PM, Jonathan Haddad <jo...@jonhaddad.com>
>>>>> wrote:
>>>>>
>>>>>> +1000 to what Benedict says. I usually call it a "partitioned row
>>>>>> store" which usually needs some extra explanation but is more accurate than
>>>>>> "column family" or whatever other thrift era terminology people still use.
>>>>>> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <do...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I used to present Cassandra as a NoSQL datastore with "distributed"
>>>>>>> table. This definition is closer to CQL and has some academic background
>>>>>>> (distributed hash table).
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>>>>>>> benedict@apache.org> wrote:
>>>>>>>
>>>>>>>> Cassandra is not a "wide column store" anymore. It has a schema.
>>>>>>>> Only Thrift users still think they have no schema (though they do), and
>>>>>>>> Thrift is being deprecated.
>>>>>>>>
>>>>>>>> I really wish everyone would kill the term "wide column store" with
>>>>>>>> fire.  It seems to have never meant anything beyond "schema-less,
>>>>>>>> row-oriented", and a "column store" means literally the opposite of this.
>>>>>>>>
>>>>>>>> Not only that, but people don't even seem to realise the term
>>>>>>>> "column store" existed long before "wide column store" and the latter is
>>>>>>>> often abbreviated to the former, as here:
>>>>>>>> http://www.planetcassandra.org/what-is-nosql/
>>>>>>>>
>>>>>>>> Since it no longer applies, let's all agree as a community to
>>>>>>>> forget this awful nomenclature ever existed.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 30 September 2016 at 18:09, Joaquin Casares <
>>>>>>>> joaquin@thelastpickle.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Mehdi,
>>>>>>>>>
>>>>>>>>> I can help clarify a few things.
>>>>>>>>>
>>>>>>>>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a
>>>>>>>>> row can have 2 billion columns, but in practice it shouldn't have more than
>>>>>>>>> 100 million columns.
>>>>>>>>>
>>>>>>>>> Cassandra partitions data to certain nodes based on the partition
>>>>>>>>> key(s), but does provide the option of setting zero or more clustering
>>>>>>>>> keys. Together, the partition key(s) and clustering key(s) form the primary
>>>>>>>>> key.
>>>>>>>>>
>>>>>>>>> When writing to Cassandra, you will need to provide the full
>>>>>>>>> primary key, however, when reading from Cassandra, you only need to provide
>>>>>>>>> the full partition key.
>>>>>>>>>
>>>>>>>>> When you only provide the partition key for a read operation,
>>>>>>>>> you're able to return all columns that exist on that partition with low
>>>>>>>>> latency. These columns are displayed as "CQL rows" to make it easier to
>>>>>>>>> reason about.
>>>>>>>>>
>>>>>>>>> Consider the schema:
>>>>>>>>>
>>>>>>>>> CREATE TABLE foo (
>>>>>>>>>   bar uuid,      -- partition key, part 1
>>>>>>>>>   boz uuid,      -- partition key, part 2
>>>>>>>>>   baz timeuuid,  -- clustering key
>>>>>>>>>   data1 text,
>>>>>>>>>   data2 text,
>>>>>>>>>   PRIMARY KEY ((bar, boz), baz)
>>>>>>>>> );
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> When you write to Cassandra you will need to send bar, boz, and
>>>>>>>>> baz and optionally data*, if it's relevant for that CQL row. If you choose
>>>>>>>>> not to define a data* field for a particular CQL row, then nothing is
>>>>>>>>> stored or allocated on disk. But I wouldn't consider that caveat to be
>>>>>>>>> "schema-less".
>>>>>>>>>
>>>>>>>>> However, all writes to the same bar/boz will end up on the same
>>>>>>>>> Cassandra replica set (a configurable number of nodes) and be stored on the
>>>>>>>>> same place(s) on disk within the SSTable(s). And on disk, each field that's
>>>>>>>>> not a partition key is stored as a column, including clustering keys (this
>>>>>>>>> is optimized in Cassandra 3+, but now we're getting deep into internals).
>>>>>>>>>
>>>>>>>>> In this way you can get fast responses for all activity for
>>>>>>>>> bar/boz either over time, or for a specific time, with roughly the same
>>>>>>>>> number of disk seeks, with varying lengths on the disk scans.
>>>>>>>>>
>>>>>>>>> Hope that helps!
>>>>>>>>>
>>>>>>>>> Joaquin Casares
>>>>>>>>> Consultant
>>>>>>>>> Austin, TX
>>>>>>>>>
>>>>>>>>> Apache Cassandra Consulting
>>>>>>>>> http://www.thelastpickle.com
>>>>>>>>>
>>>>>>>>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <
>>>>>>>>> info@mrcalonso.com> wrote:
>>>>>>>>>
>>>>>>>>>> Cassandra is a Wide Column Store
>>>>>>>>>> http://db-engines.com/en/system/Cassandra
>>>>>>>>>>
>>>>>>>>>> Carlos Alonso | Software Engineer | @calonso
>>>>>>>>>> <https://twitter.com/calonso>
>>>>>>>>>>
>>>>>>>>>> On 30 September 2016 at 18:24, Mehdi Bada <
>>>>>>>>>> mehdi.bada@dbi-services.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> I have a theoretical question:
>>>>>>>>>>> - Is Apache Cassandra really a column store?
>>>>>>>>>>> A column store means storing the data as columns rather than as
>>>>>>>>>>> rows.
>>>>>>>>>>>
>>>>>>>>>>> In fact, C* stores the data as rows, and data is partitioned by
>>>>>>>>>>> row key.
>>>>>>>>>>>
>>>>>>>>>>> So, for me, Cassandra is a row-oriented, schema-less
>>>>>>>>>>> DBMS... Is that true for you also?
>>>>>>>>>>>
>>>>>>>>>>> Many thanks in advance for your reply
>>>>>>>>>>>
>>>>>>>>>>> Best Regards
>>>>>>>>>>> Mehdi Bada
>>>>>>>>>>> ----
>>>>>>>>>>>
>>>>>>>>>>> *Mehdi Bada* | Consultant
>>>>>>>>>>> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41
>>>>>>>>>>> 32 422 96 15
>>>>>>>>>>> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
>>>>>>>>>>> mehdi.bada@dbi-services.com
>>>>>>>>>>> www.dbi-services.com
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> *⇒ dbi services is recruiting Oracle & SQL Server experts ! –
>>>>>>>>>>> Join the team
>>>>>>>>>>> <http://www.dbi-services.com/fr/dbi-services-et-ses-collaborateurs/offres-emplois-opportunites-carrieres/>*
>>>>>>>>>>>

Re: Cassandra data model right definition

Posted by Benedict Elliott Smith <be...@apache.org>.
While that sentence leaves a lot to be desired (for me because it confers a
different meaning on row store), it doesn't say "Cassandra is like an RDBMS"
- it says "like an RDBMS, it organises data by rows and columns" - i.e., in
this regard only it is like an RDBMS, not more generally.

I believe it was meant to help people, especially those afraid of the NoSQL
thrift world, understand that it still uses the basic concepts of rows and
columns they are used to.  I agree it could be improved to minimise the
chance of misreading it, and I'm certain contributions would be welcome
here.

I don't personally want to get bogged down in analysing every piece of text
anyone has ever written, so I'll bow out of further discussion on this.
These phrases may all be suboptimal, but they are certainly defensible.
Column store is not; that's all I wanted to contribute here.
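
As a minimal illustration of why "column store" and "row-oriented" are
opposites, the Python sketch below lays the same records out both ways. The
sample data and the names row_layout and column_layout are made up for the
example; this is not Cassandra's on-disk format, only the general distinction
the two terms refer to.

```python
# Same three records laid out two ways (hypothetical sample data).
records = [
    {"id": 1, "name": "a", "score": 10},
    {"id": 2, "name": "b", "score": 20},
    {"id": 3, "name": "c", "score": 30},
]

# Row-oriented: all values of one record are stored together.
row_layout = [tuple(r.values()) for r in records]

# Column-oriented: all values of one column are stored together.
column_layout = {name: [r[name] for r in records] for name in records[0]}

# Fetching one whole record touches one entry in the row layout...
record_2 = row_layout[1]
# ...while aggregating one column touches one list in the column layout.
total_score = sum(column_layout["score"])
```

The row layout favours reading or writing whole records by key, while the
column layout favours scanning a single attribute across many records.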





On 1 October 2016 at 19:35, Peter Lin <wo...@gmail.com> wrote:

> I'll second Ed's comment.
>
> The documentation should be more careful when using phrases such as "like
> relational databases". When we look at the history of relational databases,
> people expect certain things like ACID transactions, primary/foreign key
> constraints, query planners, joins and relational algebra. Clearly,
> Cassandra's storage engine does not follow most of those principles, for
> good reason.
>
> The term "row-oriented storage" would be more descriptive and appropriate.
> It avoids conflating Cassandra's storage engine with "traditional"
> relational storage engines. Those of us who have spent over a decade using
> IBM DB2, Oracle, SQL Server and Sybase tend to think of relational
> databases in a certain way. If we go back to 1998, most RDBMS storage
> engines had a max row size limit. Databases like Sybase before version 9
> preferred raw disk for optimal performance. I can go on and on, but
> there's no point really.
>
> Cassandra's storage engine is "row oriented", but it's not relational in
> the RDBMS sense. We do everyone a huge disservice by using confusing
> terminology and then making fun of those who get confused. No one wins when
> that happens. At the end of the day, what differentiates Cassandra's
> storage engine is that it supports static and dynamic columns, which
> traditional RDBMSs don't support today. Calling Cassandra storage
> "distributed tables" doesn't really help, in my biased opinion.
>
> For example, if you tell a SQL Server or Oracle RAC admin "Cassandra uses
> distributed tables", they might answer "so what, SQL Server and Oracle can
> do that too." The difference is that with an RDBMS the partitioning is
> optional and requires more work to configure, whereas with Cassandra you
> can have everything on 1 node, which means there is only 1 partition and
> it is no different to 1 instance of SQL Server. Where you win is when you
> need to add 2 more nodes: Cassandra makes this easier, whereas with SQL
> Server and Oracle you have to do a little bit more work. I've lost count
> of how many times I've explained NoSQL databases to RDBMS admins and had
> to explain that the official docs are stupid.
>
>
>
> On Sat, Oct 1, 2016 at 11:31 AM, Edward Capriolo <ed...@gmail.com>
> wrote:
>
>> https://github.com/apache/cassandra
>>
>> Row store <http://wiki.apache.org/cassandra/DataModel> means that like
>> relational databases, Cassandra organizes data by rows and columns. The
>> Cassandra Query Language (CQL) is a close relative of SQL.
>>
>> I generally do not know what to say about these high level
>> "oversimplifications" like "firewalls block hackers". Are there "firewalls"
>> or do they mean IP routers with layer 4 packet inspections and layer 3
>> Access Control Lists?
>>
>> We say (and I catch myself doing it all the time) "like relational
>> databases" often, as if all relational databases work alike. A columnar
>> store like HP Vertica is a relational database. MySQL has different
>> storage engines; does MyISAM work like InnoDB?
>>
>> Google Docs organizes data by rows and columns as well. You can wrap any
>> storage system in an API that makes it look like rows and columns.
>> Microsoft LINQ can enumerate your network cards and query them
>> (https://msdn.microsoft.com/en-us/library/bb308959.aspx), but that really
>> does not make your network cards a "row store".
>>
>> "Theoretically a row can have 2 billion columns, but in practice it
>> shouldn't have more than 100 million columns."
>> In practice (in my experience) the number is much lower than 100 million,
>> and if the data is actually deleted and re-added frequently, the number of
>> live columns (rows, whatever) you can use happily is even lower.
>>
>>
>> I believe on twitter (I am unable to find the tweet) someone was trying
>> to convince me Cassandra was a "columnar analytic database".  ROFL
>>
>> I believe telling someone it is a "row store" "like a database" is not a
>> good idea. They might walk away content with that explanation. You are
>> setting them up to walk into an anti-pattern, like a case where the user is
>> attempting to write and delete 1 row and 1 column 6 billion times a day.
>> Then you end up explaining to them
>> http://stackoverflow.com/questions/21755286/what-exactly-happens-when-tombstone-limit-is-reached
>> and how the Cassandra storage model is not "like a relational database".
>>
>> On Fri, Sep 30, 2016 at 9:22 PM, Edward Capriolo <ed...@gmail.com>
>> wrote:
>>
>>> I can iterate over JSON data stored in mongo and present it as a table
>>> with rows and columns. It does not make mongo a rowstore.
>>>
>>> On Fri, Sep 30, 2016 at 9:16 PM, Edward Capriolo <ed...@gmail.com>
>>> wrote:
>>>
>>>> The problem with calling it a row store:
>>>>
>>>> https://en.wikipedia.org/wiki/Row_(database)
>>>>
>>>> In the context of a relational database
>>>> <https://en.wikipedia.org/wiki/Relational_database>, a *row*—also
>>>> called a record
>>>> <https://en.wikipedia.org/wiki/Record_(computer_science)> or tuple
>>>> <https://en.wikipedia.org/wiki/Tuple>—represents a single, implicitly
>>>> structured data <https://en.wikipedia.org/wiki/Data> item in a table
>>>> <https://en.wikipedia.org/wiki/Table_(database)>. In simple terms, a
>>>> database table can be thought of as consisting of *rows* and columns
>>>> <https://en.wikipedia.org/wiki/Column_(database)> or fields
>>>> <https://en.wikipedia.org/wiki/Field_(computer_science)>.[1]
>>>> <https://en.wikipedia.org/wiki/Row_(database)#cite_note-1> Each row in
>>>> a table represents a set of related data, and every row in the table has
>>>> the same structure.
>>>>
>>>> When you have static columns and rows with maps, and lists, it is hard
>>>> to argue that every row has the same structure. Physically at the storage
>>>> layer they do not have the same structure and logically when accessing the
>>>> data they barely have the same structure, as the static column is just
>>>> appearing inside each row it is actually not contained in.
>>>>
>>>> On Fri, Sep 30, 2016 at 4:47 PM, Jonathan Haddad <jo...@jonhaddad.com>
>>>> wrote:
>>>>
>>>>> +1000 to what Benedict says. I usually call it a "partitioned row
>>>>> store" which usually needs some extra explanation but is more accurate than
>>>>> "column family" or whatever other thrift era terminology people still use.
>>>>> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <do...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I used to present Cassandra as a NoSQL datastore with "distributed"
>>>>>> table. This definition is closer to CQL and has some academic background
>>>>>> (distributed hash table).
>>>>>>
>>>>>>
>>>>>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>>>>>> benedict@apache.org> wrote:
>>>>>>
>>>>>>> Cassandra is not a "wide column store" anymore.  It has a schema.
>>>>>>> Only thrift users no longer think they have a schema (though they do), and
>>>>>>> thrift is being deprecated.
>>>>>>>
>>>>>>> I really wish everyone would kill the term "wide column store" with
>>>>>>> fire.  It seems to have never meant anything beyond "schema-less,
>>>>>>> row-oriented", and a "column store" means literally the opposite of this.
>>>>>>>
>>>>>>> Not only that, but people don't even seem to realise the term
>>>>>>> "column store" existed long before "wide column store" and the latter is
>>>>>>> often abbreviated to the former, as here:
>>>>>>> http://www.planetcassandra.org/what-is-nosql/
>>>>>>>
>>>>>>> Since it no longer applies, let's all agree as a community to forget
>>>>>>> this awful nomenclature ever existed.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 30 September 2016 at 18:09, Joaquin Casares <
>>>>>>> joaquin@thelastpickle.com> wrote:
>>>>>>>
>>>>>>>> Hi Mehdi,
>>>>>>>>
>>>>>>>> I can help clarify a few things.
>>>>>>>>
>>>>>>>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a
>>>>>>>> row can have 2 billion columns, but in practice it shouldn't have more than
>>>>>>>> 100 million columns.
>>>>>>>>
>>>>>>>> Cassandra partitions data to certain nodes based on the partition
>>>>>>>> key(s), but does provide the option of setting zero or more clustering
>>>>>>>> keys. Together, the partition key(s) and clustering key(s) form the primary
>>>>>>>> key.
>>>>>>>>
>>>>>>>> When writing to Cassandra, you will need to provide the full
>>>>>>>> primary key, however, when reading from Cassandra, you only need to provide
>>>>>>>> the full partition key.
>>>>>>>>
>>>>>>>> When you only provide the partition key for a read operation,
>>>>>>>> you're able to return all columns that exist on that partition with low
>>>>>>>> latency. These columns are displayed as "CQL rows" to make it easier to
>>>>>>>> reason about.
>>>>>>>>
>>>>>>>> Consider the schema:
>>>>>>>>
>>>>>>>> CREATE TABLE foo (
>>>>>>>>   bar uuid,
>>>>>>>>   boz uuid,
>>>>>>>>   baz timeuuid,
>>>>>>>>   data1 text,
>>>>>>>>   data2 text,
>>>>>>>>   PRIMARY KEY ((bar, boz), baz)
>>>>>>>> );
>>>>>>>>
>>>>>>>>
>>>>>>>> When you write to Cassandra you will need to send bar, boz, and baz
>>>>>>>> and optionally data*, if it's relevant for that CQL row. If you chose not
>>>>>>>> to define a data* field for a particular CQL row, then nothing is stored
>>>>>>>> nor allocated on disk. But I wouldn't consider that caveat to be
>>>>>>>> "schema-less".
>>>>>>>>
>>>>>>>> However, all writes to the same bar/boz will end up on the same
>>>>>>>> Cassandra replica set (a configurable number of nodes) and be stored on the
>>>>>>>> same place(s) on disk within the SSTable(s). And on disk, each field that's
>>>>>>>> not a partition key is stored as a column, including clustering keys (this
>>>>>>>> is optimized in Cassandra 3+, but now we're getting deep into internals).
>>>>>>>>
>>>>>>>> In this way you can get fast responses for all activity for bar/boz
>>>>>>>> either over time, or for a specific time, with roughly the same number of
>>>>>>>> disk seeks, with varying lengths on the disk scans.
>>>>>>>>
>>>>>>>> Hope that helps!
>>>>>>>>
>>>>>>>> Joaquin Casares
>>>>>>>> Consultant
>>>>>>>> Austin, TX
>>>>>>>>
>>>>>>>> Apache Cassandra Consulting
>>>>>>>> http://www.thelastpickle.com
>>>>>>>>
>>>>>>>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <info@mrcalonso.com
>>>>>>>> > wrote:
>>>>>>>>
>>>>>>>>> Cassandra is a Wide Column Store
>>>>>>>>> http://db-engines.com/en/system/Cassandra
>>>>>>>>>
>>>>>>>>> Carlos Alonso | Software Engineer | @calonso
>>>>>>>>> <https://twitter.com/calonso>
>>>>>>>>>
>>>>>>>>> On 30 September 2016 at 18:24, Mehdi Bada <
>>>>>>>>> mehdi.bada@dbi-services.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> I have a theoretical question:
>>>>>>>>>> - Is Apache Cassandra really a column store?
>>>>>>>>>> A column store means storing the data as columns rather than as
>>>>>>>>>> rows.
>>>>>>>>>>
>>>>>>>>>> In fact C* stores the data as rows, and the data is partitioned by
>>>>>>>>>> row key.
>>>>>>>>>>
>>>>>>>>>> Finally, for me, Cassandra is a row-oriented, schema-less DBMS.
>>>>>>>>>> Is that true for you also?
>>>>>>>>>>
>>>>>>>>>> Many thanks in advance for your reply
>>>>>>>>>>
>>>>>>>>>> Best Regards
>>>>>>>>>> Mehdi Bada

Re: Cassandra data model right definition

Posted by Peter Lin <wo...@gmail.com>.
I'll second Ed's comment.

The documentation should be more careful when using phrases such as "like
relational databases". Given the history of relational databases, people
expect certain things: ACID transactions, primary/foreign key constraints,
query planners, joins and relational algebra. Clearly Cassandra's storage
engine does not follow most of those principles, and for good reason.

The term "row-oriented storage" would be more descriptive and appropriate. It
avoids conflating Cassandra's storage engine with "traditional" relational
storage engines. Those of us who have spent over a decade using IBM DB2,
Oracle, SQL Server and Sybase tend to think of relational databases in a
certain way. If we go back to 1998, most RDBMS storage engines had a max row
size limit. Databases like Sybase before version 9 preferred raw disk for
optimal performance. I can go on and on, but there's no point really.

Cassandra's storage engine is "row oriented", but it's not relational in
the RDBMS sense. We do everyone a huge disservice by using confusing
terminology and then making fun of those who get confused. No one wins when
that happens. At the end of the day, what differentiates Cassandra's
storage engine is that it supports static and dynamic columns, which
traditional RDBMSs don't support today. Calling Cassandra storage
"distributed tables" doesn't really help, in my biased opinion.

For example, if you tell a SQL Server or Oracle RAC admin "Cassandra uses
distributed tables", they might answer "so what, SQL Server and Oracle can
do that too." The difference is that with an RDBMS the partitioning is
optional and requires more work to configure. With Cassandra you can have
everything on 1 node, which means there is only 1 partition and it is no
different from 1 instance of SQL Server. Where you win is when you need to
add 2 more nodes: Cassandra makes this easier, whereas with SQL Server and
Oracle you have to do a little bit more work. I've lost count of how many
times I've had to explain NoSQL databases to RDBMS admins and had to explain
that the official docs are stupid.
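
As a rough illustration of the point about going from 1 node to 3, here is a
minimal Python sketch of hash-based placement on a token ring. It is a toy
model under stated assumptions, not Cassandra's partitioner: the names
token, ToyRing and node1..node3 are hypothetical, MD5 is used only because it
is in the standard library (Cassandra's default partitioner is Murmur3-based),
and replication and virtual nodes are ignored.

```python
import hashlib
from bisect import bisect_left


def token(partition_key: str) -> int:
    """Map a key to a position on a toy ring of size 2**32 (assumption: MD5)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class ToyRing:
    """Each node owns the token range that ends at its own token."""

    def __init__(self, node_names):
        self.ring = sorted((token(name), name) for name in node_names)
        self.tokens = [t for t, _ in self.ring]

    def owner(self, partition_key: str) -> str:
        # First node token >= the key's token; wrap around past the last node.
        idx = bisect_left(self.tokens, token(partition_key))
        return self.ring[idx % len(self.ring)][1]


keys = [f"user-{i}" for i in range(1000)]

single = ToyRing(["node1"])
triple = ToyRing(["node1", "node2", "node3"])

single_owners = {single.owner(k) for k in keys}
triple_owners = {triple.owner(k) for k in keys}

# Keys whose owner changes when the two extra nodes join the ring.
moved = sum(1 for k in keys if single.owner(k) != triple.owner(k))
print(single_owners, triple_owners, moved)
```

With one node every key lands on that node; with three, the same keys are
spread by token without any change to the table definition or the queries.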




Re: Cassandra data model right definition

Posted by Edward Capriolo <ed...@gmail.com>.
https://github.com/apache/cassandra

Row store <http://wiki.apache.org/cassandra/DataModel> means that like
relational databases, Cassandra organizes data by rows and columns. The
Cassandra Query Language (CQL) is a close relative of SQL.

I generally do not know what to say about these high-level
"oversimplifications" like "firewalls block hackers". Are there "firewalls",
or do they mean IP routers with layer 4 packet inspection and layer 3
Access Control Lists?

We say (and I catch myself doing it all the time) "like relational
databases" often, as if all relational databases work alike. A columnar
store like HP Vertica is a relational database. MySQL has different storage
engines; does MyISAM work like InnoDB?

Google Docs organizes data by rows and columns as well. You can wrap any
storage system in an API that makes it look like rows and columns.
Microsoft LINQ can enumerate your network cards and query them
(https://msdn.microsoft.com/en-us/library/bb308959.aspx); that really does
not make your network cards a "row store".

"Theoretically a row can have 2 billion columns, but in practice it
shouldn't have more than 100 million columns."
In practice (in my experience) the number is much lower than 100 million,
and if the data is actually deleted and re-added frequently, the number of
live columns (rows, whatever) you can use happily is even lower.

I believe on Twitter (I am unable to find the tweet) someone was trying to
convince me Cassandra was a "columnar analytic database". ROFL

I believe telling someone it is a "row store" "like a database" is not a good
idea. They might walk away content with that explanation. You are setting
them up to walk into an anti-pattern, like a case where the user is
attempting to write and delete 1 row and 1 column 6 billion times a day.
Then you end up explaining to them
http://stackoverflow.com/questions/21755286/what-exactly-happens-when-tombstone-limit-is-reached
and how the Cassandra storage model is not "like a relational database".
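
To make the write-and-delete anti-pattern concrete, here is a small Python
sketch of why deletes are not free in a log-structured store. It is a toy
model and only an assumption about the general mechanism: ToyPartition and
ToyTombstoneOverflow are hypothetical names, there is no compaction or grace
period, and the threshold merely mimics the idea behind Cassandra's
tombstone_failure_threshold setting.

```python
TOMBSTONE = object()  # marker appended in place of a deleted value


class ToyTombstoneOverflow(Exception):
    """Hypothetical error: a read had to scan too many tombstones."""


class ToyPartition:
    """Append-only cells; a delete adds a marker instead of removing data."""

    def __init__(self, failure_threshold=100_000):
        self.cells = []  # (clustering_key, value) in write order
        self.failure_threshold = failure_threshold

    def write(self, clustering_key, value):
        self.cells.append((clustering_key, value))

    def delete(self, clustering_key):
        self.cells.append((clustering_key, TOMBSTONE))

    def read_live(self):
        """Return live cells and how many tombstones the read scanned."""
        live = {}
        tombstones_scanned = 0
        for clustering_key, value in self.cells:
            if value is TOMBSTONE:
                tombstones_scanned += 1
                if tombstones_scanned > self.failure_threshold:
                    raise ToyTombstoneOverflow(
                        f"scanned more than {self.failure_threshold} tombstones"
                    )
                live.pop(clustering_key, None)
            else:
                live[clustering_key] = value
        return live, tombstones_scanned


# Queue-like usage: every job is written once and deleted once.
queue = ToyPartition(failure_threshold=1000)
for job_id in range(1000):
    queue.write(job_id, "payload")
    queue.delete(job_id)

live, scanned = queue.read_live()  # nothing is live, yet 1000 markers were read

queue.write(1000, "payload")
queue.delete(1000)
try:
    queue.read_live()
    overflowed = False
except ToyTombstoneOverflow:
    overflowed = True
```

The partition holds no live data at all, but every read still pays for the
deletes, which is the part a "like a relational database" mental model hides.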


Re: Cassandra data model right definition

Posted by Edward Capriolo <ed...@gmail.com>.
I can iterate over JSON data stored in MongoDB and present it as a table with
rows and columns. That does not make MongoDB a row store.
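
A minimal Python sketch of that point, using only the standard library. The
documents and field names are made up for illustration, and no MongoDB driver
is involved; the JSON string stands in for whatever the store returns.

```python
import json

# Two hypothetical documents with different shapes.
raw = """
[
  {"_id": 1, "name": "ada", "tags": ["x", "y"]},
  {"_id": 2, "email": "bob@example.com"}
]
"""
docs = json.loads(raw)

# The "columns" are just the union of every field seen in any document.
columns = sorted({field for doc in docs for field in doc})

# Each document becomes a "row"; missing fields are padded with None.
rows = [[doc.get(col) for col in columns] for doc in docs]

print(columns)
for row in rows:
    print(row)
```

The tabular view exists only in the presentation code; nothing about how the
documents are stored has changed.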

On Fri, Sep 30, 2016 at 9:16 PM, Edward Capriolo <ed...@gmail.com>
wrote:

> The problem with calling it a row store:
>
> https://en.wikipedia.org/wiki/Row_(database)
>
> In the context of a relational database
> <https://en.wikipedia.org/wiki/Relational_database>, a *row*—also called
> a record <https://en.wikipedia.org/wiki/Record_(computer_science)> or
> tuple <https://en.wikipedia.org/wiki/Tuple>—represents a single,
> implicitly structured data <https://en.wikipedia.org/wiki/Data> item in a
> table <https://en.wikipedia.org/wiki/Table_(database)>. In simple terms,
> a database table can be thought of as consisting of *rows* and columns
> <https://en.wikipedia.org/wiki/Column_(database)> or fields
> <https://en.wikipedia.org/wiki/Field_(computer_science)>.[1]
> <https://en.wikipedia.org/wiki/Row_(database)#cite_note-1> Each row in a
> table represents a set of related data, and every row in the table has the
> same structure.
>
> When you have static columns and rows with maps, and lists, it is hard to
> argue that every row has the same structure. Physically at the storage
> layer they do not have the same structure and logically when accessing the
> data they barely have the same structure, as the static column is just
> appearing inside each row it is actually not contained in.
>
> On Fri, Sep 30, 2016 at 4:47 PM, Jonathan Haddad <jo...@jonhaddad.com>
> wrote:
>
>> +1000 to what Benedict says. I usually call it a "partitioned row store"
>> which usually needs some extra explanation but is more accurate than
>> "column family" or whatever other thrift era terminology people still use.
>> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <do...@gmail.com> wrote:
>>
>>> I used to present Cassandra as a NoSQL datastore with "distributed"
>>> table. This definition is closer to CQL and has some academic background
>>> (distributed hash table).
>>>
>>>
>>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>>> benedict@apache.org> wrote:
>>>
>>>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>>>> thrift users no longer think they have a schema (though they do), and
>>>> thrift is being deprecated.
>>>>
>>>> I really wish everyone would kill the term "wide column store" with
>>>> fire.  It seems to have never meant anything beyond "schema-less,
>>>> row-oriented", and a "column store" means literally the opposite of this.
>>>>
>>>> Not only that, but people don't even seem to realise the term "column
>>>> store" existed long before "wide column store" and the latter is often
>>>> abbreviated to the former, as here:
>>>> http://www.planetcassandra.org/what-is-nosql/
>>>>
>>>> Since it no longer applies, let's all agree as a community to forget
>>>> this awful nomenclature ever existed.
>>>>
>>>>
>>>>
>>>> On 30 September 2016 at 18:09, Joaquin Casares <
>>>> joaquin@thelastpickle.com> wrote:
>>>>
>>>>> Hi Mehdi,
>>>>>
>>>>> I can help clarify a few things.
>>>>>
>>>>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row
>>>>> can have 2 billion columns, but in practice it shouldn't have more than 100
>>>>> million columns.
>>>>>
>>>>> Cassandra partitions data to certain nodes based on the partition
>>>>> key(s), but does provide the option of setting zero or more clustering
>>>>> keys. Together, the partition key(s) and clustering key(s) form the primary
>>>>> key.
>>>>>
>>>>> When writing to Cassandra, you will need to provide the full primary
>>>>> key, however, when reading from Cassandra, you only need to provide the
>>>>> full partition key.
>>>>>
>>>>> When you only provide the partition key for a read operation, you're
>>>>> able to return all columns that exist on that partition with low latency.
>>>>> These columns are displayed as "CQL rows" to make it easier to reason about.
>>>>>
>>>>> Consider the schema:
>>>>>
>>>>> CREATE TABLE foo (
>>>>>   bar uuid,
>>>>>   boz uuid,
>>>>>   baz timeuuid,
>>>>>   data1 text,
>>>>>   data2 text,
>>>>>   PRIMARY KEY ((bar, boz), baz)
>>>>> );
>>>>>
>>>>>
>>>>> When you write to Cassandra you will need to send bar, boz, and baz
>>>>> and optionally data*, if it's relevant for that CQL row. If you chose not
>>>>> to define a data* field for a particular CQL row, then nothing is stored
>>>>> nor allocated on disk. But I wouldn't consider that caveat to be
>>>>> "schema-less".
>>>>>
>>>>> However, all writes to the same bar/boz will end up on the same
>>>>> Cassandra replica set (a configurable number of nodes) and be stored on the
>>>>> same place(s) on disk within the SSTable(s). And on disk, each field that's
>>>>> not a partition key is stored as a column, including clustering keys (this
>>>>> is optimized in Cassandra 3+, but now we're getting deep into internals).
>>>>>
>>>>> In this way you can get fast responses for all activity for bar/boz
>>>>> either over time, or for a specific time, with roughly the same number of
>>>>> disk seeks, with varying lengths on the disk scans.
>>>>>
>>>>> Hope that helps!
>>>>>
>>>>> Joaquin Casares
>>>>> Consultant
>>>>> Austin, TX
>>>>>
>>>>> Apache Cassandra Consulting
>>>>> http://www.thelastpickle.com
>>>>>
>>>>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <in...@mrcalonso.com>
>>>>> wrote:
>>>>>
>>>>>> Cassandra is a Wide Column Store
>>>>>> http://db-engines.com/en/system/Cassandra
>>>>>>
>>>>>> Carlos Alonso | Software Engineer | @calonso
>>>>>> <https://twitter.com/calonso>
>>>>>>
>>>>>> On 30 September 2016 at 18:24, Mehdi Bada <
>>>>>> mehdi.bada@dbi-services.com> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I have a theoritical question:
>>>>>>> - Is Apache Cassandra really a column store?
>>>>>>> Column store mean storing the data as column rather than as a rows.
>>>>>>>
>>>>>>> In fact C* store the data as row, and data is partionned with row
>>>>>>> key.
>>>>>>>
>>>>>>> Finally, for me, Cassandra is a row oriented schema less DBMS.... Is
>>>>>>> it true for you also???
>>>>>>>
>>>>>>> Many thanks in advance for your reply
>>>>>>>
>>>>>>> Best Regards
>>>>>>> Mehdi Bada
>>>>>>> ----
>>>>>>>
>>>>>>> *Mehdi Bada* | Consultant
>>>>>>> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32
>>>>>>> 422 96 15
>>>>>>> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
>>>>>>> mehdi.bada@dbi-services.com
>>>>>>> www.dbi-services.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *⇒ dbi services is recruiting Oracle & SQL Server experts ! – Join
>>>>>>> the team
>>>>>>> <http://www.dbi-services.com/fr/dbi-services-et-ses-collaborateurs/offres-emplois-opportunites-carrieres/>*
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>

Re: Cassandra data model right definition

Posted by Edward Capriolo <ed...@gmail.com>.
The problem with calling it a row store:

https://en.wikipedia.org/wiki/Row_(database)

In the context of a relational database
<https://en.wikipedia.org/wiki/Relational_database>, a *row*—also called a
record <https://en.wikipedia.org/wiki/Record_(computer_science)> or tuple
<https://en.wikipedia.org/wiki/Tuple>—represents a single, implicitly
structured data <https://en.wikipedia.org/wiki/Data> item in a table
<https://en.wikipedia.org/wiki/Table_(database)>. In simple terms, a
database table can be thought of as consisting of *rows* and columns
<https://en.wikipedia.org/wiki/Column_(database)> or fields
<https://en.wikipedia.org/wiki/Field_(computer_science)>.[1]
<https://en.wikipedia.org/wiki/Row_(database)#cite_note-1> Each row in a
table represents a set of related data, and every row in the table has the
same structure.

When you have static columns and rows with maps and lists, it is hard to
argue that every row has the same structure. Physically, at the storage
layer, the rows do not have the same structure, and logically, when accessing
the data, they barely have the same structure, since a static column merely
appears inside each row without actually being contained in it.
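
To make the static column point concrete, here is a toy sketch in Python. It
is only an illustration under assumptions: the dict layout, the column names
(owner, tags, attrs) and the helper cql_rows are made up for this example and
are not Cassandra's storage format or API. It shows a value kept once per
partition that shows up in every logical row, while the rows themselves carry
different columns.

```python
# Toy model (an assumption, not Cassandra internals): a partition keeps one
# static value plus rows keyed by clustering key.
partition = {
    "static": {"owner": "alice"},        # stored once for the whole partition
    "rows": {                            # clustering key -> regular columns
        1: {"tags": ["a", "b"]},         # this row only has a list column
        2: {"attrs": {"k": "v"}},        # this row only has a map column
    },
}


def cql_rows(partition):
    """Build the logical view: the static value is copied into every row."""
    return [
        {"ck": ck, **partition["static"], **cols}
        for ck, cols in sorted(partition["rows"].items())
    ]


for row in cql_rows(partition):
    print(row)
```

Both printed rows contain owner even though it is held only once, and the
two rows do not have the same set of columns.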

On Fri, Sep 30, 2016 at 4:47 PM, Jonathan Haddad <jo...@jonhaddad.com> wrote:

> +1000 to what Benedict says. I usually call it a "partitioned row store"
> which usually needs some extra explanation but is more accurate than
> "column family" or whatever other thrift era terminology people still use.
> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <do...@gmail.com> wrote:
>
>> I used to present Cassandra as a NoSQL datastore with "distributed"
>> table. This definition is closer to CQL and has some academic background
>> (distributed hash table).
>>
>>
>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>> benedict@apache.org> wrote:
>>
>>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>>> thrift users no longer think they have a schema (though they do), and
>>> thrift is being deprecated.
>>>
>>> I really wish everyone would kill the term "wide column store" with
>>> fire.  It seems to have never meant anything beyond "schema-less,
>>> row-oriented", and a "column store" means literally the opposite of this.
>>>
>>> Not only that, but people don't even seem to realise the term "column
>>> store" existed long before "wide column store" and the latter is often
>>> abbreviated to the former, as here:
>>> http://www.planetcassandra.org/what-is-nosql/
>>>
>>> Since it no longer applies, let's all agree as a community to forget
>>> this awful nomenclature ever existed.
>>>
>>>
>>>
>>> On 30 September 2016 at 18:09, Joaquin Casares <
>>> joaquin@thelastpickle.com> wrote:
>>>
>>>> Hi Mehdi,
>>>>
>>>> I can help clarify a few things.
>>>>
>>>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row
>>>> can have 2 billion columns, but in practice it shouldn't have more than 100
>>>> million columns.
>>>>
>>>> Cassandra partitions data to certain nodes based on the partition
>>>> key(s), but does provide the option of setting zero or more clustering
>>>> keys. Together, the partition key(s) and clustering key(s) form the primary
>>>> key.
>>>>
>>>> When writing to Cassandra, you will need to provide the full primary
>>>> key, however, when reading from Cassandra, you only need to provide the
>>>> full partition key.
>>>>
>>>> When you only provide the partition key for a read operation, you're
>>>> able to return all columns that exist on that partition with low latency.
>>>> These columns are displayed as "CQL rows" to make it easier to reason about.
>>>>
>>>> Consider the schema:
>>>>
>>>> CREATE TABLE foo (
>>>>   bar uuid,
>>>>   boz uuid,
>>>>   baz timeuuid,
>>>>   data1 text,
>>>>   data2 text,
>>>>   PRIMARY KEY ((bar, boz), baz)
>>>> );
>>>>
>>>>
>>>> When you write to Cassandra you will need to send bar, boz, and baz and
>>>> optionally data*, if it's relevant for that CQL row. If you chose not to
>>>> define a data* field for a particular CQL row, then nothing is stored nor
>>>> allocated on disk. But I wouldn't consider that caveat to be "schema-less".
>>>>
>>>> However, all writes to the same bar/boz will end up on the same
>>>> Cassandra replica set (a configurable number of nodes) and be stored on the
>>>> same place(s) on disk within the SSTable(s). And on disk, each field that's
>>>> not a partition key is stored as a column, including clustering keys (this
>>>> is optimized in Cassandra 3+, but now we're getting deep into internals).
>>>>
>>>> In this way you can get fast responses for all activity for bar/boz
>>>> either over time, or for a specific time, with roughly the same number of
>>>> disk seeks, with varying lengths on the disk scans.
>>>>
>>>> Hope that helps!
>>>>
>>>> Joaquin Casares
>>>> Consultant
>>>> Austin, TX
>>>>
>>>> Apache Cassandra Consulting
>>>> http://www.thelastpickle.com
>>>>
>>>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <in...@mrcalonso.com>
>>>> wrote:
>>>>
>>>>> Cassandra is a Wide Column Store
>>>>> http://db-engines.com/en/system/Cassandra
>>>>>
>>>>> Carlos Alonso | Software Engineer | @calonso
>>>>> <https://twitter.com/calonso>
>>>>>
>>>>> On 30 September 2016 at 18:24, Mehdi Bada <mehdi.bada@dbi-services.com
>>>>> > wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I have a theoritical question:
>>>>>> - Is Apache Cassandra really a column store?
>>>>>> Column store mean storing the data as column rather than as a rows.
>>>>>>
>>>>>> In fact C* store the data as row, and data is partionned with row key.
>>>>>>
>>>>>> Finally, for me, Cassandra is a row oriented schema less DBMS.... Is
>>>>>> it true for you also???
>>>>>>
>>>>>> Many thanks in advance for your reply
>>>>>>
>>>>>> Best Regards
>>>>>> Mehdi Bada
>>>>>> ----
>>>>>>
>>>>>> *Mehdi Bada* | Consultant
>>>>>> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422
>>>>>> 96 15
>>>>>> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
>>>>>> mehdi.bada@dbi-services.com
>>>>>> www.dbi-services.com
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> *⇒ dbi services is recruiting Oracle & SQL Server experts ! – Join
>>>>>> the team
>>>>>> <http://www.dbi-services.com/fr/dbi-services-et-ses-collaborateurs/offres-emplois-opportunites-carrieres/>*
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>

Re: Cassandra data model right definition

Posted by Mehdi Bada <me...@dbi-services.com>.
Hi all, 

Just to refocus the debate (since I am at the origin of this very interesting exchange). 
I think that, for a good understanding of the data model of any DBMS, we (technical experts) have to decompose the data objects of the model and understand precisely how the data is stored and what kinds of mechanisms are used. 
In this respect, I think Russell has described the situation very well, and we can say that the Apache Cassandra data model can be defined as a Partitioned Row Store. 

Many thanks for all your feedback and contributions 

Best Regards 
Mehdi Bada 

--- 
Mehdi Bada | Consultant 
Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15 
dbi services, Rue de la Jeunesse 2, CH-2800 Delémont 
mehdi.bada@dbi-services.com 
www.dbi-services.com 




From: "Edward Capriolo" <ed...@gmail.com> 
To: "user" <us...@cassandra.apache.org> 
Sent: Monday, October 3, 2016 4:53:16 PM 
Subject: Re: Cassandra data model right definition 

My original point can be summed up as: 

Do not define Cassandra in terms of SIMILES & METAPHORS. Such words include "like" and "close relative". 

For the specifics: 

Any relational db could (and I'm sure one does!) allow for sparse fields as well. MySQL can be backed by rocksdb now, does that make it not a row store? 

Lets draw some lines, a relational database is clearly defined. 

https://en.wikipedia.org/wiki/Edgar_F._Codd 


Codd's theorem , a result proven in his seminal work on the relational model, equates the expressive power of relational algebra and relational calculus (both of which, lacking recursion, are strictly less powerful than first-order logic ). [ citation needed ] 

As the relational model started to become fashionable in the early 1980s, Codd fought a sometimes bitter campaign to prevent the term being misused by database vendors who had merely added a relational veneer to older technology. As part of this campaign, he published his 12 rules to define what constituted a relational database. This made his position in IBM increasingly difficult, so he left to form his own consulting company with Chris Date and others. 

Cassandra is not a relational database. 



I have attempted to illustrate that a "row store" is defined as well. I do not believe Cassandra is a "row store". 

" Just because it uses log structured storage, sparse fields, and semi-flexible collections doesn't disqualify it from calling it a "row store"" 

What is the definition of "row store". Is it a logical construct or a physical one? 

Why isn't mongo DB a "row store"? I can drop a schema on top of mongo and present it as rows and columns. It seems to pass the litmus test being presented. 

https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage 







On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad < jon@jonhaddad.com > wrote: 


Sorry Ed, but you're really stretching here. A table in Cassandra is structured by a schema with the data for each row stored together in each data file. Just because it uses log structured storage, sparse fields, and semi-flexible collections doesn't disqualify it from calling it a "row store" 

Postgres added flexible storage through hstore, I don't hear anyone arguing that it needs to be renamed. 

Any relational db could (and I'm sure one does!) allow for sparse fields as well. MySQL can be backed by rocksdb now, does that make it not a row store? 

You're arguing that everything is wrong but you're not proposing an alternative, which is not productive. 
On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo < edlinuxguru@gmail.com > wrote: 

BQ_BEGIN

Also, every piece of technical information that describes a row store 

http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf 
https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems 

depicts it like this: 
001:10,Smith,Joe,40000;
002:12,Jones,Mary,50000;
003:11,Johnson,Cathy,44000;
004:22,Jones,Bob,55000; 


They never depict a scenario where the data looks like this on disk: 

001:10,Smith 
001:10,40000; 
Which is much closer to how Cassandra stores its data. 



On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith < benedict@apache.org > wrote: 

BQ_BEGIN

Absolutely. A "partitioned row store" is exactly what I would call it. As it happens, our README thinks the same, which is fantastic. 

I thought I'd take a look at the rest of our cohort, and didn't get far before disappointment. HBase literally calls itself a " column-oriented store" - which is so totally wrong it's simultaneously hilarious and tragic. 

I guess we can't blame the wider internet for misunderstanding/misnaming us poor "wide column stores" if even one of the major examples doesn't know what it, itself, is! 




On 30 September 2016 at 21:47, Jonathan Haddad < jon@jonhaddad.com > wrote: 

BQ_BEGIN
+1000 to what Benedict says. I usually call it a "partitioned row store" which usually needs some extra explanation but is more accurate than "column family" or whatever other thrift era terminology people still use. 
On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan < doanduyhai@gmail.com > wrote: 

BQ_BEGIN

I used to present Cassandra as a NoSQL datastore with "distributed" table. This definition is closer to CQL and has some academic background (distributed hash table). 


On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith < benedict@apache.org > wrote: 

BQ_BEGIN

Cassandra is not a "wide column store" anymore. It has a schema. Only thrift users no longer think they have a schema (though they do), and thrift is being deprecated. 

I really wish everyone would kill the term "wide column store" with fire. It seems to have never meant anything beyond "schema-less, row-oriented", and a "column store" means literally the opposite of this. 

Not only that, but people don't even seem to realise the term "column store" existed long before "wide column store" and the latter is often abbreviated to the former, as here: http://www.planetcassandra.org/what-is-nosql/ 

Since it no longer applies, let's all agree as a community to forget this awful nomenclature ever existed. 



On 30 September 2016 at 18:09, Joaquin Casares < joaquin@thelastpickle.com > wrote: 

BQ_BEGIN

Hi Mehdi, 

I can help clarify a few things. 

As Carlos said, Cassandra is a Wide Column Store. Theoretically a row can have 2 billion columns, but in practice it shouldn't have more than 100 million columns. 

Cassandra partitions data to certain nodes based on the partition key(s), but does provide the option of setting zero or more clustering keys. Together, the partition key(s) and clustering key(s) form the primary key. 

When writing to Cassandra, you will need to provide the full primary key, however, when reading from Cassandra, you only need to provide the full partition key. 

When you only provide the partition key for a read operation, you're able to return all columns that exist on that partition with low latency. These columns are displayed as "CQL rows" to make it easier to reason about. 

Consider the schema: 


BQ_BEGIN

CREATE TABLE foo ( 
  bar uuid, 
  boz uuid, 
  baz timeuuid, 
  data1 text, 
  data2 text, 
  PRIMARY KEY ((bar, boz), baz) 
); 

When you write to Cassandra you will need to send bar, boz, and baz and optionally data*, if it's relevant for that CQL row. If you chose not to define a data* field for a particular CQL row, then nothing is stored nor allocated on disk. But I wouldn't consider that caveat to be "schema-less". 

However, all writes to the same bar/boz will end up on the same Cassandra replica set (a configurable number of nodes) and be stored on the same place(s) on disk within the SSTable(s). And on disk, each field that's not a partition key is stored as a column, including clustering keys (this is optimized in Cassandra 3+, but now we're getting deep into internals). 

In this way you can get fast responses for all activity for bar/boz either over time, or for a specific time, with roughly the same number of disk seeks, with varying lengths on the disk scans. 

Hope that helps! 

Joaquin Casares 
Consultant 
Austin, TX 

Apache Cassandra Consulting 
http://www.thelastpickle.com 

On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso < info@mrcalonso.com > wrote: 

BQ_BEGIN

Cassandra is a Wide Column Store http://db-engines.com/en/system/Cassandra 

Carlos Alonso | Software Engineer | @calonso 

On 30 September 2016 at 18:24, Mehdi Bada < mehdi.bada@dbi-services.com > wrote: 

BQ_BEGIN

Hi all, 

I have a theoritical question: 
- Is Apache Cassandra really a column store? 
Column store mean storing the data as column rather than as a rows. 

In fact C* store the data as row, and data is partionned with row key. 

Finally, for me, Cassandra is a row oriented schema less DBMS.... Is it true for you also??? 

Many thanks in advance for your reply 

Best Regards 
Mehdi Bada 
---- 

Mehdi Bada | Consultant 
Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15 
dbi services, Rue de la Jeunesse 2, CH-2800 Delémont 
mehdi.bada@dbi-services.com 
www.dbi-services.com 



⇒ dbi services is recruiting Oracle & SQL Server experts ! – Join the team 

BQ_END



BQ_END



BQ_END



BQ_END



BQ_END


BQ_END



BQ_END



BQ_END


BQ_END



Re: Cassandra data model right definition

Posted by Peter Lin <wo...@gmail.com>.
Whether a storage engine requires a schema isn't really critical for row
oriented storage. How about CSV that doesn't have a header row? CSV is
probably the most commonly used row oriented storage and tons of businesses
still use it for B2B transactions.

As you pointed out, some traditional RDBMS have been adding
"non-traditional" storage options, which is good for everyone. What RDBMS
still don't support is dynamic columns, and I really doubt the SQL working
group would add them in the near future. Though SQL Server and Oracle both
support an XML datatype, which one could argue "kind of" achieves
flexibility similar to dynamic columns.

Then there are RDBMS that are adding native support for JSON, which muddies
the water even more. As an English major, I find being precise and concise
with language important, even if 80% of the people in the IT field abuse it.
I've been on countless sales calls with management. More often than not,
they read the documentation written by developers and feel like they're
reading gibberish. It's best to avoid loaded terms like "row store". Just
because some people like it doesn't mean it achieves the goal of clear
communication.


On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad <jo...@jonhaddad.com> wrote:

> Sorry Ed, but you're really stretching here. A table in Cassandra is
> structured by a schema with the data for each row stored together in each
> data file. Just because it uses log structured storage, sparse fields, and
> semi-flexible collections doesn't disqualify it from calling it a "row
> store"
>
> Postgres added flexible storage through hstore, I don't hear anyone
> arguing that it needs to be renamed.
>
> Any relational db could (and I'm sure one does!) allow for sparse fields
> as well. MySQL can be backed by rocksdb now, does that make it not a row
> store?
>
> You're arguing that everything is wrong but you're not proposing an
> alternative, which is not productive.
>
> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo <ed...@gmail.com>
> wrote:
>
>> Also, every piece of technical information that describes a row store
>>
>> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
>> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>>
>> depicts it like this:
>>
>> 001:10,Smith,Joe,40000;
>> 002:12,Jones,Mary,50000;
>> 003:11,Johnson,Cathy,44000;
>> 004:22,Jones,Bob,55000;
>>
>>
>>
>> They never depict a scenario where the data looks like this on disk:
>>
>> 001:10,Smith
>>
>> 001:10,40000;
>>
>> Which is much closer to how Cassandra *stores* its data.
>>
>>
>>
>> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <
>> benedict@apache.org> wrote:
>>
>> Absolutely.  A "partitioned row store" is exactly what I would call it.
>> As it happens, our README thinks the same, which is fantastic.
>>
>> I thought I'd take a look at the rest of our cohort, and didn't get far
>> before disappointment.  HBase literally calls itself a "*column-oriented* store"
>> - which is so totally wrong it's simultaneously hilarious and tragic.
>>
>> I guess we can't blame the wider internet for misunderstanding/misnaming
>> us poor "wide column stores" if even one of the major examples doesn't know
>> what it, itself, is!
>>
>>
>>
>>
>> On 30 September 2016 at 21:47, Jonathan Haddad <jo...@jonhaddad.com> wrote:
>>
>> +1000 to what Benedict says. I usually call it a "partitioned row store"
>> which usually needs some extra explanation but is more accurate than
>> "column family" or whatever other thrift era terminology people still use.
>> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <do...@gmail.com> wrote:
>>
>> I used to present Cassandra as a NoSQL datastore with "distributed"
>> table. This definition is closer to CQL and has some academic background
>> (distributed hash table).
>>
>>
>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>> benedict@apache.org> wrote:
>>
>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>> thrift users no longer think they have a schema (though they do), and
>> thrift is being deprecated.
>>
>> I really wish everyone would kill the term "wide column store" with
>> fire.  It seems to have never meant anything beyond "schema-less,
>> row-oriented", and a "column store" means literally the opposite of this.
>>
>> Not only that, but people don't even seem to realise the term "column
>> store" existed long before "wide column store" and the latter is often
>> abbreviated to the former, as here:
>> http://www.planetcassandra.org/what-is-nosql/
>>
>> Since it no longer applies, let's all agree as a community to forget this
>> awful nomenclature ever existed.
>>
>>
>>
>> On 30 September 2016 at 18:09, Joaquin Casares <joaquin@thelastpickle.com
>> > wrote:
>>
>> Hi Mehdi,
>>
>> I can help clarify a few things.
>>
>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row can
>> have 2 billion columns, but in practice it shouldn't have more than 100
>> million columns.
>>
>> Cassandra partitions data to certain nodes based on the partition key(s),
>> but does provide the option of setting zero or more clustering keys.
>> Together, the partition key(s) and clustering key(s) form the primary key.
>>
>> When writing to Cassandra, you will need to provide the full primary key,
>> however, when reading from Cassandra, you only need to provide the full
>> partition key.
>>
>> When you only provide the partition key for a read operation, you're able
>> to return all columns that exist on that partition with low latency. These
>> columns are displayed as "CQL rows" to make it easier to reason about.
>>
>> Consider the schema:
>>
>> CREATE TABLE foo (
>>   bar uuid,
>>   boz uuid,
>>   baz timeuuid,
>>   data1 text,
>>   data2 text,
>>   PRIMARY KEY ((bar, boz), baz)
>> );
>>
>>
>> When you write to Cassandra you will need to send bar, boz, and baz and
>> optionally data*, if it's relevant for that CQL row. If you chose not to
>> define a data* field for a particular CQL row, then nothing is stored nor
>> allocated on disk. But I wouldn't consider that caveat to be "schema-less".
>>
>> However, all writes to the same bar/boz will end up on the same Cassandra
>> replica set (a configurable number of nodes) and be stored on the same
>> place(s) on disk within the SSTable(s). And on disk, each field that's not
>> a partition key is stored as a column, including clustering keys (this is
>> optimized in Cassandra 3+, but now we're getting deep into internals).
>>
>> In this way you can get fast responses for all activity for bar/boz
>> either over time, or for a specific time, with roughly the same number of
>> disk seeks, with varying lengths on the disk scans.
>>
>> Hope that helps!
>>
>> Joaquin Casares
>> Consultant
>> Austin, TX
>>
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <in...@mrcalonso.com>
>> wrote:
>>
>> Cassandra is a Wide Column Store
>> http://db-engines.com/en/system/Cassandra
>>
>> Carlos Alonso | Software Engineer | @calonso
>> <https://twitter.com/calonso>
>>
>> On 30 September 2016 at 18:24, Mehdi Bada <me...@dbi-services.com>
>> wrote:
>>
>> Hi all,
>>
>> I have a theoritical question:
>> - Is Apache Cassandra really a column store?
>> Column store mean storing the data as column rather than as a rows.
>>
>> In fact C* store the data as row, and data is partionned with row key.
>>
>> Finally, for me, Cassandra is a row oriented schema less DBMS.... Is it
>> true for you also???
>>
>> Many thanks in advance for your reply
>>
>> Best Regards
>> Mehdi Bada
>> ----
>>
>> *Mehdi Bada* | Consultant
>> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96
>> 15
>> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
>> mehdi.bada@dbi-services.com
>> www.dbi-services.com
>>
>>
>>
>>
>> *⇒ dbi services is recruiting Oracle & SQL Server experts ! – Join the
>> team
>> <http://www.dbi-services.com/fr/dbi-services-et-ses-collaborateurs/offres-emplois-opportunites-carrieres/>*
>>
>>
>>
>>
>>
>>
>>
>>

Re: Cassandra data model right definition

Posted by Russell Bradberry <rb...@gmail.com>.
"X-store" refers to how data is stored, in almost every case it refers to
what logical constructs are grouped together physically on disk.  It has
nothing to do with whether a database is relational or not.

Cassandra does, in fact, meet the definition of a row-store. However, I would
like to reiterate that it goes beyond that and stores all rows for a
single partition together on disk as well. Therefore "row-store" does not do
it justice, which is why I like the term "Partitioned row-store".
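
As a rough illustration of the grouping I mean, here is a toy Python sketch.
It is a sketch under assumptions only: the sample data, the tuple layout and
the three function names are invented for this mail, and none of it reflects
the real SSTable format. It just shows which values end up next to each other
under each naming.

```python
from collections import defaultdict

# Hypothetical sample data: (partition_key, clustering_key, name, salary)
rows = [
    ("dept-10", 2, "Jones", 50000),
    ("dept-22", 1, "Bob", 55000),
    ("dept-10", 1, "Smith", 40000),
]


def row_store(rows):
    """Row store: all fields of one row sit next to each other."""
    return [list(r) for r in rows]


def column_store(rows):
    """Column store: all values of one column sit next to each other."""
    return [list(col) for col in zip(*rows)]


def partitioned_row_store(rows):
    """Partitioned row store: rows stay whole, and all rows that share a
    partition key are grouped together, ordered by clustering key."""
    partitions = defaultdict(list)
    for pk, ck, *values in rows:
        partitions[pk].append((ck, values))
    return {pk: sorted(items) for pk, items in partitions.items()}


print(row_store(rows))
print(column_store(rows))
print(partitioned_row_store(rows))
```

In the last layout each row is still stored as a unit, which is the
"row-store" part, and the extra grouping by partition key is what the word
"Partitioned" adds.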

On Mon, Oct 3, 2016 at 12:37 PM, Benedict Elliott Smith <benedict@apache.org
> wrote:

> ... and my response can be summed up as "you are not parsing English
> correctly."  The word "like" does not mean what you think it means in this
> context.  It does not mean "close relative."  It is constrained to the
> similarities expressed, and no others.  You don't seem to be reading any of
> my responses about this, though, so I'm not sure parsing is your issue.
>
> Postgresql has had arrays for years, and all RDBMS (pretty much) avoid
> persisting nulls in exactly the same way C* does - encoding their absence
> in the row header.
>
> I empathise with the recent unsubscriber.
>
>
>
> On 3 October 2016 at 15:53, Edward Capriolo <ed...@gmail.com> wrote:
>
>> My original point can be summed up as:
>>
>> Do not define Cassandra in terms of SIMILES & METAPHORS. Such words include
>> "like" and "close relative".
>>
>> For the specifics:
>>
>> Any relational db could (and I'm sure one does!) allow for sparse fields
>> as well. MySQL can be backed by rocksdb now, does that make it not a row
>> store?
>>
>> Lets draw some lines, a relational database is clearly defined.
>>
>> https://en.wikipedia.org/wiki/Edgar_F._Codd
>>
>> Codd's theorem <https://en.wikipedia.org/wiki/Codd%27s_theorem>, a
>> result proven in his seminal work on the relational model, equates the
>> expressive power of relational algebra
>> <https://en.wikipedia.org/wiki/Relational_algebra> and relational
>> calculus <https://en.wikipedia.org/wiki/Relational_calculus> (both of
>> which, lacking recursion, are strictly less powerful than first-order
>> logic <https://en.wikipedia.org/wiki/First-order_logic>).[*citation
>> needed <https://en.wikipedia.org/wiki/Wikipedia:Citation_needed>*]
>>
>> As the relational model started to become fashionable in the early 1980s,
>> Codd fought a sometimes bitter campaign to prevent the term being misused
>> by database vendors who had merely added a relational veneer to older
>> technology. As part of this campaign, he published his 12 rules
>> <https://en.wikipedia.org/wiki/Codd%27s_12_rules> to define what
>> constituted a relational database. This made his position in IBM
>> increasingly difficult, so he left to form his own consulting company with
>> Chris Date and others.
>>
>> Cassandra is not a relational database.
>>
>> I have attempted to illustrate that a "row store" is defined as well.
>> I do not believe Cassandra is a "row store".
>>
>> "Just because it uses log structured storage, sparse fields, and
>> semi-flexible collections doesn't disqualify it from calling it a "row
>> store""
>>
>> What is the definition of "row store". Is it a logical construct or a
>> physical one?
>>
>> Why isn't mongo DB a "row store"? I can drop a schema on top of mongo and
>> present it as rows and columns. It seems to pass the litmus test being
>> presented.
>>
>> https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage
>>
>>
>>
>>
>>
>> On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad <jo...@jonhaddad.com>
>> wrote:
>>
>>> Sorry Ed, but you're really stretching here. A table in Cassandra is
>>> structured by a schema with the data for each row stored together in each
>>> data file. Just because it uses log structured storage, sparse fields, and
>>> semi-flexible collections doesn't disqualify it from calling it a "row
>>> store"
>>>
>>> Postgres added flexible storage through hstore, I don't hear anyone
>>> arguing that it needs to be renamed.
>>>
>>> Any relational db could (and I'm sure one does!) allow for sparse fields
>>> as well. MySQL can be backed by rocksdb now, does that make it not a row
>>> store?
>>>
>>> You're arguing that everything is wrong but you're not proposing an
>>> alternative, which is not productive.
>>> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo <ed...@gmail.com>
>>> wrote:
>>>
>>>> Also, every piece of technical information that describes a row store
>>>>
>>>> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
>>>> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>>>>
>>>> depicts it like this:
>>>>
>>>> 001:10,Smith,Joe,40000;
>>>> 002:12,Jones,Mary,50000;
>>>> 003:11,Johnson,Cathy,44000;
>>>> 004:22,Jones,Bob,55000;
>>>>
>>>>
>>>>
>>>> They never depict a scenario where the data looks like this on disk:
>>>>
>>>> 001:10,Smith
>>>>
>>>> 001:10,40000;
>>>>
>>>> Which is much closer to how Cassandra *stores* its data.
>>>>
>>>>
>>>>
>>>> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <
>>>> benedict@apache.org> wrote:
>>>>
>>>> Absolutely.  A "partitioned row store" is exactly what I would call
>>>> it.  As it happens, our README thinks the same, which is fantastic.
>>>>
>>>> I thought I'd take a look at the rest of our cohort, and didn't get far
>>>> before disappointment.  HBase literally calls itself a "
>>>> *column-oriented* store" - which is so totally wrong it's
>>>> simultaneously hilarious and tragic.
>>>>
>>>> I guess we can't blame the wider internet for
>>>> misunderstanding/misnaming us poor "wide column stores" if even one of the
>>>> major examples doesn't know what it, itself, is!
>>>>
>>>>
>>>>
>>>>
>>>> On 30 September 2016 at 21:47, Jonathan Haddad <jo...@jonhaddad.com>
>>>> wrote:
>>>>
>>>> +1000 to what Benedict says. I usually call it a "partitioned row
>>>> store" which usually needs some extra explanation but is more accurate than
>>>> "column family" or whatever other thrift era terminology people still use.
>>>> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <do...@gmail.com>
>>>> wrote:
>>>>
>>>> I used to present Cassandra as a NoSQL datastore with "distributed"
>>>> table. This definition is closer to CQL and has some academic background
>>>> (distributed hash table).
>>>>
>>>>
>>>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>>>> benedict@apache.org> wrote:
>>>>
>>>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>>>> thrift users no longer think they have a schema (though they do), and
>>>> thrift is being deprecated.
>>>>
>>>> I really wish everyone would kill the term "wide column store" with
>>>> fire.  It seems to have never meant anything beyond "schema-less,
>>>> row-oriented", and a "column store" means literally the opposite of this.
>>>>
>>>> Not only that, but people don't even seem to realise the term "column
>>>> store" existed long before "wide column store" and the latter is often
>>>> abbreviated to the former, as here:
>>>> http://www.planetcassandra.org/what-is-nosql/
>>>>
>>>> Since it no longer applies, let's all agree as a community to forget
>>>> this awful nomenclature ever existed.
>>>>
>>>>
>>>>
>>>> On 30 September 2016 at 18:09, Joaquin Casares <
>>>> joaquin@thelastpickle.com> wrote:
>>>>
>>>> Hi Mehdi,
>>>>
>>>> I can help clarify a few things.
>>>>
>>>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row
>>>> can have 2 billion columns, but in practice it shouldn't have more than 100
>>>> million columns.
>>>>
>>>> Cassandra partitions data to certain nodes based on the partition
>>>> key(s), but does provide the option of setting zero or more clustering
>>>> keys. Together, the partition key(s) and clustering key(s) form the primary
>>>> key.
>>>>
>>>> When writing to Cassandra, you will need to provide the full primary
>>>> key, however, when reading from Cassandra, you only need to provide the
>>>> full partition key.
>>>>
>>>> When you only provide the partition key for a read operation, you're
>>>> able to return all columns that exist on that partition with low latency.
>>>> These columns are displayed as "CQL rows" to make it easier to reason about.
>>>>
>>>> Consider the schema:
>>>>
>>>> CREATE TABLE foo (
>>>>   bar uuid,
>>>>   boz uuid,
>>>>   baz timeuuid,
>>>>   data1 text,
>>>>   data2 text,
>>>>   PRIMARY KEY ((bar, boz), baz)
>>>> );
>>>>
>>>>
>>>> When you write to Cassandra you will need to send bar, boz, and baz and
>>>> optionally data*, if it's relevant for that CQL row. If you choose not to
>>>> set a data* field for a particular CQL row, then nothing is stored or
>>>> allocated on disk. But I wouldn't consider that caveat to be "schema-less".
>>>>
>>>> However, all writes to the same bar/boz will end up on the same
>>>> Cassandra replica set (a configurable number of nodes) and be stored on the
>>>> same place(s) on disk within the SSTable(s). And on disk, each field that's
>>>> not a partition key is stored as a column, including clustering keys (this
>>>> is optimized in Cassandra 3+, but now we're getting deep into internals).
>>>>
>>>> In this way you can get fast responses for all activity for bar/boz
>>>> either over time, or for a specific time, with roughly the same number of
>>>> disk seeks, with varying lengths on the disk scans.
>>>>
>>>> Hope that helps!
>>>>
>>>> Joaquin Casares
>>>> Consultant
>>>> Austin, TX
>>>>
>>>> Apache Cassandra Consulting
>>>> http://www.thelastpickle.com
>>>>
>>>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <in...@mrcalonso.com>
>>>> wrote:
>>>>
>>>> Cassandra is a Wide Column Store: http://db-engines.com/en/system/Cassandra
>>>>
>>>> Carlos Alonso | Software Engineer | @calonso
>>>> <https://twitter.com/calonso>
>>>>
>>>> On 30 September 2016 at 18:24, Mehdi Bada <me...@dbi-services.com>
>>>> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I have a theoretical question:
>>>> - Is Apache Cassandra really a column store?
>>>> A column store means storing the data as columns rather than as rows.
>>>>
>>>> In fact C* stores the data as rows, and data is partitioned by row key.
>>>>
>>>> Finally, for me, Cassandra is a row-oriented, schema-less DBMS. Is that
>>>> true for you also?
>>>>
>>>> Many thanks in advance for your reply
>>>>
>>>> Best Regards
>>>> Mehdi Bada
>>>> ----
>>>>
>>>> *Mehdi Bada* | Consultant
>>>> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422
>>>> 96 15
>>>> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
>>>> mehdi.bada@dbi-services.com
>>>> www.dbi-services.com
>>
>

Re: Cassandra data model right definition

Posted by Benedict Elliott Smith <be...@apache.org>.
... and my response can be summed up as "you are not parsing English
correctly."  The word "like" does not mean what you think it means in this
context.  It does not mean "close relative."  It is constrained to the
similarities expressed, and no others.  You don't seem to be reading any of
my responses about this, though, so I'm not sure parsing is your issue.

PostgreSQL has had arrays for years, and all RDBMS (pretty much) avoid
persisting nulls in exactly the same way C* does - encoding their absence
in the row header.
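
As a rough illustration of the row-header point above, here is a minimal
sketch in Python. It is not the actual on-disk format of Cassandra,
PostgreSQL, or any other system; the column list, the one-byte bitmap, and
the function names are assumptions made up for the example. The header
carries one presence bit per column, and only the values that are present
take up space in the body.

```python
import struct

# Hypothetical four-column schema, used only for this illustration.
COLUMNS = ["id", "last_name", "first_name", "salary"]


def encode_row(row):
    """Encode a row as a 1-byte presence bitmap followed by the present values.

    Each present value is stored as a 2-byte big-endian length plus UTF-8 bytes.
    Absent (None or missing) columns contribute no bytes to the body at all.
    """
    bitmap = 0
    body = b""
    for i, name in enumerate(COLUMNS):
        value = row.get(name)
        if value is None:
            continue  # absence is recorded only by the cleared bit in the header
        bitmap |= 1 << i
        data = str(value).encode("utf-8")
        body += struct.pack(">H", len(data)) + data
    return bytes([bitmap]) + body


def decode_row(buf):
    """Decode a buffer produced by encode_row; values come back as strings."""
    bitmap = buf[0]
    pos = 1
    row = {}
    for i, name in enumerate(COLUMNS):
        if bitmap & (1 << i):
            (length,) = struct.unpack_from(">H", buf, pos)
            pos += 2
            row[name] = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            row[name] = None
    return row


full = encode_row({"id": 1, "last_name": "Smith", "first_name": "Joe", "salary": 40000})
sparse = encode_row({"id": 1, "last_name": "Smith"})
```

In this sketch the sparse row is shorter than the full one because the two
missing columns are represented only by cleared bits in the header byte.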

I empathise with the recent unsubscriber.



On 3 October 2016 at 15:53, Edward Capriolo <ed...@gmail.com> wrote:

> My original point can be summed up as:
>
> Do not define Cassandra in terms of SIMILES & METAPHORS. Such words include
> "like" and "close relative".
>
> For the specifics:
>
> Any relational db could (and I'm sure one does!) allow for sparse fields
> as well. MySQL can be backed by rocksdb now, does that make it not a row
> store?
>
> Let's draw some lines: a relational database is clearly defined.
>
> https://en.wikipedia.org/wiki/Edgar_F._Codd
>
> Codd's theorem <https://en.wikipedia.org/wiki/Codd%27s_theorem>, a result
> proven in his seminal work on the relational model, equates the expressive
> power of relational algebra
> <https://en.wikipedia.org/wiki/Relational_algebra> and relational calculus
> <https://en.wikipedia.org/wiki/Relational_calculus> (both of which,
> lacking recursion, are strictly less powerful than first-order logic
> <https://en.wikipedia.org/wiki/First-order_logic>).
>
> As the relational model started to become fashionable in the early 1980s,
> Codd fought a sometimes bitter campaign to prevent the term being misused
> by database vendors who had merely added a relational veneer to older
> technology. As part of this campaign, he published his 12 rules
> <https://en.wikipedia.org/wiki/Codd%27s_12_rules> to define what
> constituted a relational database. This made his position in IBM
> increasingly difficult, so he left to form his own consulting company with
> Chris Date and others.
>
> Cassandra is not a relational database.
>
> I have attempted to illustrate that a "row store" is defined as well. I
> do not believe Cassandra is a "row store".
>
> "Just because it uses log structured storage, sparse fields, and
> semi-flexible collections doesn't disqualify it from calling it a "row
> store""
>
> What is the definition of "row store"? Is it a logical construct or a
> physical one?
>
> Why isn't MongoDB a "row store"? I can drop a schema on top of mongo and
> present it as rows and columns. It seems to pass the litmus test being
> presented.
>
> https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage

Re: Cassandra data model right definition

Posted by Benedict Elliott Smith <be...@apache.org>.
I did not ascribe blame.  I only empathised with their predicament;  I
don't want to listen to either of us, either!





On 3 October 2016 at 19:45, Edward Capriolo <ed...@gmail.com> wrote:

> You know what, don't "go low" and suggest the recent un-subscriber is on me.
>
> If you're so eager to deal with my pull request, I would rather you review
> this one: https://issues.apache.org/jira/browse/CASSANDRA-10825
>
>
>
>
>
> On Mon, Oct 3, 2016 at 1:04 PM, Benedict Elliott Smith <
> benedict@apache.org> wrote:
>
>> Nobody is disputing that the docs can and should be improved to avoid
>> this misreading.  I've invited Ed to file a JIRA and/or pull request twice
>> now.
>>
>> You are of course just as welcome to do this.  Perhaps you will actually
>> do it, so we can all move on with our lives!
>>
>>
>>
>>
>> On 3 October 2016 at 17:45, Peter Lin <wo...@gmail.com> wrote:
>>
>>> I've met clients that read the cassandra docs and then said in a big
>>> meeting "it's just like relational database, it has tables just like
>>> sqlserver/oracle."
>>>
>>> I'm not putting words in other people's mouths either, but I've heard
>>> that said enough times to want to puke. Do the docs claim Cassandra is
>>> relational? They absolutely don't make that claim, but the docs play
>>> loosey-goosey with terminology. The end result is that it confuses new
>>> users who aren't experts, or technology managers who try to make a case
>>> for Cassandra.
>>>
>>> We can make all the excuses we want, but that doesn't change the fact
>>> the docs aren't user friendly. Writing great documentation is tough and
>>> most developers hate it. It's cuz we suck at it. There I said it, "we SUCK
>>> at writing user friendly documentation". As many people have pointed out,
>>> it's not unique to Cassandra. 80% of the tech docs out there suck, starting
>>> with IBM at the top.
>>>
>>> Saying the docs suck isn't an indictment of anyone, it's just the
>>> reality of writing good documentation.
>>>
>>> On Mon, Oct 3, 2016 at 12:33 PM, Jonathan Haddad <jo...@jonhaddad.com>
>>> wrote:
>>>
>>>> Nobody is claiming Cassandra is relational. I'm not sure why that
>>>> keeps coming up.

Re: Cassandra data model right definition

Posted by Edward Capriolo <ed...@gmail.com>.
You know what, don't "go low" and suggest the recent un-subscriber is on me.

If you're so eager to deal with my pull request, I would rather you review
this one:
https://issues.apache.org/jira/browse/CASSANDRA-10825





On Mon, Oct 3, 2016 at 1:04 PM, Benedict Elliott Smith <be...@apache.org>
wrote:

> Nobody is disputing that the docs can and should be improved to avoid this
> misreading.  I've invited Ed to file a JIRA and/or pull request twice now.
>
> You are of course just as welcome to do this.  Perhaps you will actually
> do it, so we can all move on with our lives!
>
>
>
>
> On 3 October 2016 at 17:45, Peter Lin <wo...@gmail.com> wrote:
>
>> I've met clients that read the cassandra docs and then said in a big
>> meeting "it's just like relational database, it has tables just like
>> sqlserver/oracle."
>>
>> I'm not putting words in other people's mouths either, but I've heard that
>> said enough times to want to puke. Do the docs claim Cassandra is
>> relational? They absolutely don't make that claim, but the docs play
>> loosey-goosey with terminology. The end result is that it confuses new
>> users who aren't experts, or technology managers who try to make a case for
>> Cassandra.
>>
>> We can make all the excuses we want, but that doesn't change the fact the
>> docs aren't user friendly. Writing great documentation is tough and most
>> developers hate it. It's cuz we suck at it. There I said it, "we SUCK at
>> writing user friendly documentation". As many people have pointed out, it's
>> not unique to Cassandra. 80% of the tech docs out there suck, starting with
>> IBM at the top.
>>
>> Saying the docs suck isn't an indictment of anyone, it's just the reality
>> of writing good documentation.
>>
>> On Mon, Oct 3, 2016 at 12:33 PM, Jonathan Haddad <jo...@jonhaddad.com>
>> wrote:
>>
>>> Nobody is claiming Cassandra is relational. I'm not sure why that keeps
>>> coming up.
>>> On Mon, Oct 3, 2016 at 10:53 AM Edward Capriolo <ed...@gmail.com>
>>> wrote:
>>>
>>>> My original point can be summed up as:
>>>>
>>>> Do not define Cassandra in terms of SIMILES & METAPHORS. Such words include
>>>> "like" and "close relative".
>>>>
>>>> For the specifics:
>>>>
>>>>
>>>> Any relational db could (and I'm sure one does!) allow for sparse
>>>> fields as well. MySQL can be backed by rocksdb now, does that make it not a
>>>> row store?
>>>>
>>>>
>>>> Let's draw some lines: a relational database is clearly defined.
>>>>
>>>> https://en.wikipedia.org/wiki/Edgar_F._Codd
>>>>
>>>> Codd's theorem <https://en.wikipedia.org/wiki/Codd%27s_theorem>, a
>>>> result proven in his seminal work on the relational model, equates the
>>>> expressive power of relational algebra
>>>> <https://en.wikipedia.org/wiki/Relational_algebra> and relational
>>>> calculus <https://en.wikipedia.org/wiki/Relational_calculus> (both of
>>>> which, lacking recursion, are strictly less powerful than first-order
>>>> logic <https://en.wikipedia.org/wiki/First-order_logic>).
>>>>
>>>> As the relational model started to become fashionable in the early
>>>> 1980s, Codd fought a sometimes bitter campaign to prevent the term being
>>>> misused by database vendors who had merely added a relational veneer to
>>>> older technology. As part of this campaign, he published his 12 rules
>>>> <https://en.wikipedia.org/wiki/Codd%27s_12_rules> to define what
>>>> constituted a relational database. This made his position in IBM
>>>> increasingly difficult, so he left to form his own consulting company with
>>>> Chris Date and others.
>>>>
>>>> Cassandra is not a relational database.
>>>>
>>>> I have attempted to illustrate that a "row store" is defined as
>>>> well. I do not believe Cassandra is a "row store".
>>>>
>>>>
>>>>
>>>> "Just because it uses log structured storage, sparse fields, and
>>>> semi-flexible collections doesn't disqualify it from calling it a "row
>>>> store""
>>>>
>>>> What is the definition of "row store"? Is it a logical construct or a
>>>> physical one?
>>>>
>>>> Why isn't MongoDB a "row store"? I can drop a schema on top of Mongo
>>>> and present it as rows and columns. It seems to pass the litmus test being
>>>> presented.
>>>>
>>>> https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad <jo...@jonhaddad.com>
>>>> wrote:
>>>>
>>>> Sorry Ed, but you're really stretching here. A table in Cassandra is
>>>> structured by a schema with the data for each row stored together in each
>>>> data file. Just because it uses log structured storage, sparse fields, and
>>>> semi-flexible collections doesn't disqualify it from calling it a "row
>>>> store"
>>>>
>>>> Postgres added flexible storage through hstore, I don't hear anyone
>>>> arguing that it needs to be renamed.
>>>>
>>>> Any relational db could (and I'm sure one does!) allow for sparse
>>>> fields as well. MySQL can be backed by rocksdb now, does that make it not a
>>>> row store?
>>>>
>>>> You're arguing that everything is wrong but you're not proposing an
>>>> alternative, which is not productive.
>>>> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo <ed...@gmail.com>
>>>> wrote:
>>>>
>>>> Also, every piece of technical information that describes a row store
>>>>
>>>> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
>>>> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>>>>
>>>> depicts it like this:
>>>>
>>>> 001:10,Smith,Joe,40000;
>>>> 002:12,Jones,Mary,50000;
>>>> 003:11,Johnson,Cathy,44000;
>>>> 004:22,Jones,Bob,55000;
>>>>
>>>>
>>>>
>>>> They never depict a scenario where the data looks like this on disk:
>>>>
>>>> 001:10,Smith
>>>>
>>>> 001:10,40000;
>>>>
>>>> Which is much closer to how Cassandra *stores* its data.
>>>>
>>>>
>>>>
>>>> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <
>>>> benedict@apache.org> wrote:
>>>>
>>>> Absolutely.  A "partitioned row store" is exactly what I would call
>>>> it.  As it happens, our README thinks the same, which is fantastic.
>>>>
>>>> I thought I'd take a look at the rest of our cohort, and didn't get far
>>>> before disappointment.  HBase literally calls itself a "
>>>> *column-oriented* store" - which is so totally wrong it's
>>>> simultaneously hilarious and tragic.
>>>>
>>>> I guess we can't blame the wider internet for
>>>> misunderstanding/misnaming us poor "wide column stores" if even one of the
>>>> major examples doesn't know what it, itself, is!
>>>>
>>>>
>>>>
>>>>
>>>> On 30 September 2016 at 21:47, Jonathan Haddad <jo...@jonhaddad.com>
>>>> wrote:
>>>>
>>>> +1000 to what Benedict says. I usually call it a "partitioned row
>>>> store" which usually needs some extra explanation but is more accurate than
>>>> "column family" or whatever other thrift era terminology people still use.
>>>> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <do...@gmail.com>
>>>> wrote:
>>>>
>>>> I used to present Cassandra as a NoSQL datastore with "distributed"
>>>> table. This definition is closer to CQL and has some academic background
>>>> (distributed hash table).
>>>>
>>>>
>>>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>>>> benedict@apache.org> wrote:
>>>>
>>>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>>>> thrift users no longer think they have a schema (though they do), and
>>>> thrift is being deprecated.
>>>>
>>>> I really wish everyone would kill the term "wide column store" with
>>>> fire.  It seems to have never meant anything beyond "schema-less,
>>>> row-oriented", and a "column store" means literally the opposite of this.
>>>>
>>>> Not only that, but people don't even seem to realise the term "column
>>>> store" existed long before "wide column store" and the latter is often
>>>> abbreviated to the former, as here:
>>>> http://www.planetcassandra.org/what-is-nosql/
>>>>
>>>> Since it no longer applies, let's all agree as a community to forget
>>>> this awful nomenclature ever existed.
>>>>
>>>>
>>>>
>>>> On 30 September 2016 at 18:09, Joaquin Casares <
>>>> joaquin@thelastpickle.com> wrote:
>>>>
>>>> Hi Mehdi,
>>>>
>>>> I can help clarify a few things.
>>>>
>>>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row
>>>> can have 2 billion columns, but in practice it shouldn't have more than 100
>>>> million columns.
>>>>
>>>> Cassandra partitions data to certain nodes based on the partition
>>>> key(s), but does provide the option of setting zero or more clustering
>>>> keys. Together, the partition key(s) and clustering key(s) form the primary
>>>> key.
>>>>
>>>> When writing to Cassandra, you will need to provide the full primary
>>>> key, however, when reading from Cassandra, you only need to provide the
>>>> full partition key.
>>>>
>>>> When you only provide the partition key for a read operation, you're
>>>> able to return all columns that exist on that partition with low latency.
>>>> These columns are displayed as "CQL rows" to make it easier to reason about.
>>>>
>>>> Consider the schema:
>>>>
>>>> CREATE TABLE foo (
>>>>   bar uuid,
>>>>   boz uuid,
>>>>   baz timeuuid,
>>>>   data1 text,
>>>>   data2 text,
>>>>   PRIMARY KEY ((bar, boz), baz)
>>>> );
>>>>
>>>>
>>>> When you write to Cassandra you will need to send bar, boz, and baz and
>>>> optionally data*, if it's relevant for that CQL row. If you choose not to
>>>> define a data* field for a particular CQL row, then nothing is stored nor
>>>> allocated on disk. But I wouldn't consider that caveat to be "schema-less".
>>>>
>>>> However, all writes to the same bar/boz will end up on the same
>>>> Cassandra replica set (a configurable number of nodes) and be stored on the
>>>> same place(s) on disk within the SSTable(s). And on disk, each field that's
>>>> not a partition key is stored as a column, including clustering keys (this
>>>> is optimized in Cassandra 3+, but now we're getting deep into internals).
>>>>
>>>> In this way you can get fast responses for all activity for bar/boz
>>>> either over time, or for a specific time, with roughly the same number of
>>>> disk seeks, with varying lengths on the disk scans.
>>>>
>>>> Hope that helps!
>>>>
>>>> Joaquin Casares
>>>> Consultant
>>>> Austin, TX
>>>>
>>>> Apache Cassandra Consulting
>>>> http://www.thelastpickle.com
>>>>
>>>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <in...@mrcalonso.com>
>>>> wrote:
>>>>
>>>> Cassandra is a Wide Column Store:
>>>> http://db-engines.com/en/system/Cassandra
>>>>
>>>> Carlos Alonso | Software Engineer | @calonso
>>>> <https://twitter.com/calonso>
>>>>
>>>> On 30 September 2016 at 18:24, Mehdi Bada <me...@dbi-services.com>
>>>> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I have a theoretical question:
>>>> - Is Apache Cassandra really a column store?
>>>> A column store means storing the data as columns rather than as rows.
>>>>
>>>> In fact C* stores the data as rows, and the data is partitioned by row key.
>>>>
>>>> So, for me, Cassandra is a row-oriented, schema-less DBMS. Do you
>>>> agree?
>>>>
>>>> Many thanks in advance for your reply
>>>>
>>>> Best Regards
>>>> Mehdi Bada
>>
>

Re: Cassandra data model right definition

Posted by Benedict Elliott Smith <be...@apache.org>.
Nobody is disputing that the docs can and should be improved to avoid this
misreading.  I've invited Ed to file a JIRA and/or pull request twice now.

You are of course just as welcome to do this.  Perhaps you will actually do
it, so we can all move on with our lives!




On 3 October 2016 at 17:45, Peter Lin <wo...@gmail.com> wrote:

> I've met clients who read the Cassandra docs and then said in a big
> meeting, "it's just like a relational database, it has tables just like
> sqlserver/oracle."
>
> I'm not putting words in other people's mouths either, but I've heard that
> said enough times to want to puke. Do the docs claim Cassandra is
> relational? They absolutely don't make that claim, but the docs play
> loosey-goosey with terminology. The end result is that it confuses new
> users who aren't experts, and technology managers trying to make a case
> for Cassandra.
>
> We can make all the excuses we want, but that doesn't change the fact that
> the docs aren't user friendly. Writing great documentation is tough and
> most developers hate it. It's because we suck at it. There, I said it: "we
> SUCK at writing user friendly documentation". As many people have pointed
> out, it's not unique to Cassandra. 80% of the tech docs out there suck,
> starting with IBM at the top.
>
> Saying the docs suck isn't an indictment of anyone, it's just the reality
> of writing good documentation.
>
> On Mon, Oct 3, 2016 at 12:33 PM, Jonathan Haddad <jo...@jonhaddad.com>
> wrote:
>
>> Nobody is claiming Cassandra is relational. I'm not sure why that keeps
>> coming up.
>> On Mon, Oct 3, 2016 at 10:53 AM Edward Capriolo <ed...@gmail.com>
>> wrote:
>>
>>> My original point can be summed up as:
>>>
>>> Do not define Cassandra in terms of SIMILES & METAPHORS. Such words include
>>> "like" and "close relative".
>>>
>>> For the specifics:
>>>
>>>
>>> Any relational db could (and I'm sure one does!) allow for sparse fields
>>> as well. MySQL can be backed by rocksdb now, does that make it not a row
>>> store?
>>>
>>>
>>> Let's draw some lines: a relational database is clearly defined.
>>>
>>> https://en.wikipedia.org/wiki/Edgar_F._Codd
>>>
>>> Codd's theorem <https://en.wikipedia.org/wiki/Codd%27s_theorem>, a
>>> result proven in his seminal work on the relational model, equates the
>>> expressive power of relational algebra
>>> <https://en.wikipedia.org/wiki/Relational_algebra> and relational
>>> calculus <https://en.wikipedia.org/wiki/Relational_calculus> (both of
>>> which, lacking recursion, are strictly less powerful than first-order
>>> logic <https://en.wikipedia.org/wiki/First-order_logic>).[*citation
>>> needed <https://en.wikipedia.org/wiki/Wikipedia:Citation_needed>*]
>>>
>>> As the relational model started to become fashionable in the early
>>> 1980s, Codd fought a sometimes bitter campaign to prevent the term being
>>> misused by database vendors who had merely added a relational veneer to
>>> older technology. As part of this campaign, he published his 12 rules
>>> <https://en.wikipedia.org/wiki/Codd%27s_12_rules> to define what
>>> constituted a relational database. This made his position in IBM
>>> increasingly difficult, so he left to form his own consulting company with
>>> Chris Date and others.
>>>
>>> Cassandra is not a relational database.
>>>
>>> I have attempted to illustrate that a "row store" is defined as well.
>>> I do not believe Cassandra is a "row store".
>>>
>>>
>>>
>>> "Just because it uses log structured storage, sparse fields, and
>>> semi-flexible collections doesn't disqualify it from calling it a "row
>>> store""
>>>
>>> What is the definition of "row store"? Is it a logical construct or a
>>> physical one?
>>>
>>> Why isn't MongoDB a "row store"? I can drop a schema on top of Mongo
>>> and present it as rows and columns. It seems to pass the litmus test being
>>> presented.
>>>
>>> https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad <jo...@jonhaddad.com>
>>> wrote:
>>>
>>> Sorry Ed, but you're really stretching here. A table in Cassandra is
>>> structured by a schema with the data for each row stored together in each
>>> data file. Just because it uses log structured storage, sparse fields, and
>>> semi-flexible collections doesn't disqualify it from calling it a "row
>>> store"
>>>
>>> Postgres added flexible storage through hstore, I don't hear anyone
>>> arguing that it needs to be renamed.
>>>
>>> Any relational db could (and I'm sure one does!) allow for sparse fields
>>> as well. MySQL can be backed by rocksdb now, does that make it not a row
>>> store?
>>>
>>> You're arguing that everything is wrong but you're not proposing an
>>> alternative, which is not productive.
>>> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo <ed...@gmail.com>
>>> wrote:
>>>
>>> Also, every piece of technical information that describes a row store
>>>
>>> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
>>> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>>>
>>> depicts it like this:
>>>
>>> 001:10,Smith,Joe,40000;
>>> 002:12,Jones,Mary,50000;
>>> 003:11,Johnson,Cathy,44000;
>>> 004:22,Jones,Bob,55000;
>>>
>>>
>>>
>>> They never depict a scenario where the data looks like this on disk:
>>>
>>> 001:10,Smith
>>>
>>> 001:10,40000;
>>>
>>> Which is much closer to how Cassandra *stores* its data.
>>>
>>>
>>>
>>> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <
>>> benedict@apache.org> wrote:
>>>
>>> Absolutely.  A "partitioned row store" is exactly what I would call it.
>>> As it happens, our README thinks the same, which is fantastic.
>>>
>>> I thought I'd take a look at the rest of our cohort, and didn't get far
>>> before disappointment.  HBase literally calls itself a "
>>> *column-oriented* store" - which is so totally wrong it's
>>> simultaneously hilarious and tragic.
>>>
>>> I guess we can't blame the wider internet for misunderstanding/misnaming
>>> us poor "wide column stores" if even one of the major examples doesn't know
>>> what it, itself, is!
>>>
>>>
>>>
>>>
>>> On 30 September 2016 at 21:47, Jonathan Haddad <jo...@jonhaddad.com>
>>> wrote:
>>>
>>> +1000 to what Benedict says. I usually call it a "partitioned row store"
>>> which usually needs some extra explanation but is more accurate than
>>> "column family" or whatever other thrift era terminology people still use.
>>> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <do...@gmail.com>
>>> wrote:
>>>
>>> I used to present Cassandra as a NoSQL datastore with "distributed"
>>> table. This definition is closer to CQL and has some academic background
>>> (distributed hash table).
>>>
>>>
>>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>>> benedict@apache.org> wrote:
>>>
>>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>>> thrift users no longer think they have a schema (though they do), and
>>> thrift is being deprecated.
>>>
>>> I really wish everyone would kill the term "wide column store" with
>>> fire.  It seems to have never meant anything beyond "schema-less,
>>> row-oriented", and a "column store" means literally the opposite of this.
>>>
>>> Not only that, but people don't even seem to realise the term "column
>>> store" existed long before "wide column store" and the latter is often
>>> abbreviated to the former, as here:
>>> http://www.planetcassandra.org/what-is-nosql/
>>>
>>> Since it no longer applies, let's all agree as a community to forget
>>> this awful nomenclature ever existed.
>>>
>>>
>>>
>>> On 30 September 2016 at 18:09, Joaquin Casares <
>>> joaquin@thelastpickle.com> wrote:
>>>
>>> Hi Mehdi,
>>>
>>> I can help clarify a few things.
>>>
>>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row
>>> can have 2 billion columns, but in practice it shouldn't have more than 100
>>> million columns.
>>>
>>> Cassandra partitions data to certain nodes based on the partition
>>> key(s), but does provide the option of setting zero or more clustering
>>> keys. Together, the partition key(s) and clustering key(s) form the primary
>>> key.
>>>
>>> When writing to Cassandra, you will need to provide the full primary
>>> key, however, when reading from Cassandra, you only need to provide the
>>> full partition key.
>>>
>>> When you only provide the partition key for a read operation, you're
>>> able to return all columns that exist on that partition with low latency.
>>> These columns are displayed as "CQL rows" to make it easier to reason about.
>>>
>>> Consider the schema:
>>>
>>> CREATE TABLE foo (
>>>   bar uuid,
>>>   boz uuid,
>>>   baz timeuuid,
>>>   data1 text,
>>>   data2 text,
>>>   PRIMARY KEY ((bar, boz), baz)
>>> );
>>>
>>>
>>> When you write to Cassandra you will need to send bar, boz, and baz and
>>> optionally data*, if it's relevant for that CQL row. If you choose not to
>>> define a data* field for a particular CQL row, then nothing is stored nor
>>> allocated on disk. But I wouldn't consider that caveat to be "schema-less".
>>>
>>> However, all writes to the same bar/boz will end up on the same
>>> Cassandra replica set (a configurable number of nodes) and be stored on the
>>> same place(s) on disk within the SSTable(s). And on disk, each field that's
>>> not a partition key is stored as a column, including clustering keys (this
>>> is optimized in Cassandra 3+, but now we're getting deep into internals).
>>>
>>> In this way you can get fast responses for all activity for bar/boz
>>> either over time, or for a specific time, with roughly the same number of
>>> disk seeks, with varying lengths on the disk scans.
>>>
>>> Hope that helps!
>>>
>>> Joaquin Casares
>>> Consultant
>>> Austin, TX
>>>
>>> Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <in...@mrcalonso.com>
>>> wrote:
>>>
>>> Cassandra is a Wide Column Store:
>>> http://db-engines.com/en/system/Cassandra
>>>
>>> Carlos Alonso | Software Engineer | @calonso
>>> <https://twitter.com/calonso>
>>>
>>> On 30 September 2016 at 18:24, Mehdi Bada <me...@dbi-services.com>
>>> wrote:
>>>
>>> Hi all,
>>>
>>> I have a theoretical question:
>>> - Is Apache Cassandra really a column store?
>>> A column store means storing the data as columns rather than as rows.
>>>
>>> In fact C* stores the data as rows, and the data is partitioned by row key.
>>>
>>> So, for me, Cassandra is a row-oriented, schema-less DBMS. Do you
>>> agree?
>>>
>>> Many thanks in advance for your reply
>>>
>>> Best Regards
>>> Mehdi Bada
>

Re: Cassandra data model right definition

Posted by Peter Lin <wo...@gmail.com>.
I've met clients who read the Cassandra docs and then said in a big
meeting, "it's just like a relational database, it has tables just like
sqlserver/oracle."

I'm not putting words in other people's mouths either, but I've heard that
said enough times to want to puke. Do the docs claim Cassandra is
relational? They absolutely don't make that claim, but the docs play
loosey-goosey with terminology. The end result is that it confuses new
users who aren't experts, and technology managers trying to make a case
for Cassandra.

We can make all the excuses we want, but that doesn't change the fact that
the docs aren't user friendly. Writing great documentation is tough and
most developers hate it. It's because we suck at it. There, I said it: "we
SUCK at writing user friendly documentation". As many people have pointed
out, it's not unique to Cassandra. 80% of the tech docs out there suck,
starting with IBM at the top.

Saying the docs suck isn't an indictment of anyone, it's just the reality
of writing good documentation.

On Mon, Oct 3, 2016 at 12:33 PM, Jonathan Haddad <jo...@jonhaddad.com> wrote:

> Nobody is claiming Cassandra is relational. I'm not sure why that keeps
> coming up.
> On Mon, Oct 3, 2016 at 10:53 AM Edward Capriolo <ed...@gmail.com>
> wrote:
>
>> My original point can be summed up as:
>>
>> Do not define Cassandra in terms of SIMILES & METAPHORS. Such words include
>> "like" and "close relative".
>>
>> For the specifics:
>>
>>
>> Any relational db could (and I'm sure one does!) allow for sparse fields
>> as well. MySQL can be backed by rocksdb now, does that make it not a row
>> store?
>>
>>
>> Let's draw some lines: a relational database is clearly defined.
>>
>> https://en.wikipedia.org/wiki/Edgar_F._Codd
>>
>> Codd's theorem <https://en.wikipedia.org/wiki/Codd%27s_theorem>, a
>> result proven in his seminal work on the relational model, equates the
>> expressive power of relational algebra
>> <https://en.wikipedia.org/wiki/Relational_algebra> and relational
>> calculus <https://en.wikipedia.org/wiki/Relational_calculus> (both of
>> which, lacking recursion, are strictly less powerful than first-order
>> logic <https://en.wikipedia.org/wiki/First-order_logic>).[*citation
>> needed <https://en.wikipedia.org/wiki/Wikipedia:Citation_needed>*]
>>
>> As the relational model started to become fashionable in the early 1980s,
>> Codd fought a sometimes bitter campaign to prevent the term being misused
>> by database vendors who had merely added a relational veneer to older
>> technology. As part of this campaign, he published his 12 rules
>> <https://en.wikipedia.org/wiki/Codd%27s_12_rules> to define what
>> constituted a relational database. This made his position in IBM
>> increasingly difficult, so he left to form his own consulting company with
>> Chris Date and others.
>>
>> Cassandra is not a relational database.
>>
>> I have attempted to illustrate that a "row store" is defined as well.
>> I do not believe Cassandra is a "row store".
>>
>>
>>
>> "Just because it uses log structured storage, sparse fields, and
>> semi-flexible collections doesn't disqualify it from calling it a "row
>> store""
>>
>> What is the definition of "row store"? Is it a logical construct or a
>> physical one?
>>
>> Why isn't MongoDB a "row store"? I can drop a schema on top of Mongo and
>> present it as rows and columns. It seems to pass the litmus test being
>> presented.
>>
>> https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage
>>
>>
>>
>>
>>
>> On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad <jo...@jonhaddad.com>
>> wrote:
>>
>> Sorry Ed, but you're really stretching here. A table in Cassandra is
>> structured by a schema with the data for each row stored together in each
>> data file. Just because it uses log structured storage, sparse fields, and
>> semi-flexible collections doesn't disqualify it from calling it a "row
>> store"
>>
>> Postgres added flexible storage through hstore, I don't hear anyone
>> arguing that it needs to be renamed.
>>
>> Any relational db could (and I'm sure one does!) allow for sparse fields
>> as well. MySQL can be backed by rocksdb now, does that make it not a row
>> store?
>>
>> You're arguing that everything is wrong but you're not proposing an
>> alternative, which is not productive.
>> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo <ed...@gmail.com>
>> wrote:
>>
>> Also, every piece of technical information that describes a row store
>>
>> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
>> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>>
>> depicts it like this:
>>
>> 001:10,Smith,Joe,40000;
>> 002:12,Jones,Mary,50000;
>> 003:11,Johnson,Cathy,44000;
>> 004:22,Jones,Bob,55000;
>>
>>
>>
>> They never depict a scenario where the data looks like this on disk:
>>
>> 001:10,Smith
>>
>> 001:10,40000;
>>
>> Which is much closer to how Cassandra *stores* its data.
>>
>>
>>
>> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <
>> benedict@apache.org> wrote:
>>
>> Absolutely.  A "partitioned row store" is exactly what I would call it.
>> As it happens, our README thinks the same, which is fantastic.
>>
>> I thought I'd take a look at the rest of our cohort, and didn't get far
>> before disappointment.  HBase literally calls itself a "*column-oriented* store"
>> - which is so totally wrong it's simultaneously hilarious and tragic.
>>
>> I guess we can't blame the wider internet for misunderstanding/misnaming
>> us poor "wide column stores" if even one of the major examples doesn't know
>> what it, itself, is!
>>
>>
>>
>>
>> On 30 September 2016 at 21:47, Jonathan Haddad <jo...@jonhaddad.com> wrote:
>>
>> +1000 to what Benedict says. I usually call it a "partitioned row store"
>> which usually needs some extra explanation but is more accurate than
>> "column family" or whatever other thrift era terminology people still use.
>> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <do...@gmail.com> wrote:
>>
>> I used to present Cassandra as a NoSQL datastore with "distributed"
>> table. This definition is closer to CQL and has some academic background
>> (distributed hash table).
>>
>>
>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>> benedict@apache.org> wrote:
>>
>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>> thrift users no longer think they have a schema (though they do), and
>> thrift is being deprecated.
>>
>> I really wish everyone would kill the term "wide column store" with
>> fire.  It seems to have never meant anything beyond "schema-less,
>> row-oriented", and a "column store" means literally the opposite of this.
>>
>> Not only that, but people don't even seem to realise the term "column
>> store" existed long before "wide column store" and the latter is often
>> abbreviated to the former, as here:
>> http://www.planetcassandra.org/what-is-nosql/
>>
>> Since it no longer applies, let's all agree as a community to forget this
>> awful nomenclature ever existed.
>>
>>
>>
>> On 30 September 2016 at 18:09, Joaquin Casares
>> <joaquin@thelastpickle.com> wrote:
>>
>> Hi Mehdi,
>>
>> I can help clarify a few things.
>>
>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row can
>> have 2 billion columns, but in practice it shouldn't have more than 100
>> million columns.
>>
>> Cassandra partitions data to certain nodes based on the partition key(s),
>> but does provide the option of setting zero or more clustering keys.
>> Together, the partition key(s) and clustering key(s) form the primary key.
>>
>> When writing to Cassandra, you will need to provide the full primary key,
>> however, when reading from Cassandra, you only need to provide the full
>> partition key.
>>
>> When you only provide the partition key for a read operation, you're able
>> to return all columns that exist on that partition with low latency. These
>> columns are displayed as "CQL rows" to make it easier to reason about.
>>
>> Consider the schema:
>>
>> CREATE TABLE foo (
>>   bar uuid,
>>   boz uuid,
>>   baz timeuuid,
>>   data1 text,
>>   data2 text,
>>   PRIMARY KEY ((bar, boz), baz)
>> );
>>
>>
>> When you write to Cassandra you will need to send bar, boz, and baz and
>> optionally data*, if it's relevant for that CQL row. If you choose not to
>> define a data* field for a particular CQL row, then nothing is stored nor
>> allocated on disk. But I wouldn't consider that caveat to be "schema-less".
>>
>> However, all writes to the same bar/boz will end up on the same Cassandra
>> replica set (a configurable number of nodes) and be stored on the same
>> place(s) on disk within the SSTable(s). And on disk, each field that's not
>> a partition key is stored as a column, including clustering keys (this is
>> optimized in Cassandra 3+, but now we're getting deep into internals).
>>
>> In this way you can get fast responses for all activity for bar/boz
>> either over time, or for a specific time, with roughly the same number of
>> disk seeks, with varying lengths on the disk scans.
>>
>> Hope that helps!
>>
>> Joaquin Casares
>> Consultant
>> Austin, TX
>>
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <in...@mrcalonso.com>
>> wrote:
>>
>> Cassandra is a Wide Column Store:
>> http://db-engines.com/en/system/Cassandra
>>
>> Carlos Alonso | Software Engineer | @calonso
>> <https://twitter.com/calonso>
>>
>> On 30 September 2016 at 18:24, Mehdi Bada <me...@dbi-services.com>
>> wrote:
>>
>> Hi all,
>>
>> I have a theoretical question:
>> - Is Apache Cassandra really a column store?
>> A column store means storing the data as columns rather than as rows.
>>
>> In fact C* stores the data as rows, and the data is partitioned by row key.
>>
>> So, for me, Cassandra is a row-oriented, schema-less DBMS. Do you
>> agree?
>>
>> Many thanks in advance for your reply
>>
>> Best Regards
>> Mehdi Bada

Re: Cassandra data model right definition

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
It's a row store because it has a schema (vs ad hoc documents), and the data
(rows) are stored together. What would you call the things you iterate over
when you query a partition? Rows. That makes it a thing that stores "rows" of
data; "row store" isn't some crazy stretch.
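
To make the "rows within a partition" point concrete, here is a minimal
sketch in Python. It is only a toy model, not Cassandra's storage engine or
any real driver API: the class PartitionedRowStore and its methods write and
read_partition are hypothetical names, the column names follow the foo table
from Joaquin's example, and plain integers stand in for the timeuuid
clustering key. Rows are grouped by partition key, returned in clustering-key
order, and columns that were never written are simply absent.

```python
class PartitionedRowStore:
    """Toy model of a partitioned row store (illustration only)."""

    def __init__(self):
        # partition key -> {clustering key -> {column name -> value}}
        self._partitions = {}

    def write(self, partition_key, clustering_key, **columns):
        """Upsert one row; only the columns supplied are stored."""
        partition = self._partitions.setdefault(partition_key, {})
        partition.setdefault(clustering_key, {}).update(columns)

    def read_partition(self, partition_key):
        """Yield the rows of one partition in clustering-key order."""
        partition = self._partitions.get(partition_key, {})
        for clustering_key in sorted(partition):
            yield {"baz": clustering_key, **partition[clustering_key]}


store = PartitionedRowStore()
# (bar, boz) is the partition key; baz is the clustering key.
store.write(("bar-1", "boz-1"), 2, data1="second")
store.write(("bar-1", "boz-1"), 1, data1="first", data2="extra")
store.write(("bar-2", "boz-1"), 1, data2="other partition")

# Iterating over one partition hands back rows, one per clustering key.
for row in store.read_partition(("bar-1", "boz-1")):
    print(row)
```

The loop at the end is the whole argument in miniature: what comes back from
a partition read is a sequence of rows, whatever the bytes look like on disk.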
On Mon, Oct 3, 2016 at 12:33 PM Jonathan Haddad <jo...@jonhaddad.com> wrote:

> Nobody is claiming Cassandra is relational. I'm not sure why that keeps
> coming up.
> On Mon, Oct 3, 2016 at 10:53 AM Edward Capriolo <ed...@gmail.com>
> wrote:
>
> My original point can be summed up as:
>
> Do not define Cassandra in terms of SIMILES & METAPHORS. Such words include
> "like" and "close relative".
>
> For the specifics:
>
>
> Any relational db could (and I'm sure one does!) allow for sparse fields
> as well. MySQL can be backed by rocksdb now, does that make it not a row
> store?
>
>
> Let's draw some lines: a relational database is clearly defined.
>
> https://en.wikipedia.org/wiki/Edgar_F._Codd
>
> Codd's theorem <https://en.wikipedia.org/wiki/Codd%27s_theorem>, a result
> proven in his seminal work on the relational model, equates the expressive
> power of relational algebra
> <https://en.wikipedia.org/wiki/Relational_algebra> and relational calculus
> <https://en.wikipedia.org/wiki/Relational_calculus> (both of which,
> lacking recursion, are strictly less powerful than first-order logic
> <https://en.wikipedia.org/wiki/First-order_logic>).[*citation needed
> <https://en.wikipedia.org/wiki/Wikipedia:Citation_needed>*]
>
> As the relational model started to become fashionable in the early 1980s,
> Codd fought a sometimes bitter campaign to prevent the term being misused
> by database vendors who had merely added a relational veneer to older
> technology. As part of this campaign, he published his 12 rules
> <https://en.wikipedia.org/wiki/Codd%27s_12_rules> to define what
> constituted a relational database. This made his position in IBM
> increasingly difficult, so he left to form his own consulting company with
> Chris Date and others.
>
> Cassandra is not a relational database.
>
> I have attempted to illustrate that a "row store" is defined as well. I
> do not believe Cassandra is a "row store".
>
>
>
> "Just because it uses log structured storage, sparse fields, and
> semi-flexible collections doesn't disqualify it from calling it a "row
> store""
>
> What is the definition of "row store"? Is it a logical construct or a
> physical one?
>
> Why isn't MongoDB a "row store"? I can drop a schema on top of Mongo and
> present it as rows and columns. It seems to pass the litmus test being
> presented.
>
> https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage
>
>
>
>
>
> On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad <jo...@jonhaddad.com>
> wrote:
>
> Sorry Ed, but you're really stretching here. A table in Cassandra is
> structured by a schema with the data for each row stored together in each
> data file. Just because it uses log structured storage, sparse fields, and
> semi-flexible collections doesn't disqualify it from calling it a "row
> store"
>
> Postgres added flexible storage through hstore, I don't hear anyone
> arguing that it needs to be renamed.
>
> Any relational db could (and I'm sure one does!) allow for sparse fields
> as well. MySQL can be backed by rocksdb now, does that make it not a row
> store?
>
> You're arguing that everything is wrong but you're not proposing an
> alternative, which is not productive.
> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo <ed...@gmail.com>
> wrote:
>
> Also every piece of technical information that describes a row store
>
> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>
> depicts it like this:
>
> 001:10,Smith,Joe,40000;
> 002:12,Jones,Mary,50000;
> 003:11,Johnson,Cathy,44000;
> 004:22,Jones,Bob,55000;
>
>
>
> They never depict a scenario where the data looks like this on disk:
>
> 001:10,Smith
>
> 001:10,40000;
>
> Which is much closer to how Cassandra *stores* its data.
>
>
>
> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <
> benedict@apache.org> wrote:
>
> Absolutely.  A "partitioned row store" is exactly what I would call it.
> As it happens, our README thinks the same, which is fantastic.
>
> I thought I'd take a look at the rest of our cohort, and didn't get far
> before disappointment.  HBase literally calls itself a "*column-oriented* store"
> - which is so totally wrong it's simultaneously hilarious and tragic.
>
> I guess we can't blame the wider internet for misunderstanding/misnaming
> us poor "wide column stores" if even one of the major examples doesn't know
> what it, itself, is!
>
>
>
>
> On 30 September 2016 at 21:47, Jonathan Haddad <jo...@jonhaddad.com> wrote:
>
> +1000 to what Benedict says. I usually call it a "partitioned row store"
> which usually needs some extra explanation but is more accurate than
> "column family" or whatever other thrift era terminology people still use.
> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <do...@gmail.com> wrote:
>
> I used to present Cassandra as a NoSQL datastore with "distributed" table.
> This definition is closer to CQL and has some academic background
> (distributed hash table).
>
>
> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
> benedict@apache.org> wrote:
>
> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
> thrift users no longer think they have a schema (though they do), and
> thrift is being deprecated.
>
> I really wish everyone would kill the term "wide column store" with fire.
> It seems to have never meant anything beyond "schema-less, row-oriented",
> and a "column store" means literally the opposite of this.
>
> Not only that, but people don't even seem to realise the term "column
> store" existed long before "wide column store" and the latter is often
> abbreviated to the former, as here:
> http://www.planetcassandra.org/what-is-nosql/
>
> Since it no longer applies, let's all agree as a community to forget this
> awful nomenclature ever existed.
>
>
>
> On 30 September 2016 at 18:09, Joaquin Casares <jo...@thelastpickle.com>
> wrote:
>
> Hi Mehdi,
>
> I can help clarify a few things.
>
> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row can
> have 2 billion columns, but in practice it shouldn't have more than 100
> million columns.
>
> Cassandra partitions data to certain nodes based on the partition key(s),
> but does provide the option of setting zero or more clustering keys.
> Together, the partition key(s) and clustering key(s) form the primary key.
>
> When writing to Cassandra, you will need to provide the full primary key,
> however, when reading from Cassandra, you only need to provide the full
> partition key.
>
> When you only provide the partition key for a read operation, you're able
> to return all columns that exist on that partition with low latency. These
> columns are displayed as "CQL rows" to make it easier to reason about.
>
> Consider the schema:
>
> CREATE TABLE foo (
>   bar uuid,
>
>   boz uuid,
>
>   baz timeuuid,
>   data1 text,
>
>   data2 text,
>
>   PRIMARY KEY ((bar, boz), baz)
>
> );
>
>
> When you write to Cassandra you will need to send bar, boz, and baz and
> optionally data*, if it's relevant for that CQL row. If you chose not to
> define a data* field for a particular CQL row, then nothing is stored nor
> allocated on disk. But I wouldn't consider that caveat to be "schema-less".
>
> However, all writes to the same bar/boz will end up on the same Cassandra
> replica set (a configurable number of nodes) and be stored on the same
> place(s) on disk within the SSTable(s). And on disk, each field that's not
> a partition key is stored as a column, including clustering keys (this is
> optimized in Cassandra 3+, but now we're getting deep into internals).
>
> In this way you can get fast responses for all activity for bar/boz either
> over time, or for a specific time, with roughly the same number of disk
> seeks, with varying lengths on the disk scans.
>
> Hope that helps!
>
> Joaquin Casares
> Consultant
> Austin, TX
>
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <in...@mrcalonso.com>
> wrote:
>
> Cassandra is a Wide Column Store http://db-engines.com/en/system/Cassandra
>
> Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>
>
> On 30 September 2016 at 18:24, Mehdi Bada <me...@dbi-services.com>
> wrote:
>
> Hi all,
>
> I have a theoretical question:
> - Is Apache Cassandra really a column store?
> A column store means storing the data as columns rather than as rows.
>
> In fact C* stores the data as rows, and data is partitioned by row key.
>
> Finally, for me, Cassandra is a row-oriented, schema-less DBMS. Is that
> true for you also?
>
> Many thanks in advance for your reply
>
> Best Regards
> Mehdi Bada

Re: Cassandra data model right definition

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
Nobody is claiming Cassandra is a relational database; I'm not sure why that
keeps coming up.
On Mon, Oct 3, 2016 at 10:53 AM Edward Capriolo <ed...@gmail.com>
wrote:

> My original point can be summed up as:
>
> Do not define Cassandra in terms of SIMILES & METAPHORS. Such words include
> "like" and "close relative".
>
> For the specifics:
>
>
> Any relational db could (and I'm sure one does!) allow for sparse fields
> as well. MySQL can be backed by rocksdb now, does that make it not a row
> store?
>
>
> Let's draw some lines: a relational database is clearly defined.
>
> https://en.wikipedia.org/wiki/Edgar_F._Codd
>
> Codd's theorem <https://en.wikipedia.org/wiki/Codd%27s_theorem>, a result
> proven in his seminal work on the relational model, equates the expressive
> power of relational algebra
> <https://en.wikipedia.org/wiki/Relational_algebra> and relational calculus
> <https://en.wikipedia.org/wiki/Relational_calculus> (both of which,
> lacking recursion, are strictly less powerful than first-order logic
> <https://en.wikipedia.org/wiki/First-order_logic>).[*citation needed
> <https://en.wikipedia.org/wiki/Wikipedia:Citation_needed>*]
>
> As the relational model started to become fashionable in the early 1980s,
> Codd fought a sometimes bitter campaign to prevent the term being misused
> by database vendors who had merely added a relational veneer to older
> technology. As part of this campaign, he published his 12 rules
> <https://en.wikipedia.org/wiki/Codd%27s_12_rules> to define what
> constituted a relational database. This made his position in IBM
> increasingly difficult, so he left to form his own consulting company with
> Chris Date and others.
>
> Cassandra is not a relational database.
>
> I have attempted to illustrate that a "row store" is defined as well. I
> do not believe Cassandra is a "row store".
>
>
>
> "Just because it uses log structured storage, sparse fields, and
> semi-flexible collections doesn't disqualify it from calling it a "row
> store""
>
> What is the definition of "row store". Is it a logical construct or a
> physical one?
>
> Why isn't mongo DB a "row store"? I can drop a schema on top of mongo and
> present it as rows and columns. It seems to pass the litmus test being
> presented.
>
> https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage
>
>
>
>
>
> On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad <jo...@jonhaddad.com>
> wrote:
>
> Sorry Ed, but you're really stretching here. A table in Cassandra is
> structured by a schema with the data for each row stored together in each
> data file. Just because it uses log structured storage, sparse fields, and
> semi-flexible collections doesn't disqualify it from calling it a "row
> store"
>
> Postgres added flexible storage through hstore, I don't hear anyone
> arguing that it needs to be renamed.
>
> Any relational db could (and I'm sure one does!) allow for sparse fields
> as well. MySQL can be backed by rocksdb now, does that make it not a row
> store?
>
> You're arguing that everything is wrong but you're not proposing an
> alternative, which is not productive.
> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo <ed...@gmail.com>
> wrote:
>
> Also every piece of technical information that describes a row store
>
> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>
> depicts it like this:
>
> 001:10,Smith,Joe,40000;
> 002:12,Jones,Mary,50000;
> 003:11,Johnson,Cathy,44000;
> 004:22,Jones,Bob,55000;
>
>
>
> They never depict a scenario where the data looks like this on disk:
>
> 001:10,Smith
>
> 001:10,40000;
>
> Which is much closer to how Cassandra *stores* its data.
>
>
>
> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <
> benedict@apache.org> wrote:
>
> Absolutely.  A "partitioned row store" is exactly what I would call it.
> As it happens, our README thinks the same, which is fantastic.
>
> I thought I'd take a look at the rest of our cohort, and didn't get far
> before disappointment.  HBase literally calls itself a "*column-oriented* store"
> - which is so totally wrong it's simultaneously hilarious and tragic.
>
> I guess we can't blame the wider internet for misunderstanding/misnaming
> us poor "wide column stores" if even one of the major examples doesn't know
> what it, itself, is!
>
>
>
>
> On 30 September 2016 at 21:47, Jonathan Haddad <jo...@jonhaddad.com> wrote:
>
> +1000 to what Benedict says. I usually call it a "partitioned row store"
> which usually needs some extra explanation but is more accurate than
> "column family" or whatever other thrift era terminology people still use.
> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <do...@gmail.com> wrote:
>
> I used to present Cassandra as a NoSQL datastore with "distributed" table.
> This definition is closer to CQL and has some academic background
> (distributed hash table).
>
>
> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
> benedict@apache.org> wrote:
>
> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
> thrift users no longer think they have a schema (though they do), and
> thrift is being deprecated.
>
> I really wish everyone would kill the term "wide column store" with fire.
> It seems to have never meant anything beyond "schema-less, row-oriented",
> and a "column store" means literally the opposite of this.
>
> Not only that, but people don't even seem to realise the term "column
> store" existed long before "wide column store" and the latter is often
> abbreviated to the former, as here:
> http://www.planetcassandra.org/what-is-nosql/
>
> Since it no longer applies, let's all agree as a community to forget this
> awful nomenclature ever existed.
>
>
>
> On 30 September 2016 at 18:09, Joaquin Casares <jo...@thelastpickle.com>
> wrote:
>
> Hi Mehdi,
>
> I can help clarify a few things.
>
> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row can
> have 2 billion columns, but in practice it shouldn't have more than 100
> million columns.
>
> Cassandra partitions data to certain nodes based on the partition key(s),
> but does provide the option of setting zero or more clustering keys.
> Together, the partition key(s) and clustering key(s) form the primary key.
>
> When writing to Cassandra, you will need to provide the full primary key,
> however, when reading from Cassandra, you only need to provide the full
> partition key.
>
> When you only provide the partition key for a read operation, you're able
> to return all columns that exist on that partition with low latency. These
> columns are displayed as "CQL rows" to make it easier to reason about.
>
> Consider the schema:
>
> CREATE TABLE foo (
>   bar uuid,
>
>   boz uuid,
>
>   baz timeuuid,
>   data1 text,
>
>   data2 text,
>
>   PRIMARY KEY ((bar, boz), baz)
>
> );
>
>
> When you write to Cassandra you will need to send bar, boz, and baz and
> optionally data*, if it's relevant for that CQL row. If you chose not to
> define a data* field for a particular CQL row, then nothing is stored nor
> allocated on disk. But I wouldn't consider that caveat to be "schema-less".
>
> However, all writes to the same bar/boz will end up on the same Cassandra
> replica set (a configurable number of nodes) and be stored on the same
> place(s) on disk within the SSTable(s). And on disk, each field that's not
> a partition key is stored as a column, including clustering keys (this is
> optimized in Cassandra 3+, but now we're getting deep into internals).
>
> In this way you can get fast responses for all activity for bar/boz either
> over time, or for a specific time, with roughly the same number of disk
> seeks, with varying lengths on the disk scans.
>
> Hope that helps!
>
> Joaquin Casares
> Consultant
> Austin, TX
>
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <in...@mrcalonso.com>
> wrote:
>
> Cassandra is a Wide Column Store http://db-engines.com/en/system/Cassandra
>
> Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>
>
> On 30 September 2016 at 18:24, Mehdi Bada <me...@dbi-services.com>
> wrote:
>
> Hi all,
>
> I have a theoretical question:
> - Is Apache Cassandra really a column store?
> A column store means storing the data as columns rather than as rows.
>
> In fact C* stores the data as rows, and data is partitioned by row key.
>
> Finally, for me, Cassandra is a row-oriented, schema-less DBMS. Is that
> true for you also?
>
> Many thanks in advance for your reply
>
> Best Regards
> Mehdi Bada

Re: Cassandra data model right definition

Posted by Edward Capriolo <ed...@gmail.com>.
My original point can be summed up as:

Do not define Cassandra in terms of SIMILES & METAPHORS. Such words include
"like" and "close relative".

For the specifics:

Any relational db could (and I'm sure one does!) allow for sparse fields as
well. MySQL can be backed by rocksdb now, does that make it not a row store?

Let's draw some lines: a relational database is clearly defined.

https://en.wikipedia.org/wiki/Edgar_F._Codd

Codd's theorem <https://en.wikipedia.org/wiki/Codd%27s_theorem>, a result
proven in his seminal work on the relational model, equates the expressive
power of relational algebra
<https://en.wikipedia.org/wiki/Relational_algebra> and relational calculus
<https://en.wikipedia.org/wiki/Relational_calculus> (both of which, lacking
recursion, are strictly less powerful than first-order logic
<https://en.wikipedia.org/wiki/First-order_logic>).[*citation needed
<https://en.wikipedia.org/wiki/Wikipedia:Citation_needed>*]

As the relational model started to become fashionable in the early 1980s,
Codd fought a sometimes bitter campaign to prevent the term being misused
by database vendors who had merely added a relational veneer to older
technology. As part of this campaign, he published his 12 rules
<https://en.wikipedia.org/wiki/Codd%27s_12_rules> to define what
constituted a relational database. This made his position in IBM
increasingly difficult, so he left to form his own consulting company with
Chris Date and others.

Cassandra is not a relational database.

I have attempted to illustrate that a "row store" is defined as well. I
do not believe Cassandra is a "row store".

"Just because it uses log structured storage, sparse fields, and
semi-flexible collections doesn't disqualify it from calling it a "row
store""

What is the definition of "row store"? Is it a logical construct or a
physical one?

Why isn't MongoDB a "row store"? I can drop a schema on top of Mongo and
present it as rows and columns. It seems to pass the litmus test being
presented.

https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage





On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad <jo...@jonhaddad.com> wrote:

> Sorry Ed, but you're really stretching here. A table in Cassandra is
> structured by a schema with the data for each row stored together in each
> data file. Just because it uses log structured storage, sparse fields, and
> semi-flexible collections doesn't disqualify it from calling it a "row
> store"
>
> Postgres added flexible storage through hstore, I don't hear anyone
> arguing that it needs to be renamed.
>
> Any relational db could (and I'm sure one does!) allow for sparse fields
> as well. MySQL can be backed by rocksdb now, does that make it not a row
> store?
>
> You're arguing that everything is wrong but you're not proposing an
> alternative, which is not productive.
> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo <ed...@gmail.com>
> wrote:
>
>> Also every piece of technical information that describes a row store
>>
>> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
>> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>>
>> depicts it like this:
>>
>> 001:10,Smith,Joe,40000;
>> 002:12,Jones,Mary,50000;
>> 003:11,Johnson,Cathy,44000;
>> 004:22,Jones,Bob,55000;
>>
>>
>>
>> They never depict a scenario where the data looks like this on disk:
>>
>> 001:10,Smith
>>
>> 001:10,40000;
>>
>> Which is much closer to how Cassandra *stores* its data.
>>
>>
>>
>> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <
>> benedict@apache.org> wrote:
>>
>> Absolutely.  A "partitioned row store" is exactly what I would call it.
>> As it happens, our README thinks the same, which is fantastic.
>>
>> I thought I'd take a look at the rest of our cohort, and didn't get far
>> before disappointment.  HBase literally calls itself a "*column-oriented* store"
>> - which is so totally wrong it's simultaneously hilarious and tragic.
>>
>> I guess we can't blame the wider internet for misunderstanding/misnaming
>> us poor "wide column stores" if even one of the major examples doesn't know
>> what it, itself, is!
>>
>>
>>
>>
>> On 30 September 2016 at 21:47, Jonathan Haddad <jo...@jonhaddad.com> wrote:
>>
>> +1000 to what Benedict says. I usually call it a "partitioned row store"
>> which usually needs some extra explanation but is more accurate than
>> "column family" or whatever other thrift era terminology people still use.
>> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <do...@gmail.com> wrote:
>>
>> I used to present Cassandra as a NoSQL datastore with "distributed"
>> table. This definition is closer to CQL and has some academic background
>> (distributed hash table).
>>
>>
>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>> benedict@apache.org> wrote:
>>
>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>> thrift users no longer think they have a schema (though they do), and
>> thrift is being deprecated.
>>
>> I really wish everyone would kill the term "wide column store" with
>> fire.  It seems to have never meant anything beyond "schema-less,
>> row-oriented", and a "column store" means literally the opposite of this.
>>
>> Not only that, but people don't even seem to realise the term "column
>> store" existed long before "wide column store" and the latter is often
>> abbreviated to the former, as here:
>> http://www.planetcassandra.org/what-is-nosql/
>>
>> Since it no longer applies, let's all agree as a community to forget this
>> awful nomenclature ever existed.
>>
>>
>>
>> On 30 September 2016 at 18:09, Joaquin Casares <joaquin@thelastpickle.com>
>> wrote:
>>
>> Hi Mehdi,
>>
>> I can help clarify a few things.
>>
>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row can
>> have 2 billion columns, but in practice it shouldn't have more than 100
>> million columns.
>>
>> Cassandra partitions data to certain nodes based on the partition key(s),
>> but does provide the option of setting zero or more clustering keys.
>> Together, the partition key(s) and clustering key(s) form the primary key.
>>
>> When writing to Cassandra, you will need to provide the full primary key,
>> however, when reading from Cassandra, you only need to provide the full
>> partition key.
>>
>> When you only provide the partition key for a read operation, you're able
>> to return all columns that exist on that partition with low latency. These
>> columns are displayed as "CQL rows" to make it easier to reason about.
>>
>> Consider the schema:
>>
>> CREATE TABLE foo (
>>   bar uuid,
>>
>>   boz uuid,
>>
>>   baz timeuuid,
>>   data1 text,
>>
>>   data2 text,
>>
>>   PRIMARY KEY ((bar, boz), baz)
>>
>> );
>>
>>
>> When you write to Cassandra you will need to send bar, boz, and baz and
>> optionally data*, if it's relevant for that CQL row. If you chose not to
>> define a data* field for a particular CQL row, then nothing is stored nor
>> allocated on disk. But I wouldn't consider that caveat to be "schema-less".
>>
>> However, all writes to the same bar/boz will end up on the same Cassandra
>> replica set (a configurable number of nodes) and be stored on the same
>> place(s) on disk within the SSTable(s). And on disk, each field that's not
>> a partition key is stored as a column, including clustering keys (this is
>> optimized in Cassandra 3+, but now we're getting deep into internals).
>>
>> In this way you can get fast responses for all activity for bar/boz
>> either over time, or for a specific time, with roughly the same number of
>> disk seeks, with varying lengths on the disk scans.
>>
>> Hope that helps!
>>
>> Joaquin Casares
>> Consultant
>> Austin, TX
>>
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <in...@mrcalonso.com>
>> wrote:
>>
>> Cassandra is a Wide Column Store http://db-engines.com/en/system/Cassandra
>>
>> Carlos Alonso | Software Engineer | @calonso
>> <https://twitter.com/calonso>
>>
>> On 30 September 2016 at 18:24, Mehdi Bada <me...@dbi-services.com>
>> wrote:
>>
>> Hi all,
>>
>> I have a theoretical question:
>> - Is Apache Cassandra really a column store?
>> A column store means storing the data as columns rather than as rows.
>>
>> In fact C* stores the data as rows, and data is partitioned by row key.
>>
>> Finally, for me, Cassandra is a row-oriented, schema-less DBMS. Is that
>> true for you also?
>>
>> Many thanks in advance for your reply
>>
>> Best Regards
>> Mehdi Bada

Re: Cassandra data model right definition

Posted by Russell Bradberry <rb...@gmail.com>.
A couple things I would like to note:

1. Cassandra does not determine how data is stored on disk; the compaction
strategy does.  One could, in theory (and I believe some are trying), create
a column-store compaction strategy.  There is a large effort in the database
community overall to separate query execution from the storage engine, so it
is becoming increasingly incorrect to say a database is an "X store"
database.

2. "X-store" is not used, and never has been, to describe how data is
represented or queried.  When most database storage engines describe their
storage as "X-store" they are referring to contiguous bytes on disk.  In
traditional row-store engines, on a single node, the definition is as
follows: "All data for a row is stored as a single block of contiguous
bytes on disk".  Traditional column-stores are similarly defined as "All data
for a column is stored contiguously on disk".  Old-style Cassandra was a
key-value column-family store in that "all data for a family of columns
belonging to a given key were stored contiguously on disk".
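
The two "contiguous bytes" definitions above can be sketched with a toy
example. This is only an illustration, not how any real engine serializes
data: it reuses the sample employee rows quoted elsewhere in this thread, and
the function names and the comma/semicolon text format are assumptions made
up for the sketch.

```python
# Toy illustration only: real engines write binary pages or blocks, not strings.
rows = [
    (10, "Smith", "Joe", 40000),
    (12, "Jones", "Mary", 50000),
    (11, "Johnson", "Cathy", 44000),
    (22, "Jones", "Bob", 55000),
]


def row_store_layout(rows):
    """Row store: all values of one row are adjacent, then the next row."""
    return ";".join(",".join(str(v) for v in row) for row in rows)


def column_store_layout(rows):
    """Column store: all values of one column are adjacent, then the next column."""
    columns = zip(*rows)  # transpose the rows into columns
    return ";".join(",".join(str(v) for v in col) for col in columns)


print(row_store_layout(rows))
print(column_store_layout(rows))
```

The data is identical in both cases; only the order in which values are laid
out next to each other differs, which is all the "X-store" label describes.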

So when talking about Cassandra and all currently merged compaction
strategies, yes, it fits the definition of a row-store in that "All data
for a row is stored as contiguous bytes on disk", however, it goes further
because "All data for all rows in a given partition are stored as
contiguous bytes on disk".  So at the highest level one could say it is a
"Partition-store", but that is pretty vague.  I think it deserves a
different name, which is why I like the term "Partitioned-row-store": it
gives insight into the fact that it is rows being stored on disk, in a
partitioned format.
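
A rough sketch of what "partitioned" adds on top of "row store", under
simplifying assumptions: the sensor names, values and the function name are
hypothetical, partitions are ordered by key here purely for readability
(Cassandra orders them by token), and a single output list stands in for a
single SSTable.

```python
from collections import defaultdict

# Hypothetical writes, in arrival order: (partition_key, clustering_key, value)
writes = [
    ("sensor-b", 2, "21.5"),
    ("sensor-a", 3, "19.0"),
    ("sensor-a", 1, "18.2"),
    ("sensor-b", 1, "21.1"),
    ("sensor-a", 2, "18.7"),
]


def partitioned_row_layout(writes):
    """Group rows by partition key, then order each partition's rows by clustering key."""
    partitions = defaultdict(list)
    for partition_key, clustering_key, value in writes:
        partitions[partition_key].append((clustering_key, value))
    # Simplification: sorted by key for readability, not by token as Cassandra does.
    return [(pk, sorted(partitions[pk])) for pk in sorted(partitions)]


for partition_key, partition_rows in partitioned_row_layout(writes):
    print(partition_key, partition_rows)
```

Each row is still stored whole, but rows sharing a partition key sit next to
one another, so reading one partition is a single contiguous scan.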

PS.
To address the pedants, yes, by these definitions you would have to assume
that a partition resides in a single SSTable. While most compaction
strategies try hard to achieve this, it currently holds in only one that I
know of. You could call it a
"Partitioned-row-dependent-upon-compaction-strategy-store" but that is
just terrible.



On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad <jo...@jonhaddad.com> wrote:

> Sorry Ed, but you're really stretching here. A table in Cassandra is
> structured by a schema with the data for each row stored together in each
> data file. Just because it uses log structured storage, sparse fields, and
> semi-flexible collections doesn't disqualify it from calling it a "row
> store"
>
> Postgres added flexible storage through hstore, I don't hear anyone
> arguing that it needs to be renamed.
>
> Any relational db could (and I'm sure one does!) allow for sparse fields
> as well. MySQL can be backed by rocksdb now, does that make it not a row
> store?
>
> You're arguing that everything is wrong but you're not proposing an
> alternative, which is not productive.
>
> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo <ed...@gmail.com>
> wrote:
>
>> Also every piece of technical information that describes a row store
>>
>> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
>> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>>
>> depicts it like this:
>>
>> 001:10,Smith,Joe,40000;
>> 002:12,Jones,Mary,50000;
>> 003:11,Johnson,Cathy,44000;
>> 004:22,Jones,Bob,55000;
>>
>>
>>
>> They never depict a scenario where the data looks like this on disk:
>>
>> 001:10,Smith
>>
>> 001:10,40000;
>>
>> Which is much closer to how Cassandra *stores* its data.
>>
>>
>>
>> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <
>> benedict@apache.org> wrote:
>>
>> Absolutely.  A "partitioned row store" is exactly what I would call it.
>> As it happens, our README thinks the same, which is fantastic.
>>
>> I thought I'd take a look at the rest of our cohort, and didn't get far
>> before disappointment.  HBase literally calls itself a "*column-oriented* store"
>> - which is so totally wrong it's simultaneously hilarious and tragic.
>>
>> I guess we can't blame the wider internet for misunderstanding/misnaming
>> us poor "wide column stores" if even one of the major examples doesn't know
>> what it, itself, is!
>>
>>
>>
>>
>> On 30 September 2016 at 21:47, Jonathan Haddad <jo...@jonhaddad.com> wrote:
>>
>> +1000 to what Benedict says. I usually call it a "partitioned row store"
>> which usually needs some extra explanation but is more accurate than
>> "column family" or whatever other thrift era terminology people still use.
>> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <do...@gmail.com> wrote:
>>
>> I used to present Cassandra as a NoSQL datastore with "distributed"
>> table. This definition is closer to CQL and has some academic background
>> (distributed hash table).
>>
>>
>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>> benedict@apache.org> wrote:
>>
>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>> thrift users no longer think they have a schema (though they do), and
>> thrift is being deprecated.
>>
>> I really wish everyone would kill the term "wide column store" with
>> fire.  It seems to have never meant anything beyond "schema-less,
>> row-oriented", and a "column store" means literally the opposite of this.
>>
>> Not only that, but people don't even seem to realise the term "column
>> store" existed long before "wide column store" and the latter is often
>> abbreviated to the former, as here:
>> http://www.planetcassandra.org/what-is-nosql/
>>
>> Since it no longer applies, let's all agree as a community to forget this
>> awful nomenclature ever existed.
>>
>>
>>
>> On 30 September 2016 at 18:09, Joaquin Casares <joaquin@thelastpickle.com>
>> wrote:
>>
>> Hi Mehdi,
>>
>> I can help clarify a few things.
>>
>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row can
>> have 2 billion columns, but in practice it shouldn't have more than 100
>> million columns.
>>
>> Cassandra partitions data to certain nodes based on the partition key(s),
>> but does provide the option of setting zero or more clustering keys.
>> Together, the partition key(s) and clustering key(s) form the primary key.
>>
>> When writing to Cassandra, you will need to provide the full primary key,
>> however, when reading from Cassandra, you only need to provide the full
>> partition key.
>>
>> When you only provide the partition key for a read operation, you're able
>> to return all columns that exist on that partition with low latency. These
>> columns are displayed as "CQL rows" to make it easier to reason about.
>>
>> Consider the schema:
>>
>> CREATE TABLE foo (
>>   bar uuid,
>>
>>   boz uuid,
>>
>>   baz timeuuid,
>>   data1 text,
>>
>>   data2 text,
>>
>>   PRIMARY KEY ((bar, boz), baz)
>>
>> );
>>
>>
>> When you write to Cassandra you will need to send bar, boz, and baz and
>> optionally data*, if it's relevant for that CQL row. If you chose not to
>> define a data* field for a particular CQL row, then nothing is stored nor
>> allocated on disk. But I wouldn't consider that caveat to be "schema-less".
>>
>> However, all writes to the same bar/boz will end up on the same Cassandra
>> replica set (a configurable number of nodes) and be stored on the same
>> place(s) on disk within the SSTable(s). And on disk, each field that's not
>> a partition key is stored as a column, including clustering keys (this is
>> optimized in Cassandra 3+, but now we're getting deep into internals).
>>
>> In this way you can get fast responses for all activity for bar/boz
>> either over time, or for a specific time, with roughly the same number of
>> disk seeks, with varying lengths on the disk scans.
>>
>> Hope that helps!
>>
>> Joaquin Casares
>> Consultant
>> Austin, TX
>>
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <in...@mrcalonso.com>
>> wrote:
>>
>> Cassandra is a Wide Column Store
>> http://db-engines.com/en/system/Cassandra
>>
>> Carlos Alonso | Software Engineer | @calonso
>> <https://twitter.com/calonso>
>>
>> On 30 September 2016 at 18:24, Mehdi Bada <me...@dbi-services.com>
>> wrote:
>>
>> Hi all,
>>
>> I have a theoritical question:
>> - Is Apache Cassandra really a column store?
>> Column store mean storing the data as column rather than as a rows.
>>
>> In fact C* store the data as row, and data is partionned with row key.
>>
>> Finally, for me, Cassandra is a row oriented schema less DBMS.... Is it
>> true for you also???
>>
>> Many thanks in advance for your reply
>>
>> Best Regards
>> Mehdi Bada
>> ----
>>
>> *Mehdi Bada* | Consultant
>> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96
>> 15
>> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
>> mehdi.bada@dbi-services.com
>> www.dbi-services.com

Re: Cassandra data model right definition

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
Sorry Ed, but you're really stretching here. A table in Cassandra is
structured by a schema, with the data for each row stored together in each
data file. Just because it uses log-structured storage, sparse fields, and
semi-flexible collections doesn't disqualify it from being called a "row
store".

Postgres added flexible storage through hstore, and I don't hear anyone
arguing that it needs to be renamed.

Any relational db could (and I'm sure one does!) allow for sparse fields as
well. MySQL can be backed by RocksDB now; does that make it not a row store?

You're arguing that everything is wrong but you're not proposing an
alternative, which is not productive.

On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo <ed...@gmail.com>
wrote:

> Also, every piece of technical information that describes a row store
>
> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>
> depicts it like this:
>
> 001:10,Smith,Joe,40000;
> 002:12,Jones,Mary,50000;
> 003:11,Johnson,Cathy,44000;
> 004:22,Jones,Bob,55000;
>
>
>
> They never depict a scenario where the data looks like this on disk:
>
> 001:10,Smith
>
> 001:10,40000;
>
> Which is much closer to how Cassandra *stores* its data.
>
>
>
> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <
> benedict@apache.org> wrote:
>
> Absolutely.  A "partitioned row store" is exactly what I would call it.
> As it happens, our README thinks the same, which is fantastic.
>
> I thought I'd take a look at the rest of our cohort, and didn't get far
> before disappointment.  HBase literally calls itself a "*column-oriented* store"
> - which is so totally wrong it's simultaneously hilarious and tragic.
>
> I guess we can't blame the wider internet for misunderstanding/misnaming
> us poor "wide column stores" if even one of the major examples doesn't know
> what it, itself, is!
>
>
>
>
> On 30 September 2016 at 21:47, Jonathan Haddad <jo...@jonhaddad.com> wrote:
>
> +1000 to what Benedict says. I usually call it a "partitioned row store"
> which usually needs some extra explanation but is more accurate than
> "column family" or whatever other thrift era terminology people still use.
> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <do...@gmail.com> wrote:
>
> I used to present Cassandra as a NoSQL datastore with "distributed" table.
> This definition is closer to CQL and has some academic background
> (distributed hash table).
>
>
> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
> benedict@apache.org> wrote:
>
> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
> thrift users no longer think they have a schema (though they do), and
> thrift is being deprecated.
>
> I really wish everyone would kill the term "wide column store" with fire.
> It seems to have never meant anything beyond "schema-less, row-oriented",
> and a "column store" means literally the opposite of this.
>
> Not only that, but people don't even seem to realise the term "column
> store" existed long before "wide column store" and the latter is often
> abbreviated to the former, as here:
> http://www.planetcassandra.org/what-is-nosql/
>
> Since it no longer applies, let's all agree as a community to forget this
> awful nomenclature ever existed.
>
>
>
> On 30 September 2016 at 18:09, Joaquin Casares <jo...@thelastpickle.com>
> wrote:
>
> Hi Mehdi,
>
> I can help clarify a few things.
>
> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row can
> have 2 billion columns, but in practice it shouldn't have more than 100
> million columns.
>
> Cassandra partitions data to certain nodes based on the partition key(s),
> but does provide the option of setting zero or more clustering keys.
> Together, the partition key(s) and clustering key(s) form the primary key.
>
> When writing to Cassandra, you will need to provide the full primary key,
> however, when reading from Cassandra, you only need to provide the full
> partition key.
>
> When you only provide the partition key for a read operation, you're able
> to return all columns that exist on that partition with low latency. These
> columns are displayed as "CQL rows" to make it easier to reason about.
>
> Consider the schema:
>
> CREATE TABLE foo (
>   bar uuid,
>   boz uuid,
>   baz timeuuid,
>   data1 text,
>   data2 text,
>   PRIMARY KEY ((bar, boz), baz)
> );
>
>
> When you write to Cassandra you will need to send bar, boz, and baz and
> optionally data*, if it's relevant for that CQL row. If you chose not to
> define a data* field for a particular CQL row, then nothing is stored nor
> allocated on disk. But I wouldn't consider that caveat to be "schema-less".
>
> However, all writes to the same bar/boz will end up on the same Cassandra
> replica set (a configurable number of nodes) and be stored on the same
> place(s) on disk within the SSTable(s). And on disk, each field that's not
> a partition key is stored as a column, including clustering keys (this is
> optimized in Cassandra 3+, but now we're getting deep into internals).
>
> In this way you can get fast responses for all activity for bar/boz either
> over time, or for a specific time, with roughly the same number of disk
> seeks, with varying lengths on the disk scans.
>
> Hope that helps!
>
> Joaquin Casares
> Consultant
> Austin, TX
>
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <in...@mrcalonso.com>
> wrote:
>
> Cassandra is a Wide Column Store http://db-engines.com/en/system/Cassandra
>
> Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>
>
> On 30 September 2016 at 18:24, Mehdi Bada <me...@dbi-services.com>
> wrote:
>
> Hi all,
>
> I have a theoritical question:
> - Is Apache Cassandra really a column store?
> Column store mean storing the data as column rather than as a rows.
>
> In fact C* store the data as row, and data is partionned with row key.
>
> Finally, for me, Cassandra is a row oriented schema less DBMS.... Is it
> true for you also???
>
> Many thanks in advance for your reply
>
> Best Regards
> Mehdi Bada
> ----
>
> *Mehdi Bada* | Consultant
> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15
> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
> mehdi.bada@dbi-services.com
> www.dbi-services.com

Re: Cassandra data model right definition

Posted by Edward Capriolo <ed...@gmail.com>.
Also, every piece of technical information that describes a row store

http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems

depicts it like this:

001:10,Smith,Joe,40000;
002:12,Jones,Mary,50000;
003:11,Johnson,Cathy,44000;
004:22,Jones,Bob,55000;



They never depict a scenario where the data looks like this on disk:

001:10,Smith

001:10,40000;

Which is much closer to how Cassandra *stores* its data.
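
To make the contrast concrete, here is a rough sketch in Python. It is an
illustration only, not Cassandra's actual on-disk format: the function names
and the column names (dept, last, first, salary) are made up for this example.
It lays the same records out first as whole rows, then as one entry per
(row key, column) cell:

```python
# Illustration only: neither function reproduces a real storage format.
records = [
    ("001", {"dept": 10, "last": "Smith", "first": "Joe", "salary": 40000}),
    ("002", {"dept": 12, "last": "Jones", "first": "Mary", "salary": 50000}),
]


def row_layout(records):
    """Textbook row store: all values of a row are written contiguously."""
    return [
        f"{key}:" + ",".join(str(v) for v in values.values()) + ";"
        for key, values in records
    ]


def cell_layout(records):
    """One entry per (row key, column name) pair; every cell repeats the key."""
    return [
        (key, column, value)
        for key, values in records
        for column, value in values.items()
    ]


print(row_layout(records)[0])
for cell in cell_layout(records)[:2]:
    print(cell)
```

Note that even in the per-cell layout of this sketch, the cells of one row
still sit next to each other, so reading a whole row remains one contiguous
scan.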



On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith
<benedict@apache.org> wrote:

> Absolutely.  A "partitioned row store" is exactly what I would call it.
> As it happens, our README thinks the same, which is fantastic.
>
> I thought I'd take a look at the rest of our cohort, and didn't get far
> before disappointment.  HBase literally calls itself a "*column-oriented* store"
> - which is so totally wrong it's simultaneously hilarious and tragic.
>
> I guess we can't blame the wider internet for misunderstanding/misnaming
> us poor "wide column stores" if even one of the major examples doesn't know
> what it, itself, is!
>
>
>
>
> On 30 September 2016 at 21:47, Jonathan Haddad <jo...@jonhaddad.com> wrote:
>
>> +1000 to what Benedict says. I usually call it a "partitioned row store"
>> which usually needs some extra explanation but is more accurate than
>> "column family" or whatever other thrift era terminology people still use.
>> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <do...@gmail.com> wrote:
>>
>>> I used to present Cassandra as a NoSQL datastore with "distributed"
>>> table. This definition is closer to CQL and has some academic background
>>> (distributed hash table).
>>>
>>>
>>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>>> benedict@apache.org> wrote:
>>>
>>>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>>>> thrift users no longer think they have a schema (though they do), and
>>>> thrift is being deprecated.
>>>>
>>>> I really wish everyone would kill the term "wide column store" with
>>>> fire.  It seems to have never meant anything beyond "schema-less,
>>>> row-oriented", and a "column store" means literally the opposite of this.
>>>>
>>>> Not only that, but people don't even seem to realise the term "column
>>>> store" existed long before "wide column store" and the latter is often
>>>> abbreviated to the former, as here:
>>>> http://www.planetcassandra.org/what-is-nosql/
>>>>
>>>> Since it no longer applies, let's all agree as a community to forget
>>>> this awful nomenclature ever existed.
>>>>
>>>>
>>>>
>>>> On 30 September 2016 at 18:09, Joaquin Casares <
>>>> joaquin@thelastpickle.com> wrote:
>>>>
>>>>> Hi Mehdi,
>>>>>
>>>>> I can help clarify a few things.
>>>>>
>>>>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row
>>>>> can have 2 billion columns, but in practice it shouldn't have more than 100
>>>>> million columns.
>>>>>
>>>>> Cassandra partitions data to certain nodes based on the partition
>>>>> key(s), but does provide the option of setting zero or more clustering
>>>>> keys. Together, the partition key(s) and clustering key(s) form the primary
>>>>> key.
>>>>>
>>>>> When writing to Cassandra, you will need to provide the full primary
>>>>> key, however, when reading from Cassandra, you only need to provide the
>>>>> full partition key.
>>>>>
>>>>> When you only provide the partition key for a read operation, you're
>>>>> able to return all columns that exist on that partition with low latency.
>>>>> These columns are displayed as "CQL rows" to make it easier to reason about.
>>>>>
>>>>> Consider the schema:
>>>>>
>>>>> CREATE TABLE foo (
>>>>>   bar uuid,
>>>>>   boz uuid,
>>>>>   baz timeuuid,
>>>>>   data1 text,
>>>>>   data2 text,
>>>>>   PRIMARY KEY ((bar, boz), baz)
>>>>> );
>>>>>
>>>>>
>>>>> When you write to Cassandra you will need to send bar, boz, and baz
>>>>> and optionally data*, if it's relevant for that CQL row. If you chose not
>>>>> to define a data* field for a particular CQL row, then nothing is stored
>>>>> nor allocated on disk. But I wouldn't consider that caveat to be
>>>>> "schema-less".
>>>>>
>>>>> However, all writes to the same bar/boz will end up on the same
>>>>> Cassandra replica set (a configurable number of nodes) and be stored on the
>>>>> same place(s) on disk within the SSTable(s). And on disk, each field that's
>>>>> not a partition key is stored as a column, including clustering keys (this
>>>>> is optimized in Cassandra 3+, but now we're getting deep into internals).
>>>>>
>>>>> In this way you can get fast responses for all activity for bar/boz
>>>>> either over time, or for a specific time, with roughly the same number of
>>>>> disk seeks, with varying lengths on the disk scans.
>>>>>
>>>>> Hope that helps!
>>>>>
>>>>> Joaquin Casares
>>>>> Consultant
>>>>> Austin, TX
>>>>>
>>>>> Apache Cassandra Consulting
>>>>> http://www.thelastpickle.com
>>>>>
>>>>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <in...@mrcalonso.com>
>>>>> wrote:
>>>>>
>>>>>> Cassandra is a Wide Column Store
>>>>>> http://db-engines.com/en/system/Cassandra
>>>>>>
>>>>>> Carlos Alonso | Software Engineer | @calonso
>>>>>> <https://twitter.com/calonso>
>>>>>>
>>>>>> On 30 September 2016 at 18:24, Mehdi Bada <
>>>>>> mehdi.bada@dbi-services.com> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I have a theoritical question:
>>>>>>> - Is Apache Cassandra really a column store?
>>>>>>> Column store mean storing the data as column rather than as a rows.
>>>>>>>
>>>>>>> In fact C* store the data as row, and data is partionned with row
>>>>>>> key.
>>>>>>>
>>>>>>> Finally, for me, Cassandra is a row oriented schema less DBMS.... Is
>>>>>>> it true for you also???
>>>>>>>
>>>>>>> Many thanks in advance for your reply
>>>>>>>
>>>>>>> Best Regards
>>>>>>> Mehdi Bada
>>>>>>> ----
>>>>>>>
>>>>>>> *Mehdi Bada* | Consultant
>>>>>>> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32
>>>>>>> 422 96 15
>>>>>>> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
>>>>>>> mehdi.bada@dbi-services.com
>>>>>>> www.dbi-services.com

Re: Cassandra data model right definition

Posted by Russell Bradberry <rb...@gmail.com>.
I agree 100%; this misunderstanding really bothers me as well.  I like the term “Partitioned Row Store”, even though I am guilty of using the legacy “Column-Family Store” from darker times.  Even databases like Scylla, which is supposed to be an Apache Cassandra clone, tout themselves as a column store, which is just utterly backwards, as you mentioned.

 

From: Benedict Elliott Smith <be...@apache.org>
Reply-To: <us...@cassandra.apache.org>
Date: Friday, September 30, 2016 at 5:12 PM
To: <us...@cassandra.apache.org>
Subject: Re: Cassandra data model right definition

 

Absolutely.  A "partitioned row store" is exactly what I would call it.  As it happens, our README thinks the same, which is fantastic.  

 

I thought I'd take a look at the rest of our cohort, and didn't get far before disappointment.  HBase literally calls itself a "column-oriented store" - which is so totally wrong it's simultaneously hilarious and tragic.  

 

I guess we can't blame the wider internet for misunderstanding/misnaming us poor "wide column stores" if even one of the major examples doesn't know what it, itself, is!

 

 

 

 

On 30 September 2016 at 21:47, Jonathan Haddad <jo...@jonhaddad.com> wrote:

+1000 to what Benedict says. I usually call it a "partitioned row store" which usually needs some extra explanation but is more accurate than "column family" or whatever other thrift era terminology people still use. 

On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <do...@gmail.com> wrote:

I used to present Cassandra as a NoSQL datastore with "distributed" table. This definition is closer to CQL and has some academic background (distributed hash table).

 

 

On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <be...@apache.org> wrote:

Cassandra is not a "wide column store" anymore.  It has a schema.  Only thrift users no longer think they have a schema (though they do), and thrift is being deprecated.

 

I really wish everyone would kill the term "wide column store" with fire.  It seems to have never meant anything beyond "schema-less, row-oriented", and a "column store" means literally the opposite of this.

 

Not only that, but people don't even seem to realise the term "column store" existed long before "wide column store" and the latter is often abbreviated to the former, as here: http://www.planetcassandra.org/what-is-nosql/ 

 

Since it no longer applies, let's all agree as a community to forget this awful nomenclature ever existed.

 

 

 

On 30 September 2016 at 18:09, Joaquin Casares <jo...@thelastpickle.com> wrote:

Hi Mehdi,

 

I can help clarify a few things.

 

As Carlos said, Cassandra is a Wide Column Store. Theoretically a row can have 2 billion columns, but in practice it shouldn't have more than 100 million columns.

 

Cassandra partitions data to certain nodes based on the partition key(s), but does provide the option of setting zero or more clustering keys. Together, the partition key(s) and clustering key(s) form the primary key.

 

When writing to Cassandra, you will need to provide the full primary key; when reading from Cassandra, however, you only need to provide the full partition key.

 

When you only provide the partition key for a read operation, you're able to return all columns that exist on that partition with low latency. These columns are displayed as "CQL rows" to make it easier to reason about.

 

Consider the schema:

 

CREATE TABLE foo (
  bar uuid,
  boz uuid,
  baz timeuuid,
  data1 text,
  data2 text,
  PRIMARY KEY ((bar, boz), baz)
);

 

When you write to Cassandra you will need to send bar, boz, and baz and optionally data*, if it's relevant for that CQL row. If you choose not to set a data* field for a particular CQL row, then nothing is stored or allocated on disk for it. But I wouldn't consider that caveat to be "schema-less".

 

However, all writes to the same bar/boz will end up on the same Cassandra replica set (a configurable number of nodes) and be stored on the same place(s) on disk within the SSTable(s). And on disk, each field that's not a partition key is stored as a column, including clustering keys (this is optimized in Cassandra 3+, but now we're getting deep into internals).

 

In this way you can get fast responses for all activity for bar/boz either over time, or for a specific time, with roughly the same number of disk seeks, with varying lengths on the disk scans.
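
The write and read rules above can be sketched in a few lines of Python. This
is only a toy in-memory model of table foo, under several assumptions: the
names partitions, write and read are hypothetical, plain strings and integers
stand in for uuid and timeuuid values, and rows are sorted at read time
whereas Cassandra keeps them ordered by clustering key on disk.

```python
from collections import defaultdict

# Hypothetical stand-in for table foo:
# partition key (bar, boz) -> {clustering key baz -> data* columns}
partitions = defaultdict(dict)


def write(bar, boz, baz, **data):
    """Upsert one CQL row; only the data* columns supplied are stored."""
    row = partitions[(bar, boz)].setdefault(baz, {})
    row.update(data)


def read(bar, boz, baz_from=None, baz_to=None):
    """Return the rows of one partition in clustering order, optionally sliced."""
    rows = partitions.get((bar, boz), {})
    return [
        (baz, cols)
        for baz, cols in sorted(rows.items())
        if (baz_from is None or baz >= baz_from)
        and (baz_to is None or baz <= baz_to)
    ]


write("b1", "z1", 2, data1="second")
write("b1", "z1", 1, data1="first", data2="extra")
write("b2", "z9", 1, data2="other partition")

print(read("b1", "z1"))              # whole partition, ordered by baz
print(read("b1", "z1", baz_from=2))  # slice on the clustering key
```

A write needs the full primary key (bar, boz, baz), a read needs only the
partition key (bar, boz), and a row that never received data2 simply has no
entry for it.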

 

Hope that helps!


Joaquin Casares

Consultant

Austin, TX

 

Apache Cassandra Consulting

http://www.thelastpickle.com

 

On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <in...@mrcalonso.com> wrote:

Cassandra is a Wide Column Store http://db-engines.com/en/system/Cassandra


Carlos Alonso | Software Engineer | @calonso

 

On 30 September 2016 at 18:24, Mehdi Bada <me...@dbi-services.com> wrote:

Hi all, 

 

I have a theoretical question:

- Is Apache Cassandra really a column store?

A column store means storing the data as columns rather than as rows.

 

In fact, C* stores the data as rows, and data is partitioned by row key.

 

Finally, for me, Cassandra is a row oriented schema less DBMS.... Is it true for you also???

 

Many thanks in advance for your reply

 

Best Regards 

Mehdi Bada

----

 

Mehdi Bada | Consultant
Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15 

dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
mehdi.bada@dbi-services.com 

www.dbi-services.com

 

 

 


Re: Cassandra data model right definition

Posted by Benedict Elliott Smith <be...@apache.org>.
Absolutely.  A "partitioned row store" is exactly what I would call it.  As
it happens, our README thinks the same, which is fantastic.

I thought I'd take a look at the rest of our cohort, and didn't get far
before disappointment.  HBase literally calls itself a
"*column-oriented* store"
- which is so totally wrong it's simultaneously hilarious and tragic.

I guess we can't blame the wider internet for misunderstanding/misnaming us
poor "wide column stores" if even one of the major examples doesn't know
what it, itself, is!




On 30 September 2016 at 21:47, Jonathan Haddad <jo...@jonhaddad.com> wrote:

> +1000 to what Benedict says. I usually call it a "partitioned row store"
> which usually needs some extra explanation but is more accurate than
> "column family" or whatever other thrift era terminology people still use.
> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <do...@gmail.com> wrote:
>
>> I used to present Cassandra as a NoSQL datastore with "distributed"
>> table. This definition is closer to CQL and has some academic background
>> (distributed hash table).
>>
>>
>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>> benedict@apache.org> wrote:
>>
>>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>>> thrift users no longer think they have a schema (though they do), and
>>> thrift is being deprecated.
>>>
>>> I really wish everyone would kill the term "wide column store" with
>>> fire.  It seems to have never meant anything beyond "schema-less,
>>> row-oriented", and a "column store" means literally the opposite of this.
>>>
>>> Not only that, but people don't even seem to realise the term "column
>>> store" existed long before "wide column store" and the latter is often
>>> abbreviated to the former, as here:
>>> http://www.planetcassandra.org/what-is-nosql/
>>>
>>> Since it no longer applies, let's all agree as a community to forget
>>> this awful nomenclature ever existed.
>>>
>>>
>>>
>>> On 30 September 2016 at 18:09, Joaquin Casares <
>>> joaquin@thelastpickle.com> wrote:
>>>
>>>> Hi Mehdi,
>>>>
>>>> I can help clarify a few things.
>>>>
>>>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row
>>>> can have 2 billion columns, but in practice it shouldn't have more than 100
>>>> million columns.
>>>>
>>>> Cassandra partitions data to certain nodes based on the partition
>>>> key(s), but does provide the option of setting zero or more clustering
>>>> keys. Together, the partition key(s) and clustering key(s) form the primary
>>>> key.
>>>>
>>>> When writing to Cassandra, you will need to provide the full primary
>>>> key, however, when reading from Cassandra, you only need to provide the
>>>> full partition key.
>>>>
>>>> When you only provide the partition key for a read operation, you're
>>>> able to return all columns that exist on that partition with low latency.
>>>> These columns are displayed as "CQL rows" to make it easier to reason about.
>>>>
>>>> Consider the schema:
>>>>
>>>> CREATE TABLE foo (
>>>>   bar uuid,
>>>>   boz uuid,
>>>>   baz timeuuid,
>>>>   data1 text,
>>>>   data2 text,
>>>>   PRIMARY KEY ((bar, boz), baz)
>>>> );
>>>>
>>>>
>>>> When you write to Cassandra you will need to send bar, boz, and baz and
>>>> optionally data*, if it's relevant for that CQL row. If you chose not to
>>>> define a data* field for a particular CQL row, then nothing is stored nor
>>>> allocated on disk. But I wouldn't consider that caveat to be "schema-less".
>>>>
>>>> However, all writes to the same bar/boz will end up on the same
>>>> Cassandra replica set (a configurable number of nodes) and be stored on the
>>>> same place(s) on disk within the SSTable(s). And on disk, each field that's
>>>> not a partition key is stored as a column, including clustering keys (this
>>>> is optimized in Cassandra 3+, but now we're getting deep into internals).
>>>>
>>>> In this way you can get fast responses for all activity for bar/boz
>>>> either over time, or for a specific time, with roughly the same number of
>>>> disk seeks, with varying lengths on the disk scans.
>>>>
>>>> Hope that helps!
>>>>
>>>> Joaquin Casares
>>>> Consultant
>>>> Austin, TX
>>>>
>>>> Apache Cassandra Consulting
>>>> http://www.thelastpickle.com
>>>>
>>>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <in...@mrcalonso.com>
>>>> wrote:
>>>>
>>>>> Cassandra is a Wide Column Store
>>>>> http://db-engines.com/en/system/Cassandra
>>>>>
>>>>> Carlos Alonso | Software Engineer | @calonso
>>>>> <https://twitter.com/calonso>
>>>>>
>>>>> On 30 September 2016 at 18:24, Mehdi Bada
>>>>> <mehdi.bada@dbi-services.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I have a theoritical question:
>>>>>> - Is Apache Cassandra really a column store?
>>>>>> Column store mean storing the data as column rather than as a rows.
>>>>>>
>>>>>> In fact C* store the data as row, and data is partionned with row key.
>>>>>>
>>>>>> Finally, for me, Cassandra is a row oriented schema less DBMS.... Is
>>>>>> it true for you also???
>>>>>>
>>>>>> Many thanks in advance for your reply
>>>>>>
>>>>>> Best Regards
>>>>>> Mehdi Bada
>>>>>> ----
>>>>>>
>>>>>> *Mehdi Bada* | Consultant
>>>>>> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422
>>>>>> 96 15
>>>>>> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
>>>>>> mehdi.bada@dbi-services.com
>>>>>> www.dbi-services.com

Re: Cassandra data model right definition

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
+1000 to what Benedict says. I usually call it a "partitioned row store",
which usually needs some extra explanation but is more accurate than
"column family" or whatever other Thrift-era terminology people still use.

On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <do...@gmail.com> wrote:

> I used to present Cassandra as a NoSQL datastore with "distributed" table.
> This definition is closer to CQL and has some academic background
> (distributed hash table).
>
>
> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
> benedict@apache.org> wrote:
>
>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>> thrift users no longer think they have a schema (though they do), and
>> thrift is being deprecated.
>>
>> I really wish everyone would kill the term "wide column store" with
>> fire.  It seems to have never meant anything beyond "schema-less,
>> row-oriented", and a "column store" means literally the opposite of this.
>>
>> Not only that, but people don't even seem to realise the term "column
>> store" existed long before "wide column store" and the latter is often
>> abbreviated to the former, as here:
>> http://www.planetcassandra.org/what-is-nosql/
>>
>> Since it no longer applies, let's all agree as a community to forget this
>> awful nomenclature ever existed.
>>
>>
>>
>> On 30 September 2016 at 18:09, Joaquin Casares
>> <joaquin@thelastpickle.com> wrote:
>>
>>> Hi Mehdi,
>>>
>>> I can help clarify a few things.
>>>
>>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row
>>> can have 2 billion columns, but in practice it shouldn't have more than 100
>>> million columns.
>>>
>>> Cassandra partitions data to certain nodes based on the partition
>>> key(s), but does provide the option of setting zero or more clustering
>>> keys. Together, the partition key(s) and clustering key(s) form the primary
>>> key.
>>>
>>> When writing to Cassandra, you will need to provide the full primary
>>> key, however, when reading from Cassandra, you only need to provide the
>>> full partition key.
>>>
>>> When you only provide the partition key for a read operation, you're
>>> able to return all columns that exist on that partition with low latency.
>>> These columns are displayed as "CQL rows" to make it easier to reason about.
>>>
>>> Consider the schema:
>>>
>>> CREATE TABLE foo (
>>>   bar uuid,
>>>   boz uuid,
>>>   baz timeuuid,
>>>   data1 text,
>>>   data2 text,
>>>   PRIMARY KEY ((bar, boz), baz)
>>> );
>>>
>>>
>>> When you write to Cassandra you will need to send bar, boz, and baz and
>>> optionally data*, if it's relevant for that CQL row. If you chose not to
>>> define a data* field for a particular CQL row, then nothing is stored nor
>>> allocated on disk. But I wouldn't consider that caveat to be "schema-less".
>>>
>>> However, all writes to the same bar/boz will end up on the same
>>> Cassandra replica set (a configurable number of nodes) and be stored on the
>>> same place(s) on disk within the SSTable(s). And on disk, each field that's
>>> not a partition key is stored as a column, including clustering keys (this
>>> is optimized in Cassandra 3+, but now we're getting deep into internals).
>>>
>>> In this way you can get fast responses for all activity for bar/boz
>>> either over time, or for a specific time, with roughly the same number of
>>> disk seeks, with varying lengths on the disk scans.
>>>
>>> Hope that helps!
>>>
>>> Joaquin Casares
>>> Consultant
>>> Austin, TX
>>>
>>> Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <in...@mrcalonso.com>
>>> wrote:
>>>
>>>> Cassandra is a Wide Column Store
>>>> http://db-engines.com/en/system/Cassandra
>>>>
>>>> Carlos Alonso | Software Engineer | @calonso
>>>> <https://twitter.com/calonso>
>>>>
>>>> On 30 September 2016 at 18:24, Mehdi Bada <me...@dbi-services.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I have a theoritical question:
>>>>> - Is Apache Cassandra really a column store?
>>>>> Column store mean storing the data as column rather than as a rows.
>>>>>
>>>>> In fact C* store the data as row, and data is partionned with row key.
>>>>>
>>>>> Finally, for me, Cassandra is a row oriented schema less DBMS.... Is
>>>>> it true for you also???
>>>>>
>>>>> Many thanks in advance for your reply
>>>>>
>>>>> Best Regards
>>>>> Mehdi Bada
>>>>> ----
>>>>>
>>>>> *Mehdi Bada* | Consultant
>>>>> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422
>>>>> 96 15
>>>>> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
>>>>> mehdi.bada@dbi-services.com
>>>>> www.dbi-services.com

Re: Cassandra data model right definition

Posted by DuyHai Doan <do...@gmail.com>.
I used to present Cassandra as a NoSQL datastore with "distributed" tables.
This definition is closer to CQL and has some academic background
(distributed hash table).
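
For anyone unfamiliar with the distributed hash table idea, here is a minimal
sketch in Python of how a partition key could be mapped to a node on a token
ring. Everything in it is an assumption made for illustration: the three node
names and their tokens are invented, MD5 truncated to 32 bits is only a
stand-in for the partitioner's hash (Cassandra's default partitioner uses
Murmur3), and replication is ignored entirely.

```python
import hashlib
from bisect import bisect_left

# Hypothetical ring: each node owns the token range ending at its token.
TOKENS = [1_000_000_000, 2_500_000_000, 4_000_000_000]
NODES = ["node-a", "node-b", "node-c"]


def token(partition_key: str) -> int:
    """Hash a partition key to a 32-bit token (stand-in hash function)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def owner_for_token(t: int) -> str:
    """First node whose token is >= t, wrapping around past the last node."""
    index = bisect_left(TOKENS, t) % len(NODES)
    return NODES[index]


def owner(partition_key: str) -> str:
    """Node responsible for a given partition key."""
    return owner_for_token(token(partition_key))


print(owner("some-partition-key"))
```

The same key always hashes to the same token, so every client can work out
which node holds a partition without a central lookup.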


On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith
<benedict@apache.org> wrote:

> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
> thrift users no longer think they have a schema (though they do), and
> thrift is being deprecated.
>
> I really wish everyone would kill the term "wide column store" with fire.
> It seems to have never meant anything beyond "schema-less, row-oriented",
> and a "column store" means literally the opposite of this.
>
> Not only that, but people don't even seem to realise the term "column
> store" existed long before "wide column store" and the latter is often
> abbreviated to the former, as here: http://www.planetcassandra.
> org/what-is-nosql/
>
> Since it no longer applies, let's all agree as a community to forget this
> awful nomenclature ever existed.
>
>
>
> On 30 September 2016 at 18:09, Joaquin Casares <jo...@thelastpickle.com>
> wrote:
>
>> Hi Mehdi,
>>
>> I can help clarify a few things.
>>
>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row can
>> have 2 billion columns, but in practice it shouldn't have more than 100
>> million columns.
>>
>> Cassandra partitions data to certain nodes based on the partition key(s),
>> but does provide the option of setting zero or more clustering keys.
>> Together, the partition key(s) and clustering key(s) form the primary key.
>>
>> When writing to Cassandra, you will need to provide the full primary key,
>> however, when reading from Cassandra, you only need to provide the full
>> partition key.
>>
>> When you only provide the partition key for a read operation, you're able
>> to return all columns that exist on that partition with low latency. These
>> columns are displayed as "CQL rows" to make it easier to reason about.
>>
>> Consider the schema:
>>
>> CREATE TABLE foo (
>>   bar uuid,
>>
>>   boz uuid,
>>
>>   baz timeuuid,
>>   data1 text,
>>
>>   data2 text,
>>
>>   PRIMARY KEY ((bar, boz), baz)
>>
>> );
>>
>>
>> When you write to Cassandra you will need to send bar, boz, and baz and
>> optionally data*, if it's relevant for that CQL row. If you chose not to
>> define a data* field for a particular CQL row, then nothing is stored nor
>> allocated on disk. But I wouldn't consider that caveat to be "schema-less".
>>
>> However, all writes to the same bar/boz will end up on the same Cassandra
>> replica set (a configurable number of nodes) and be stored on the same
>> place(s) on disk within the SSTable(s). And on disk, each field that's not
>> a partition key is stored as a column, including clustering keys (this is
>> optimized in Cassandra 3+, but now we're getting deep into internals).
>>
>> In this way you can get fast responses for all activity for bar/boz
>> either over time, or for a specific time, with roughly the same number of
>> disk seeks, with varying lengths on the disk scans.
>>
>> Hope that helps!
>>
>> Joaquin Casares
>> Consultant
>> Austin, TX
>>
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <in...@mrcalonso.com>
>> wrote:
>>
>>> Cassandra is a Wide Column Store http://db-engines.com/en
>>> /system/Cassandra
>>>
>>> Carlos Alonso | Software Engineer | @calonso
>>> <https://twitter.com/calonso>
>>>
>>> On 30 September 2016 at 18:24, Mehdi Bada <me...@dbi-services.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I have a theoritical question:
>>>> - Is Apache Cassandra really a column store?
>>>> Column store mean storing the data as column rather than as a rows.
>>>>
>>>> In fact C* store the data as row, and data is partionned with row key.
>>>>
>>>> Finally, for me, Cassandra is a row oriented schema less DBMS.... Is it
>>>> true for you also???
>>>>
>>>> Many thanks in advance for your reply
>>>>
>>>> Best Regards
>>>> Mehdi Bada
>>>> ----
>>>>
>>>> *Mehdi Bada* | Consultant
>>>> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422
>>>> 96 15
>>>> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
>>>> mehdi.bada@dbi-services.com
>>>> www.dbi-services.com
>>>>
>>>>
>>>>
>>>>
>>>> *⇒ dbi services is recruiting Oracle & SQL Server experts ! – Join the
>>>> team
>>>> <http://www.dbi-services.com/fr/dbi-services-et-ses-collaborateurs/offres-emplois-opportunites-carrieres/>*
>>>>
>>>
>>>
>>
>

Re: Cassandra data model right definition

Posted by Benedict Elliott Smith <be...@apache.org>.
Cassandra is not a "wide column store" anymore.  It has a schema.  Only
Thrift users still think they have no schema (though they do), and
Thrift is being deprecated.

I really wish everyone would kill the term "wide column store" with fire.
It seems to have never meant anything beyond "schema-less, row-oriented",
and a "column store" means literally the opposite of this.

Not only that, but people don't even seem to realise the term "column
store" existed long before "wide column store" and the latter is often
abbreviated to the former, as here:
http://www.planetcassandra.org/what-is-nosql/

Since it no longer applies, let's all agree as a community to forget this
awful nomenclature ever existed.




Re: Cassandra data model right definition

Posted by Joaquin Casares <jo...@thelastpickle.com>.
Hi Mehdi,

I can help clarify a few things.

As Carlos said, Cassandra is a Wide Column Store. Theoretically a row can
have 2 billion columns, but in practice it shouldn't have more than 100
million columns.

Cassandra partitions data to certain nodes based on the partition key(s),
but does provide the option of setting zero or more clustering keys.
Together, the partition key(s) and clustering key(s) form the primary key.
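
To make the placement step concrete, here is a minimal Python sketch of how a
partition key could be mapped to a set of replica nodes on a token ring. It is
only an illustration under stated assumptions: MD5 stands in for Cassandra's
real partitioner, and the four-node ring, the node names, and the helper names
token() and replicas() are all hypothetical.

```python
import bisect
import hashlib


def token(partition_key: tuple) -> int:
    """Hash the full partition key to a 64-bit ring position (MD5 as a stand-in)."""
    digest = hashlib.md5(repr(partition_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def replicas(partition_key: tuple, ring: list, rf: int) -> list:
    """Walk clockwise from the key's token and collect rf distinct nodes."""
    tokens = [t for t, _ in ring]
    # First node whose token is >= the key's token; wrap around at the end.
    start = bisect.bisect_left(tokens, token(partition_key)) % len(ring)
    chosen = []
    for i in range(len(ring)):
        node = ring[(start + i) % len(ring)][1]
        if node not in chosen:
            chosen.append(node)
        if len(chosen) == rf:
            break
    return chosen


# Hypothetical four-node ring with evenly spaced tokens.
ring = sorted((i * 2**62, f"node{i}") for i in range(4))

# Every write for the same (bar, boz) pair resolves to the same replica set.
print(replicas(("bar-1", "boz-1"), ring, rf=3))
```

Only the partition key goes into the hash, so the clustering key has no
influence on which nodes hold the data.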

When writing to Cassandra, you will need to provide the full primary key.
When reading from Cassandra, however, you only need to provide the full
partition key.

When you only provide the partition key for a read operation, you're able
to return all columns that exist on that partition with low latency. These
columns are displayed as "CQL rows" to make it easier to reason about.

Consider the schema:

CREATE TABLE foo (
  bar uuid,
  boz uuid,
  baz timeuuid,
  data1 text,
  data2 text,
  PRIMARY KEY ((bar, boz), baz)
);


When you write to Cassandra you will need to send bar, boz, and baz and
optionally data*, if it's relevant for that CQL row. If you choose not to
set a data* field for a particular CQL row, then nothing is stored or
allocated on disk for it. But I wouldn't consider that caveat to be
"schema-less".

However, all writes to the same bar/boz will end up on the same Cassandra
replica set (a configurable number of nodes) and be stored on the same
place(s) on disk within the SSTable(s). And on disk, each field that's not
a partition key is stored as a column, including clustering keys (this is
optimized in Cassandra 3+, but now we're getting deep into internals).

In this way you can get fast responses for all activity for bar/boz either
over time, or for a specific time, with roughly the same number of disk
seeks, with varying lengths on the disk scans.
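
As a rough illustration of the layout described above, the following Python
sketch models a table with PRIMARY KEY ((bar, boz), baz) as a map from
partition key to rows kept sorted by clustering key. It is a toy model, not
Cassandra's storage engine: the ToyTable class and its write()/read() methods
are hypothetical, and plain integers stand in for timeuuid values.

```python
import bisect
from collections import defaultdict


class ToyTable:
    """Toy model of PRIMARY KEY ((bar, boz), baz); not the real storage engine."""

    def __init__(self):
        # (bar, boz) -> list of (baz, {column: value}), kept sorted by baz
        self.partitions = defaultdict(list)

    def write(self, bar, boz, baz, **data):
        if bar is None or boz is None or baz is None:
            raise ValueError("a write needs the full primary key")
        rows = self.partitions[(bar, boz)]
        keys = [k for k, _ in rows]
        i = bisect.bisect_left(keys, baz)
        if i < len(rows) and rows[i][0] == baz:
            rows[i][1].update(data)            # same primary key: upsert
        else:
            rows.insert(i, (baz, dict(data)))  # only supplied columns are stored

    def read(self, bar, boz, start=None, end=None):
        """Return the rows of one partition, optionally sliced on baz."""
        rows = self.partitions.get((bar, boz), [])
        return [(k, cols) for k, cols in rows
                if (start is None or k >= start) and (end is None or k <= end)]


table = ToyTable()
table.write("a", "b", 3, data1="x")
table.write("a", "b", 1, data2="y")
table.write("a", "b", 2)                       # no data* columns at all

print(table.read("a", "b"))                    # whole partition, ordered by baz
print(table.read("a", "b", start=2, end=3))    # slice for a specific time range
```

Both reads touch a single partition; the slice simply scans a shorter run of
the same sorted rows.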

Hope that helps!

Joaquin Casares
Consultant
Austin, TX

Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Cassandra data model right definition

Posted by Oskar Kjellin <os...@gmail.com>.
It's not that easy, as I recall from this email thread: https://groups.google.com/forum/m/#!topic/nosql-databases/ZLdgwCT_PNU

/Oskar 


Re: Cassandra data model right definition

Posted by Carlos Alonso <in...@mrcalonso.com>.
Cassandra is a Wide Column Store http://db-engines.com/en/system/Cassandra

Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>
