Posted to user@cassandra.apache.org by chetan verma <ch...@gmail.com> on 2015/01/20 19:24:32 UTC

Dynamic Columns

Hi,

I am starting a new project with cassandra as database.
I have unstructured data, so I need dynamic columns.
In CQL3 we can achieve this via collections, but there are some
downsides to it:
1. Collections are used to store small amounts of data.
2. The maximum size of an item in a collection is 64K.
3. Cassandra reads a collection in its entirety.
4. The number of items in a collection is limited to 64,000.

There is also no support for getting a single column by map key, which is
possible via cassandra-cli.
Please suggest whether I should use CQL3 or Thrift, and which driver is best.

-- 
*Regards,*
*Chetan Verma*
*+91 99860 86634*

Re: Dynamic Columns

Posted by Xu Zhongxing <xu...@163.com>.
The original dynamic column idea in the Google BigTable paper is a mapping of:


(row key, raw bytes) -> raw bytes


The restriction imposed by CQL is, as far as I understand, you need to have a type for each column. 


If the value types involved in the schema are limited, e.g. text or int or timestamp, we can approximate the raw bytes mapping by setting up a few value columns of explicit type.





At 2015-01-21 10:46:27, "Peter Lin" <wo...@gmail.com> wrote:



The thing is, CQL only handles some types of dynamic column use cases. There are plenty of examples on datastax.com that show how to do CQL-style dynamic columns.


based on what was described by Chetan, I don't feel CQL3 is a perfect fit for what he wants to do. To use CQL3, he'd have to change his approach.

In my temporal database, I use both Thrift and CQL. They complement each other very nicely. I don't understand why people have to put down Thrift or pretend CQL supports 100% of the use cases. Lots of people started using Cassandra pre-CQL and had no problems using Thrift. Yes, you have to understand more and the learning curve is steeper, but taking time to learn the internals of Cassandra is a good thing.


Using CQL3 lists or maps would force the query to load the entire collection, but that is by design. To get the full power of the old style of dynamic columns, Thrift is a better fit. I hope CQL continues to improve so that it supports 100% of the existing use cases.





On Tue, Jan 20, 2015 at 8:50 PM, Xu Zhongxing <xu...@163.com> wrote:

I approximate dynamic columns by data_key and data_value columns.
Is there a better way to get dynamic columns in CQL 3?

At 2015-01-21 09:41:02, "Peter Lin" <wo...@gmail.com> wrote:



I think that table example misses the point of Chetan's functional requirement. He actually needs dynamic columns.



On Tue, Jan 20, 2015 at 8:12 PM, Xu Zhongxing <xu...@163.com> wrote:

Maybe this is the closest thing to "dynamic columns" in CQL 3.


create table review (
    product_id bigint,
    created_at timestamp,
    data_key text,
    data_tvalue text,
    data_ivalue int,
    primary key ((product_id, created_at), data_key)
);


data_tvalue and data_ivalue are optional.
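
A minimal sketch of how this data_key approach could read back a single "dynamic column", or a slice of them. The table name review_attrs and all sample values are hypothetical, chosen only for illustration; the schema mirrors the one above, with data_key as the only clustering column.

```cql
-- Hypothetical table mirroring the schema above: each attribute is one row
-- inside the (product_id, created_at) partition.
CREATE TABLE IF NOT EXISTS review_attrs (
    product_id  bigint,
    created_at  timestamp,
    data_key    text,
    data_tvalue text,   -- populated for text attributes
    data_ivalue int,    -- populated for int attributes
    PRIMARY KEY ((product_id, created_at), data_key)
);

-- Two attributes of different types for the same review (sample data).
INSERT INTO review_attrs (product_id, created_at, data_key, data_tvalue)
VALUES (42, '2015-01-20 19:24:32+0000', 'summary', 'Works as advertised');

INSERT INTO review_attrs (product_id, created_at, data_key, data_ivalue)
VALUES (42, '2015-01-20 19:24:32+0000', 'battery_rating', 4);

-- Read one attribute by its key, without loading the others.
SELECT data_tvalue, data_ivalue
FROM review_attrs
WHERE product_id = 42
  AND created_at = '2015-01-20 19:24:32+0000'
  AND data_key = 'battery_rating';

-- Read a slice of attributes: data_key is a clustering column,
-- so range predicates on it are allowed.
SELECT data_key, data_tvalue, data_ivalue
FROM review_attrs
WHERE product_id = 42
  AND created_at = '2015-01-20 19:24:32+0000'
  AND data_key >= 'a' AND data_key < 'c';
```

Because data_key is a clustering column, both reads are restricted to the requested rows rather than a whole collection. The trade-off is one value column per value type.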


At 2015-01-21 04:44:07, "chetan verma" <ch...@gmail.com> wrote:

Hi,


Adding to previous mail. For example: We have a column family named review (with some arbitrary data in map).


CREATE TABLE review(
product_id bigint,
created_at timestamp,
data_int map<text, int>,
data_text map<text, text>,
PRIMARY KEY (product_id, created_at)
);


Assume that I use these 2 maps to store arbitrary data (i.e. data_int and data_text for int and text values).
When we look at the output in cassandra-cli, a partition looks like this:
<clustering_key>:data_int:map_key as the column name and the map value as the value.
Suppose I need to get this value: I couldn't do that with CQL3, but in Thrift it's possible. Any solution?


On Wed, Jan 21, 2015 at 1:06 AM, chetan verma <ch...@gmail.com> wrote:

Hi,


Most of the time I will be querying on product_id and created_at, but for analytics I need to query on almost all columns.
The multiple collections idea is good, but the only problem is that Cassandra reads a collection entirely. What if I need a slice of it, I mean
columns for certain keys, which is possible with Thrift? Please suggest.


On Wed, Jan 21, 2015 at 12:36 AM, Jonathan Lacefield <jl...@datastax.com> wrote:

Hello,


There are probably lots of options to this challenge.  The more details around your use case that you can provide, the easier it will be for this group to offer advice.


A few follow-up questions: 
  - How will you query this data?  
  - Do your queries require filtering on specific columns other than product_id and created_at, i.e. the dynamic columns?


Depending on the answers to these questions, you have several options, of which here are a few:
  - Cassandra efficiently stores sparse data, so you could create columns and not populate them, without much of a penalty
  - Could use a clustering column to store a column's type and another col (potentially clustering) to store the value
      - i.e. CREATE TABLE foo (col1 int, attname text, attvalue text, col4...n, PRIMARY KEY (col1, attname, attvalue));
      - where attname stores the name of the attribute/column and attvalue stores the value of that attribute
      - have seen users use this model and create a "main" attribute row within a partition that stores the values associated with col4...n
  - Could store multiple collections
  - Others probably have ideas as well
You may want to look in the archives for a similar discussion topic.  Believe this item was asked a few months ago as well.
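
The attname/attvalue option could look roughly like this in practice. This is only a sketch: the table name product_attributes, the use of product_id in place of col1, and the sample values are assumptions, and the col4...n columns are left out.

```cql
-- Hypothetical concrete version of the foo table sketched above.
CREATE TABLE IF NOT EXISTS product_attributes (
    product_id bigint,   -- stands in for col1
    attname    text,     -- name of the attribute/column
    attvalue   text,     -- value of that attribute, stored as text
    PRIMARY KEY (product_id, attname, attvalue)
);

-- Sample attributes for one partition.
INSERT INTO product_attributes (product_id, attname, attvalue)
VALUES (42, 'color', 'red');
INSERT INTO product_attributes (product_id, attname, attvalue)
VALUES (42, 'weight_kg', '1.2');

-- All attributes of the partition.
SELECT attname, attvalue FROM product_attributes WHERE product_id = 42;

-- A single attribute by name.
SELECT attvalue FROM product_attributes
WHERE product_id = 42 AND attname = 'color';

-- attvalue is part of the primary key, so changing a value means
-- deleting the old row and inserting a new one.
DELETE FROM product_attributes
WHERE product_id = 42 AND attname = 'color' AND attvalue = 'red';
INSERT INTO product_attributes (product_id, attname, attvalue)
VALUES (42, 'color', 'blue');
```

With attvalue as a clustering column, one attribute name can carry several values at once. If each attribute should hold exactly one value, attvalue could instead be a regular column.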



Jonathan Lacefield

Solution Architect |(404) 822 3487 | jlacefield@datastax.com





On Tue, Jan 20, 2015 at 1:40 PM, chetan verma <ch...@gmail.com> wrote:

Hi,


I am creating a review system. For instance, let's assume the following are the attributes of the system:


Review{
id bigint,
product_id bigint,
created_at timestamp,
summary text,
description text,
pros set<text>,
cons set<text>,
feature_rating map<text, int>
etc....
}
I chose product_id as the partition key (so that all the reviews for a given product reside on the same node)
and created_at and id (DESC) as the clustering key, so that reviews are sorted by time.


I may have more columns later, and I want to meet that requirement with dynamic columns, but they have the limitations explained above.
Could you please let me know the best way?


On Tue, Jan 20, 2015 at 11:59 PM, Jonathan Lacefield <jl...@datastax.com> wrote:

Hello,


  Have you looked at solving this challenge with clustering columns?  Also, please describe the problem set details for more specific advice from this group.


  Starting new projects on Thrift isn't the recommended approach.  


Jonathan



Jonathan Lacefield

Solution Architect |(404) 822 3487 | jlacefield@datastax.com





On Tue, Jan 20, 2015 at 1:24 PM, chetan verma <ch...@gmail.com> wrote:

Hi,


I am starting a new project with cassandra as database.
I have unstructured data, so I need dynamic columns.
In CQL3 we can achieve this via collections, but there are some downsides to it:
1. Collections are used to store small amounts of data.
2. The maximum size of an item in a collection is 64K.
3. Cassandra reads a collection in its entirety.
4. The number of items in a collection is limited to 64,000.


There is also no support for getting a single column by map key, which is possible via cassandra-cli.
Please suggest whether I should use CQL3 or Thrift, and which driver is best.



Re: Re: Dynamic Columns

Posted by Peter Lin <wo...@gmail.com>.
I've written my fair share of crappy code, which became legacy. Then I or
someone else was left with supporting it and something newer. Isn't that
the nature of software development?

I forget who said this quote first, but I'm gonna borrow it: "only pretty
code is code that is in your head. once it's written, it becomes crap." I
tell my son this all the time. When we start a project we have no clue what
we should have known, so we make a butt load of mistakes. If we're lucky,
by the third or fourth version it's not so smelly, but in the meantime we
have to keep supporting the stuff. Not because we want to, but because
we're the ones that put the users through it. At least that's how I see it.

Having said that, at some point, the really old stuff should be deprecated
and cleaned out. It totally makes sense to remove Thrift at some point. I
don't know when that is, but every piece of software eventually dies or is
abandoned. Except for Cobol. That thing will be around 200 yrs from now.



On Wed, Jan 21, 2015 at 6:57 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Wed, Jan 21, 2015 at 2:09 PM, Peter Lin <wo...@gmail.com> wrote:
>
>> On the topic of multiple incompatible APIs, I recommend you look at
>> SqlServer and Sybase. Most of the legacy RDBMS have multiple incompatible
>> APIs. Though in some cases, it is/was unavoidable.
>>
>
> My bet is that the small development team responsible for Cassandra does
> not have anything like the number of contractual obligations that
> commercial databases from the 1980s had. In other words, I believe having
> two persistent, non-pluggable (this attribute probably excludes various
> "legacy" APIs?) APIs is far more "avoidable" in the Cassandra case than in
> the historic cases you cite. I could certainly be wrong... people who
> disagree with my assessment now have a way to make me pay for my wrongness
> by making me donate $20 to the Apache Foundation on Jan 1, 2019. [1] :D
>
> =Rob
> [1] Project committers/others with material ability (Datastax...) to
> affect outcome ineligible.
>
>

Re: Re: Dynamic Columns

Posted by Robert Coli <rc...@eventbrite.com>.
On Wed, Jan 21, 2015 at 2:09 PM, Peter Lin <wo...@gmail.com> wrote:

> On the topic of multiple incompatible APIs, I recommend you look at
> SqlServer and Sybase. Most of the legacy RDBMS have multiple incompatible
> APIs. Though in some cases, it is/was unavoidable.
>

My bet is that the small development team responsible for Cassandra does
not have anything like the number of contractual obligations that
commercial databases from the 1980s had. In other words, I believe having
two persistent, non-pluggable (this attribute probably excludes various
"legacy" APIs?) APIs is far more "avoidable" in the Cassandra case than in
the historic cases you cite. I could certainly be wrong... people who
disagree with my assessment now have a way to make me pay for my wrongness
by making me donate $20 to the Apache Foundation on Jan 1, 2019. [1] :D

=Rob
[1] Project committers/others with material ability (Datastax...) to affect
outcome ineligible.

Re: Re: Dynamic Columns

Posted by Peter Lin <wo...@gmail.com>.
Everyone is different. I also recommend users take time to understand
every tool they use as much as time allows. We don't always have the luxury
of time, but I see no point recommending laziness.

I'm probably insane, since I also spend time reading papers on CRDT, paxos,
query compilers, machine learning and other topics I find fun.

On the topic of multiple incompatible APIs, I recommend you look at
SqlServer and Sybase. Most of the legacy RDBMS have multiple incompatible
APIs. Though in some cases, it is/was unavoidable.

On Wed, Jan 21, 2015 at 4:47 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Wed, Jan 21, 2015 at 9:19 AM, Peter Lin <wo...@gmail.com> wrote:
>
>>
>> I consistently recommend new users learn and understand both Thrift and
>> CQL.
>>
>
> FWIW, I consider this a disservice to new users. New users should use CQL,
> and not deploy against a deprecated-in-all-but-name API. Understanding
> non-CQL *storage* might be valuable, understanding the Thrift interface to
> storage is anti-valuable.
>
> Despite the dissembling public statements regarding Thrift "not going
> anywhere" it is obvious to me that no other databases exist with two
> non-pluggable and incompatible APIs for a reason. The pain of maintaining
> these two APIs will eventually become not worth the backwards
> compatibility. At this time it will be deprecated and then shortly
> thereafter removed; I expect this to happen at latest by EOY 2018. [1]
>
> =Rob
> [1] If anyone strongly disagrees, I am taking $20 cash bets, with any
> proceeds donated to the Apache Foundation.
>
>

Re: Re: Dynamic Columns

Posted by Peter Lin <wo...@gmail.com>.
I apologize if I've offended you, but I clearly stated CQL3 supports
dynamic columns. How it supports dynamic columns is different. If I'm
reading you correctly, I believe we agree both thrift and CQL3 support
dynamic columns. Where we differ is that I feel the coverage for existing
thrift use cases isn't 100%. That may be right or wrong, but it is my
impression. I agree with you that CQL3 supports the majority of dynamic
column use cases, but in a slightly different way. There are cases like
mine which fit better in thrift.

Could I rip out all the stuff I did and replace it with CQL3 with a major
redesign? Yes, I could but honestly I see some downsides with that
proposition.

1. for modeling tools like mine an object API is a far better fit in my
biased opinion
2. text based languages like SQL and CQL could "in theory" provide similar
object safety, but it's so much work that most people don't bother. This is
from first hand experience building 3 orms and using most of the open
source orms in the java space. I've also used several orms in .Net and they
all suffer from this pain point. There's a reason why microsoft created
Linq.
3. the structure and syntax of SQL  and all variations of SQL are not
ideally suited to complex data structures that are graphs. A temporal
entity is an object graph that may be shallow (3-8 levels) or deep (15+).
SQL is ideally suited to tables. CQL in this regard is more flexible and
supports collections, but it's still not ideal for things like insurance
policies. Look at the Acord standard for property insurance, if you want to
get a better understanding. For example, a temporal record using ORM could
result in 500 rows of data in a dozen tables for a small entity to 50K+
rows for a large entity. The mailing list isn't the right place to go into
the theory and practice of temporal databases, but a lot of the design
choices I made are based on formal logic.



On Wed, Jan 21, 2015 at 4:06 PM, Sylvain Lebresne <sy...@datastax.com>
wrote:

> On Wed, Jan 21, 2015 at 6:19 PM, Peter Lin <wo...@gmail.com> wrote:
>
>> the dynamic column can't be part of the primary key. The temporal entity
>> key can be the default UUID or the user can choose the field in their
>> object. Within our framework, we have a concept of temporal links between one
>> or more temporal entities. Polluting the primary key with the dynamic column
>> wouldn't work.
>>
>
> Not totally sure I understand. Are you talking about the underlying
> storage space used? If you are, we can discuss it (it's not too hard to
> remedy it in CQL, I was mainly trying to illustrate my point, not
> pretending this was a drop-in solution for your use case) but it's more of
> a performance discussion, and I think we've somewhat quit the realm of
> "there's things CQL3 doesn't support".
>
>
>> Please excuse the confusing RDB comparison. My point is that Cassandra's
>> dynamic column feature is the "unique" feature that makes it better than
>> traditional RDB or newSql like VoltDB for building temporal databases. With
>> databases that require static schema + alter table for managing schema
>> evolution, it makes it harder and results in down time.
>>
>
> Here again you seem to imply that CQL doesn't support dynamic columns, or
> has somewhat inferior support, but that's just not true.
>
>
>> One of the challenges of data management over time is evolving the data
>> model and making queries simple. If the record is 5 years old, it probably
>> has a different schema than a record inserted this week. With temporal
>> databases, every update is an insert, so it's a little bit more complex
>> than just "use a blob". There's a whole level of complication with temporal
>> data and CQL3 custom types isn't clear to me. I've read the CQL3
>> documentation on the custom types several times and it is rather poor. It
>> gives me the impression there's still work needed to get custom types in
>> good shape.
>>
>
> I'm sorry but that's a bit of hand waving. Custom types (and by that I
> mean user-provided AbstractType implementations) work in CQL *exactly*
> like in thrift: they are not in a better or worse shape than in thrift. And
> while the documentation on CQL3 is indeed poor on this part, so is the
> thrift documentation on the same subject (besides, I don't think your
> whole point is about saying that documentation could be improved). Again,
> what you can do in thrift, you can do in CQL.
>

Honestly I haven't tried to use CQL3 user-provided types. I read the
specification several times and had a ton of questions along with several
other people that were trying to understand what it meant. If you want people to
use it, the documentation needs to improve. I did give a good faith effort
and spent a week trying to understand what the spec is trying to say, but
it only resulted in more questions. So yes, I am hand waving because it
left me frustrated. Having been part of apache community for many years,
writing great docs is hard and most of us hate doing it. Just to be clear,
I'm not blaming anyone for poor docs. I'm just as guilty as everyone else
when it comes to docs.


>
>
>> I consistently recommend new users learn and understand both Thrift and
>> CQL.
>>
>
> I understand that you do this with the best of intentions and don't take
> it the wrong way but it is my opinion that you are counterproductive by
> doing so, and this for 2 reasons:
> 1) you don't only recommend users to learn both APIs, you justify that
> advice by affirming that there is a whole family of important use cases
> that thrift supports and CQL does not. Except that I contend that this
> affirmation is technically incorrect, and so far I haven't seen many
> examples proving me wrong.
>

Honestly the only use case that matters to me is my use case. I know a lot
of people that use temporal databases in the financial and insurance sectors.
They all kludge together broken designs starting with static schema and
alter the schema when it evolves. With dynamic columns of either flavor
(cql3 & thrift), people can avoid many of the issues. I happen to prefer
thrift for specific parts of my project and CQL3 for the rest of it. I see
nothing wrong with picking the right tool that fits each use case.

Honestly I don't care who is right or wrong, I care about sharing
knowledge. When I'm wrong, I freely admit it and thank people for pointing
it out.


> 2) there is a wealth of evidence that trying to learn both thrift and CQL
> confuses the hell out of new users. Which is btw not surprising, both APIs
> present the same concepts in seemingly different ways (even though they
> are the same concepts) and even have conflicting vocabulary, so it's
> obviously confusing when you try to learn those concepts in the first
> place. Trying to learn CQL when you know thrift well is fine, and why not
> learn thrift once you know and understand CQL well, but learning both is
> imo bad advice. It could maybe (maybe) be justified if what you say about
> having a whole family of use cases not being doable with CQL was true, but
> it's not.
>
>>
>> For the record, doing this kind of stuff in a relational database sucks
>> horribly.
>>
>
> I don't know what that has to do with CQL to be honest. If you're doing
> relational with CQL you're doing it wrong. And please note that I'm not
> saying CQL is the perfect API for modeling temporal data. But I don't get
> how thrift, which is a very crude API, is a much better API at that than CQL
> (or, again, how it allows you to do things you can't with CQL).
>
>
I think you're reading too much into it. Since I did a horrible job
explaining it, I'll try again. My point is this. People who come from a SQL
world prefer CQL because it is conceptually similar and less scary. From my
experience, projects that need dynamic columns have a lot of subtlety and
it isn't always clear which approach is best. It may be that CQL3 dynamic
columns is perfectly fine. But here's the thing, unless someone takes the
time to learn and study the subject thoroughly, it's a blind guess. The
point isn't to use Cassandra as a relational database, even if some people
are basically doing that. I share my experience in the hopes that others
can avoid my mistakes.



> --
> Sylvain
>
>>
>>

Re: Re: Dynamic Columns

Posted by Sylvain Lebresne <sy...@datastax.com>.
On Wed, Jan 21, 2015 at 6:19 PM, Peter Lin <wo...@gmail.com> wrote:

> the dynamic column can't be part of the primary key. The temporal entity
> key can be the default UUID or the user can choose the field in their
> object. Within our framework, we have a concept of temporal links between one
> or more temporal entities. Polluting the primary key with the dynamic column
> wouldn't work.
>

Not totally sure I understand. Are you talking about the underlying storage
space used? If you are, we can discuss it (it's not too hard to remedy it
in CQL, I was mainly trying to illustrate my point, not pretending this
was a drop-in solution for your use case) but it's more of a performance
discussion, and I think we've somewhat quit the realm of "there's things
CQL3 doesn't support".


> Please excuse the confusing RDB comparison. My point is that Cassandra's
> dynamic column feature is the "unique" feature that makes it better than
> traditional RDB or newSql like VoltDB for building temporal databases. With
> databases that require static schema + alter table for managing schema
> evolution, it makes it harder and results in down time.
>

Here again you seem to imply that CQL doesn't support dynamic columns, or
has somewhat inferior support, but that's just not true.


> One of the challenges of data management over time is evolving the data
> model and making queries simple. If the record is 5 years old, it probably
> has a different schema than a record inserted this week. With temporal
> databases, every update is an insert, so it's a little bit more complex
> than just "use a blob". There's a whole level of complication with temporal
> data and CQL3 custom types isn't clear to me. I've read the CQL3
> documentation on the custom types several times and it is rather poor. It
> gives me the impression there's still work needed to get custom types in
> good shape.
>

I'm sorry but that's a bit of hand waving. Custom types (and by that I mean
user-provided AbstractType implementations) work in CQL *exactly* like in
thrift: they are not in a better or worse shape than in thrift. And while
the documentation on CQL3 is indeed poor on this part, so is the thrift
documentation on the same subject (besides, I don't think your whole
point is about saying that documentation could be improved). Again, what
you can do in thrift, you can do in CQL.


> I consistently recommend new users learn and understand both Thrift and
> CQL.
>

I understand that you do this with the best of intentions and don't take it
the wrong way but it is my opinion that you are counterproductive by doing
so, and this for 2 reasons:
1) you don't only recommend users to learn both APIs, you justify that
advice by affirming that there is a whole family of important use cases
that thrift supports and CQL does not. Except that I contend that this
affirmation is technically incorrect, and so far I haven't seen many
examples proving me wrong.
2) there is a wealth of evidence that trying to learn both thrift and CQL
confuses the hell out of new users. Which is btw not surprising, both APIs
present the same concepts in seemingly different ways (even though they
are the same concepts) and even have conflicting vocabulary, so it's
obviously confusing when you try to learn those concepts in the first
place. Trying to learn CQL when you know thrift well is fine, and why not
learn thrift once you know and understand CQL well, but learning both is
imo bad advice. It could maybe (maybe) be justified if what you say about
having a whole family of use cases not being doable with CQL was true, but
it's not.

--
Sylvain


>
>
>
> On Wed, Jan 21, 2015 at 11:45 AM, Sylvain Lebresne <sy...@datastax.com>
> wrote:
>
>> On Wed, Jan 21, 2015 at 4:44 PM, Peter Lin <wo...@gmail.com> wrote:
>>
>>> I don't remember other people's examples in detail due to my shitty
>>> memory, so I'd rather not misquote.
>>>
>>
>> Fair enough, but maybe you shouldn't use "people's examples you don't
>> remember" as an argument then. Those examples might be wrong or outdated and
>> that kind of stuff creates confusion for everyone.
>>
>>
>>>
>>> In my case, I mix static and dynamic columns in a single column family
>>> with primitives and objects. The objects are temporal object graphs with a
>>> known type. Doing this type of stuff is basically transparent for me, since
>>> I'm using thrift and our data modeler generates helper classes. Our tooling
>>> seamlessly converts the bytes back to the target object. We have a few
>>> standard static columns related to temporal metadata. At any time, dynamic
>>> columns can be added and they can be primitives or objects.
>>>
>>
>> I don't see anything in that that cannot be done with CQL. You can mix
>> static and dynamic columns in CQL thanks to static columns. More precisely,
>> you can do what you're describing with a table looking a bit like this:
>>   CREATE TABLE t (
>>     key blob,
>>     my_static_column_1 int static,
>>     my_static_column_2 float static,
>>     my_static_column_3 blob static,
>>     ....,
>>     dynamic_column_name blob,
>>     dynamic_column_value blob,
>>     PRIMARY KEY (key, dynamic_column_name)
>>   );
>>
>> And your helper classes will serialize your objects as they probably do
>> today (if you use a custom comparator, you can do that too). And let it be
>> clear that I'm not pretending that doing it this way is tremendously
>> simpler than thrift. But I'm saying that 1) it's possible and 2) while it's
>> not meaningfully simpler than thrift, it's not really harder either (and
>> in fact, it's actually less verbose with CQL than with raw thrift).
>>
>>
>>>
>>> For the record, doing this kind of stuff in a relational database sucks
>>> horribly.
>>>
>>
>> I don't know what that has to do with CQL to be honest. If you're doing
>> relational with CQL you're doing it wrong. And please note that I'm not
>> saying CQL is the perfect API for modeling temporal data. But I don't get
>> how thrift, which is a very crude API, is a much better API at that than CQL
>> (or, again, how it allows you to do things you can't with CQL).
>>
>> --
>> Sylvain
>>
>
>
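
As a rough illustration of the static-plus-dynamic layout quoted above, the sketch below shows how such a table might be written to and read from. The table name entity, the two static column names, and the sample values are hypothetical and only stand in for the temporal metadata and dynamic columns discussed in the thread.

```cql
-- Hypothetical table following the quoted layout: static columns are
-- stored once per partition, dynamic columns are one row each.
CREATE TABLE IF NOT EXISTS entity (
    key                  blob,
    version              int STATIC,
    created_at           timestamp STATIC,
    dynamic_column_name  blob,
    dynamic_column_value blob,
    PRIMARY KEY (key, dynamic_column_name)
);

-- Set the static (per-partition) metadata; no dynamic column is needed.
UPDATE entity
SET version = 3, created_at = '2015-01-21 16:45:00+0000'
WHERE key = 0x01;

-- Add a dynamic column at any time; name and value are serialized by
-- the application (textAsBlob is used here just to keep the sketch readable).
INSERT INTO entity (key, dynamic_column_name, dynamic_column_value)
VALUES (0x01, textAsBlob('policy.holder.name'), textAsBlob('Ada'));

-- Read one dynamic column together with the static metadata.
SELECT version, created_at, dynamic_column_value
FROM entity
WHERE key = 0x01
  AND dynamic_column_name = textAsBlob('policy.holder.name');
```

Whether this fits a given temporal model is a separate question; the sketch only shows that static and dynamic columns can share one partition.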

Re: Re: Dynamic Columns

Posted by Eric Stevens <mi...@gmail.com>.
> are you really recommending I throw 4 years of work out and completely
rewrite code that works and has been tested?

Our codebase was about 3 years old, and we finished migrating it to CQL not
that long ago.  It can definitely be frustrating to have to touch stable
code to modernize it.  Our design allowed us to focus on one or two tables
at a time.  We were able to do drop-in replacements for each DAO that
presented the same outward interface (though the DAOs each had to be
rewritten wholesale).

Critically, our business logic did not need to change at all to support the
new paradigm, so we could have great confidence that the change is
minimally disruptive.  Even our DAO unit tests were mostly only updated to
preload test data in a different format.

Something I didn't anticipate was that our changelog for the project
included twice as many deletes as adds - our overall code complexity went
down, both in number of lines of code, and also as measured in terms of
cyclomatic complexity.  Mean time to feature completion has been reduced as
well (which we can measure quite directly since for a little while we're
also maintaining parallel development of a legacy version that uses Thrift
exclusively).

> Could I fit a square peg into a round hole? Yes, but does that make any
sense?

I get your point, though I've always struggled with this particular idiom.
It was actually a design philosophy in fine woodworking in early America,
used as a way to have an especially strong joint that used no fasteners.
Known for being popular with the Shakers who had a reputation for producing
the highest quality of items.

I'd suggest fasteners come in many shapes and sizes and techniques.
Sometimes it's a peg, or a screw, or a rivet, sometimes it's dovetails, and
sometimes it's fingers.  You're definitely right, Thrift and CQL are
dramatically different shapes.  I'm *certain* there are situations where
one or the other makes it easier to reason about or solve a particular
problem.  A different interface is not necessarily better or worse.

There are some pretty compelling ways CQL beats Thrift.  Reduced
application complexity (as I've observed in our case) is a pretty
compelling one.  But also new features not necessarily needing updates to
existing client libraries is also pretty awesome.  You can also take
advantage of a much more consistent application layer interaction across
languages and drivers, so you can more easily engage in multilanguage
projects without the context switching that comes from remembering nearly
as many nuances of *this* driver over *that* driver, and so on.  The native
protocol backing CQL also has a significant parallelism and performance
gain over Thrift's interface.


On Thu, Jan 22, 2015 at 6:36 AM, Peter Lin <wo...@gmail.com> wrote:

> @jack thanks for taking time to respond. I agree I could totally redesign
> and rewrite it to fit in the newer CQL3 model, but are you really
> recommending I throw 4 years of work out and completely rewrite code that
> works and has been tested?
>
> Ignoring the practical aspects for now and exploring the topic a bit
> further. Since not everyone has spent 5+ years designing and building
> temporal databases, it's probably good to go over some fundamental theory
> at the risk of boring the hell out of everyone.
>
> 1. a temporal record ideally should have 1 unique key for the entire life.
> I've done other approaches in the past with composite keys in RDBMS and it
> sucks. Could it work? Yes, but I already know from first hand experience
> how much pain that causes when temporal records need to be moved around, or
> when you need to audit the data for lawsuits.
> 2. the life of a temporal record may have n versions and no two versions
> are guaranteed to be identical in structure and definitely not in content
> 3. the temporal metadata about each entity like version, previous version,
> branch, previous branch, create date, last transaction and version counter
> are required for each record. Those are the only required static columns
> and they are managed by the framework. Users aren't supposed to manually
> screw with those metadata columns, but obviously they could by going into
> cqlsh.
> 4. at any time, import and export of a single temporal record with all
> versions and metadata could occur, so optimal storage and design is
> critical. for example, if someone wants to copy or move 100,000 records
> from one cluster to another cluster and retain the history.
> 5. the user can query for all or any number of versions of a temporal
> record by version number or branch. For example, it's common to get version
> 10 and do a comparison against version 12. It's just like doing a diff in
> SVN, git, cvs, but for business data. Unlike text files, a diff on business
> records is a bit more complex, since it's an object graph.
> 6. versioning and branching of temporal records relies on business logic,
> which can be the stock algorithm or defined by the user
> 7. Saving and retrieving data has to be predictable and quick. This means
> ideally all versions are in the same row and on the same node. Pre CQL and
> composite keys, storing data in different rows meant it could be on
> different nodes. Thankfully with composite keys, Cassandra will use the
> first column as the partition key.
>
> In terms of adding dynamic_column_name as part of the composite key, that
> isn't ideal in my use case for several reasons.
>
> 1. a record might not have any dynamic columns at all. The user decides
> this. The only thing the framework requires is a unique key that doesn't
> collide. If the user chooses their own key instead of a UUID, the system
> checks for collision before saving a new record.
>
> 2. we use dynamic columns to provide projections, aka views of a temporal
> entity. This means we can extract fields nested deep in the graph and store
> it as a dynamic column to avoid reading the entire object. Unlike other
> kinds of use cases of dynamic column, the column name and value will vary.
> I know it's popular to use dynamic columns to store time series data like
> user click stream, but that has the same type.
>
> 3. we allow the user to index secondary columns, but on read we always use
> the value in the object. We also integrated solr to give us more advanced
> indexing features.
>
> 4. we provide an object API to make temporal queries easy. It's
> modeled/inspired by JPA/Hibernate. We could have invented another text
> query language or tried to use tsql2, but an object API feels more
> intuitive to me.
>
> Could I fit a square peg into a round hole? Yes, but does that make any
> sense? If I was building a whole new temporal database from scratch, I
> might do things different. I couldn't use CQL3 back in 2008/2009, so I
> couldn't have used it. Aside from all of that, an object API is more
> natural for temporal databases. The model really is an object graph and not
> separate database tables stitched together. Any change to any part of the
> record requires versioning it and handling it correctly. Having built
> temporal databases on RDBMS, using SQL meant building a general purpose
> object API to make things easier. This is due to the need to be database
> agnostic, so we couldn't use the object API that is available in some
> databases. Hopefully that helps provide context and details. I don't expect
> people to have a deep understanding of temporal database from my ramblings,
> given it took me over 8 years to learn all of this stuff.
>
>
> On Thu, Jan 22, 2015 at 12:51 AM, Jack Krupansky <jack.krupansky@gmail.com
> > wrote:
>
>> Peter,
>>
>> At least from your description, the proposed use of the clustering column
>> name seems at first blush to fully fit the bill. The point is not that the
>> resulting clustered primary key is used to reference an object, but that a
>> SELECT on the partition key references the entire object, which will be a
>> sequence of CQL3 rows in a partition, and then the clustering column key is
>> added when you wish to access that specific aspect of the object. What's
>> missing? Again, just store the partition key to reference the full object -
>> no pollution required!
>>
>> And please note that any number of clustering columns can be specified,
>> so more structured "dynamic columns" can be supported. For example, you
>> could have a timestamp as a separate clustering column to maintain temporal
>> state of the database. The partition key can also be structured from
>> multiple columns as a composite partition key as well.
>>
>> As far as all these static columns, consider them optional and merely an
>> optimization. If you wish to have a 100% opaque object model, you wouldn't
>> have any static columns and the only non-primary key column would be the
>> blob value field. Every object attribute would be specified using another
>> clustering column name and blob value. Presto, everything you need for a
>> pure, opaque, fully-generalized object management system - all with just
>> CQL3. Maybe we should include such an example in the doc and with the
>> project to more strongly emphasize this capability to fully model
>> arbitrarily complex object structures - including temporal structures.
>>
>> Anything else missing?
>>
>> As a general proposition, you can use the term "clustering column" in
>> CQL3 wherever you might have used "dynamic column" in Thrift. The point in
>> CQL3 is not to eliminate a useful feature, dynamic column, but to repackage
>> the feature to make a lot more sense for the vast majority of use cases.
>> Maybe there are some cases that don't exactly fit as well as desired, but
>> feel free to specifically identify such cases so that we can elaborate how
>> we think they are covered or at least covered well enough for most users.
>>
>>
>> -- Jack Krupansky
>>
>> On Wed, Jan 21, 2015 at 12:19 PM, Peter Lin <wo...@gmail.com> wrote:
>>
>>>
>>> the example you provided does not work for my use case.
>>>
>>>   CREATE TABLE t (
>>>     key blob,
>>>     my_static_column_1 int static,
>>>     my_static_column_2 float static,
>>>     my_static_column_3 blob static,
>>>     ....,
>>>     dynamic_column_name blob,
>>>     dynamic_column_value blob,
>>>     PRIMARY KEY (key, dynamic_column_name)
>>>   );
>>>
>>> the dynamic column can't be part of the primary key. The temporal entity
>>> key can be the default UUID or the user can choose the field in their
>>> object. Within our framework, we have a concept of temporal links between one
>>> or more temporal entities. Polluting the primary key with the dynamic column
>>> wouldn't work.
>>>
>>> Please excuse the confusing RDB comparison. My point is that Cassandra's
>>> dynamic column feature is the "unique" feature that makes it better than
>>> traditional RDB or newSql like VoltDB for building temporal databases. With
>>> databases that require static schema + alter table for managing schema
>>> evolution, it makes it harder and results in down time.
>>>
>>> One of the challenges of data management over time is evolving the data
>>> model and making queries simple. If the record is 5 years old, it probably
>>> has a different schema than a record inserted this week. With temporal
>>> databases, every update is an insert, so it's a little bit more complex
>>> than just "use a blob". There's a whole level of complication with temporal
>>> data and CQL3 custom types isn't clear to me. I've read the CQL3
>>> documentation on the custom types several times and it is rather poor. It
>>> gives me the impression there's still work needed to get custom types in
>>> good shape.
>>>
>>> With regard to examples others have told me, your advice is fair. A few
>>> minutes with google and some blogs should pop up. The reason I bring these
>>> things up isn't to put down CQL. It's because I care and want to help
>>> improve Cassandra by sharing my experience. I consistently recommend new
>>> users learn and understand both Thrift and CQL.
>>>
>>>
>>>
>>> On Wed, Jan 21, 2015 at 11:45 AM, Sylvain Lebresne <sylvain@datastax.com
>>> > wrote:
>>>
>>>> On Wed, Jan 21, 2015 at 4:44 PM, Peter Lin <wo...@gmail.com> wrote:
>>>>
>>>>> I don't remember other people's examples in detail due to my shitty
>>>>> memory, so I'd rather not misquote.
>>>>>
>>>>
>>>> Fair enough, but maybe you shouldn't use "people's examples you don't
>>>> remember" as an argument then. Those examples might be wrong or outdated and
>>>> that kind of stuff creates confusion for everyone.
>>>>
>>>>
>>>>>
>>>>> In my case, I mix static and dynamic columns in a single column family
>>>>> with primitives and objects. The objects are temporal object graphs with a
>>>>> known type. Doing this type of stuff is basically transparent for me, since
>>>>> I'm using thrift and our data modeler generates helper classes. Our tooling
>>>>> seamlessly converts the bytes back to the target object. We have a few
>>>>> standard static columns related to temporal metadata. At any time, dynamic
>>>>> columns can be added and they can be primitives or objects.
>>>>>
>>>>
>>>> I don't see anything in that that cannot be done with CQL. You can mix
>>>> static and dynamic columns in CQL thanks to static columns. More precisely,
>>>> you can do what you're describing with a table looking a bit like this:
>>>>   CREATE TABLE t (
>>>>     key blob,
>>>>     my_static_column_1 int static,
>>>>     my_static_column_2 float static,
>>>>     my_static_column_3 blob static,
>>>>     ....,
>>>>     dynamic_column_name blob,
>>>>     dynamic_column_value blob,
>>>>     PRIMARY KEY (key, dynamic_column_name)
>>>>   );
>>>>
>>>> And your helper classes will serialize your objects as they probably do
>>>> today (if you use a custom comparator, you can do that too). And let it be
>>>> clear that I'm not pretending that doing it this way is tremendously
>>>> simpler than thrift. But I'm saying that 1) it's possible and 2) while it's
>>>> not meaningfully simpler than thrift, it's not really harder either (and
>>>> in fact, it's actually less verbose with CQL than with raw thrift).
>>>>
>>>>
>>>>>
>>>>> For the record, doing this kind of stuff in a relational database
>>>>> sucks horribly.
>>>>>
>>>>
>>>> I don't know what that has to do with CQL to be honest. If you're doing
>>>> relational with CQL you're doing it wrong. And please note that I'm not
>>>> saying CQL is the perfect API for modeling temporal data. But I don't get
>>>> how thrift, which is a very crude API, is a much better API at that than CQL
>>>> (or, again, how it allows you to do things you can't with CQL).
>>>>
>>>> --
>>>> Sylvain
>>>>
>>>
>>>
>>
>

Re: Re: Dynamic Columns

Posted by Sylvain Lebresne <sy...@datastax.com>.
> Where we differ that I feel the coverage for existing thrift use cases isn't
> 100%. That may be right or wrong, but it is my impression.

Here's my problem: either CQL covers all existing thrift use cases or it does
not (in which case the unsupported use case should be pointed out). It's a
technical question, not one that is a matter of "impression" or "feeling". I'm
fine with you saying that, in your personal opinion, some use cases feel more
natural/more direct in Thrift: you're entitled to your opinions. But when your
initial emails on this thread start with "the thing is, CQL only handles some
types of dynamic column use cases", or say things like "I hope CQL continues to
improve so that it supports 100% of the existing use cases", then I'm sorry but
it doesn't sound like you're just expressing some personal preference. And
since, I'm claiming, those statements are false, I can't force you but I would
really appreciate it if you refrained from propagating such falsehoods (unless
of course you can substantiate them with actual facts) because it's confusing,
especially for new users.

--
Sylvain

Re: Re: Dynamic Columns

Posted by Peter Lin <wo...@gmail.com>.
@jack thanks for taking time to respond. I agree I could totally redesign
and rewrite it to fit in the newer CQL3 model, but are you really
recommending I throw 4 years of work out and completely rewrite code that
works and has been tested?

Ignoring the practical aspects for now and exploring the topic a bit
further. Since not everyone has spent 5+ years designing and building
temporal databases, it's probably good to go over some fundamental theory
at the risk of boring the hell out of everyone.

1. a temporal record ideally should have 1 unique key for the entire life.
I've done other approaches in the past with composite keys in RDBMS and it
sucks. Could it work? Yes, but I already know from first hand experience
how much pain that causes when temporal records need to be moved around, or
when you need to audit the data for law suits.
2. the life of a temporal record may have n versions and no two versions
are guaranteed to be identical in structure and definitely not in content
3. the temporal metadata about each entity like version, previous version,
branch, previous branch, create date, last transaction and version counter
are required for each record. Those are the only required static columns
and they are managed by the framework. Users aren't supposed to manually
screw with those metadata columns, but obviously they could by going into
cqlsh.
4. at any time, import and export of a single temporal record with all
versions and metadata could occur, so optimal storage and design is
critical. for example, if someone wants to copy or move 100,000 records
from one cluster to another cluster and retain the history.
5. the user can query for all or any number of versions of a temporal
record by version number or branch. For example, it's common to get version
10 and do a comparison against version 12. It's just like doing a diff in
SVN, git, cvs, but for business data. Unlike text files, a diff on business
records is a bit more complex, since it's an object graph.
6. versioning and branching of temporal records relies on business logic,
which can be the stock algorithm or defined by the user
7. Saving and retrieving data has to be predictable and quick. This means
ideally all versions are in the same row and on the same node. Pre CQL and
composite keys, storing data in different rows meant it could be on
different nodes. Thankfully with composite keys, Cassandra will use the
first column as the partition key.
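
As a rough illustration of point 7, here is a minimal CQL sketch in which
every version of a record lives in one partition. The table and column names
(temporal_record_versions, record_key, version, payload) are hypothetical and
are not taken from the framework described above.

```cql
-- Hypothetical table: all names are illustrative only.
-- record_key is the partition key, so every version of one record is
-- stored in the same partition (and therefore on the same replicas).
-- version is a clustering column that orders rows inside the partition.
CREATE TABLE temporal_record_versions (
    record_key text,
    version int,
    payload blob,
    PRIMARY KEY (record_key, version)
);

-- Read every version of a single record from its one partition.
SELECT version, payload
  FROM temporal_record_versions
 WHERE record_key = 'order-42';

-- Read only versions 10 through 12, e.g. as input to a diff.
SELECT version, payload
  FROM temporal_record_versions
 WHERE record_key = 'order-42'
   AND version >= 10 AND version <= 12;
```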

In terms of adding dynamic_column_name as part of the composite key, that
isn't ideal in my use case for several reasons.

1. a record might not have any dynamic columns at all. The user decides
this. The only thing the framework requires is a unique key that doesn't
collide. If the user chooses their own key instead of a UUID, the system
checks for collision before saving a new record.

2. we use dynamic columns to provide projections, aka views of a temporal
entity. This means we can extract fields nested deep in the graph and store
it as a dynamic column to avoid reading the entire object. Unlike other
kinds of use cases of dynamic column, the column name and value will vary.
I know it's popular to use dynamic columns to store time series data like
user click stream, but that has the same type.

3. we allow the user to index secondary columns, but on read we always use
the value in the object. We also integrated solr to give us more advanced
indexing features.

4. we provide an object API to make temporal queries easy. It's
modeled/inspired by JPA/Hibernate. We could have invented another text
query language or tried to use tsql2, but an object API feels more
intuitive to me.
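
For the collision check mentioned in point 1, one possible way to express it
in CQL (an assumption for illustration, not necessarily how the framework does
it) is a lightweight transaction. The table and column names below are
hypothetical.

```cql
-- Hypothetical table: all names are illustrative only.
CREATE TABLE temporal_record (
    record_key text PRIMARY KEY,
    payload blob
);

-- IF NOT EXISTS makes the insert conditional: the row is written only
-- when no row with this key already exists. The result set contains an
-- [applied] column that reports whether the write took place.
INSERT INTO temporal_record (record_key, payload)
VALUES ('user-chosen-key', 0xcafe)
IF NOT EXISTS;
```

Conditional writes of this kind need Cassandra 2.0 or later and cost more than
a plain insert, so they only make sense where a collision is a real risk.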

Could I fit a square peg into a round hole? Yes, but does that make any
sense? If I were building a whole new temporal database from scratch, I
might do things differently. CQL3 didn't exist back in 2008/2009, so it
wasn't an option. Aside from all of that, an object API is more
natural for temporal databases. The model really is an object graph and not
separate database tables stitched together. Any change to any part of the
record requires versioning it and handling it correctly. Having built
temporal databases on RDBMS, using SQL meant building a general purpose
object API to make things easier. This is due to the need to be database
agnostic, so we couldn't use the object API that is available in some
databases. Hopefully that helps provide context and details. I don't expect
people to have a deep understanding of temporal databases from my ramblings,
given it took me over 8 years to learn all of this stuff.



Re: Re: Dynamic Columns

Posted by Jack Krupansky <ja...@gmail.com>.
Peter,

At least from your description, the proposed use of the clustering column
name seems at first blush to fully fit the bill. The point is not that the
resulting clustered primary key is used to reference an object, but that a
SELECT on the partition key references the entire object, which will be a
sequence of CQL3 rows in a partition, and then the clustering column key is
added when you wish to access that specific aspect of the object. What's
missing? Again, just store the partition key to reference the full object -
no pollution required!

And please note that any number of clustering columns can be specified, so
more structured "dynamic columns" can be supported. For example, you could
have a timestamp as a separate clustering column to maintain temporal state
of the database. The partition key can also be structured from multiple
columns as a composite partition key as well.
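
A minimal sketch of that idea, with a composite partition key and two
clustering columns. The schema and every name in it (object_attributes,
tenant, object_id, written_at, attribute_name, attribute_value) are
hypothetical and only meant to show the shape:

```cql
-- Hypothetical schema: all names are illustrative only.
-- (tenant, object_id) together form a composite partition key.
-- written_at and attribute_name are clustering columns, so each
-- attribute can carry several timestamped values.
CREATE TABLE object_attributes (
    tenant text,
    object_id uuid,
    written_at timestamp,
    attribute_name text,
    attribute_value blob,
    PRIMARY KEY ((tenant, object_id), written_at, attribute_name)
);

-- All attributes of one object written at or after a point in time.
SELECT written_at, attribute_name, attribute_value
  FROM object_attributes
 WHERE tenant = 'acme'
   AND object_id = 123e4567-e89b-12d3-a456-426655440000
   AND written_at >= '2015-01-01 00:00:00+0000';
```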

As far as all these static columns, consider them optional and merely an
optimization. If you wish to have a 100% opaque object model, you wouldn't
have any static columns and the only non-primary key column would be the
blob value field. Every object attribute would be specified using another
clustering column name and blob value. Presto, everything you need for a
pure, opaque, fully-generalized object management system - all with just
CQL3. Maybe we should include such an example in the doc and with the
project to more strongly emphasize this capability to fully model
arbitrarily complex object structures - including temporal structures.
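
Such an example might look roughly like the sketch below. The table name
opaque_object and the column names attribute_name and attribute_value are
placeholders chosen for illustration.

```cql
-- Hypothetical "fully opaque" table: no static columns, and the only
-- non-primary-key column is a blob value.
CREATE TABLE opaque_object (
    key blob,
    attribute_name blob,
    attribute_value blob,
    PRIMARY KEY (key, attribute_name)
);

-- The partition key alone references the whole object: one CQL row
-- comes back per attribute.
SELECT attribute_name, attribute_value
  FROM opaque_object
 WHERE key = 0x0a0b0c;

-- Adding the clustering column selects a single attribute.
SELECT attribute_value
  FROM opaque_object
 WHERE key = 0x0a0b0c
   AND attribute_name = 0x6e616d65;  -- the ASCII bytes of "name"
```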

Anything else missing?

As a general proposition, you can use the term "clustering column" in CQL3
wherever you might have used "dynamic column" in Thrift. The point in CQL3
is not to eliminate a useful feature, dynamic column, but to repackage the
feature to make a lot more sense for the vast majority of use cases. Maybe
there are some cases that don't exactly fit as well as desired, but feel
free to specifically identify such cases so that we can elaborate how we
think they are covered or at least covered well enough for most users.


-- Jack Krupansky


Re: Re: Dynamic Columns

Posted by Robert Coli <rc...@eventbrite.com>.
On Wed, Jan 21, 2015 at 9:19 AM, Peter Lin <wo...@gmail.com> wrote:

>
> I consistently recommend new users learn and understand both Thrift and
> CQL.
>

FWIW, I consider this a disservice to new users. New users should use CQL,
and not deploy against a deprecated-in-all-but-name API. Understanding
non-CQL *storage* might be valuable, understanding the Thrift interface to
storage is anti-valuable.

Despite the dissembling public statements regarding Thrift "not going
anywhere" it is obvious to me that no other databases exist with two
non-pluggable and incompatible APIs for a reason. The pain of maintaining
these two APIs will eventually become not worth the backwards
compatibility. At that time it will be deprecated and then shortly
thereafter removed; I expect this to happen at latest by EOY 2018. [1]

=Rob
[1] If anyone strongly disagrees, I am taking $20 cash bets, with any
proceeds donated to the Apache Foundation.

Re: Re: Dynamic Columns

Posted by Peter Lin <wo...@gmail.com>.
the example you provided does not work for my use case.

  CREATE TABLE t (
    key blob,
    my_static_column_1 int static,
    my_static_column_2 float static,
    my_static_column_3 blob static,
    ....,
    dynamic_column_name blob,
    dynamic_column_value blob,
    PRIMARY KEY (key, dynamic_column_name)
  );

the dynamic column can't be part of the primary key. The temporal entity
key can be the default UUID or the user can choose the field in their
object. Within our framework, we have a concept of temporal links between one
or more temporal entities. Polluting the primary key with the dynamic column
wouldn't work.

Please excuse the confusing RDB comparison. My point is that Cassandra's
dynamic column feature is the "unique" feature that makes it better than
traditional RDB or newSql like VoltDB for building temporal databases. With
databases that require static schema + alter table for managing schema
evolution, it makes it harder and results in down time.

One of the challenges of data management over time is evolving the data
model and making queries simple. If the record is 5 years old, it probably
has a different schema than a record inserted this week. With temporal
databases, every update is an insert, so it's a little bit more complex
than just "use a blob". There's a whole level of complication with temporal
data and CQL3 custom types isn't clear to me. I've read the CQL3
documentation on the custom types several times and it is rather poor. It
gives me the impression there's still work needed to get custom types in
good shape.

With regard to examples others have told me, your advice is fair. A few
minutes with google and some blogs should pop up. The reason I bring these
things up isn't to put down CQL. It's because I care and want to help
improve Cassandra by sharing my experience. I consistently recommend new
users learn and understand both Thrift and CQL.




Re: Re: Dynamic Columns

Posted by Sylvain Lebresne <sy...@datastax.com>.
On Wed, Jan 21, 2015 at 4:44 PM, Peter Lin <wo...@gmail.com> wrote:

> I don't remember other people's examples in detail due to my shitty
> memory, so I'd rather not misquote.
>

Fair enough, but maybe you shouldn't use "people's examples you don't
remember" as an argument then. Those examples might be wrong or outdated, and
that kind of stuff creates confusion for everyone.


>
> In my case, I mix static and dynamic columns in a single column family
> with primitives and objects. The objects are temporal object graphs with a
> known type. Doing this type of stuff is basically transparent for me, since
> I'm using thrift and our data modeler generates helper classes. Our tooling
> seamlessly converts the bytes back to the target object. We have a few
> standard static columns related to temporal metadata. At any time, dynamic
> columns can be added and they can be primitives or objects.
>

I don't see anything in that that cannot be done with CQL. You can mix
static and dynamic columns in CQL thanks to static columns. More precisely,
you can do what you're describing with a table looking a bit like this:
  CREATE TABLE t (
    key blob,
    my_static_column_1 int static,
    my_static_column_2 float static,
    my_static_column_3 blob static,
    ....,
    dynamic_column_name blob,
    dynamic_column_value blob,
    PRIMARY KEY (key, dynamic_column_name)
  );

And your helper classes will serialize your objects as they probably do
today (if you use a custom comparator, you can do that too). And let it be
clear that I'm not pretending that doing it this way is tremendously
simpler than thrift. But I'm saying that 1) it's possible and 2) while it's
not meaningfully simpler than thrift, it's not really harder either (and
in fact, it's actually less verbose with CQL than with raw thrift).
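
To make the table above concrete, here is a minimal sketch of writing and reading dynamic columns through it. The key and column values are made-up blob literals, the elided static columns are left out, and the `static` keyword is written after the column type, which is where CQL expects it. Treat this as an illustration under those assumptions, not as the exact schema anyone in this thread is running.

```cql
-- Hypothetical concrete version of the table sketched above.
CREATE TABLE t (
    key blob,
    my_static_column_1 int static,   -- stored once per partition
    dynamic_column_name blob,        -- clustering column: the "dynamic" column name
    dynamic_column_value blob,
    PRIMARY KEY (key, dynamic_column_name)
);

-- Each dynamic column becomes one CQL row inside the partition.
-- 0x636f6c6f72 is the ASCII bytes of "color", 0x726564 is "red".
INSERT INTO t (key, dynamic_column_name, dynamic_column_value)
VALUES (0x01, 0x636f6c6f72, 0x726564);

-- Static columns are written per partition, without a clustering value.
INSERT INTO t (key, my_static_column_1) VALUES (0x01, 42);

-- Read a single dynamic column by name.
SELECT dynamic_column_value FROM t
WHERE key = 0x01 AND dynamic_column_name = 0x636f6c6f72;

-- Read a slice of dynamic columns by name range (bytes 'a' up to, not including, 'd').
SELECT dynamic_column_name, dynamic_column_value FROM t
WHERE key = 0x01
  AND dynamic_column_name >= 0x61 AND dynamic_column_name < 0x64;
```

Because the dynamic column name is a clustering column, both the single-column read and the slice are served from one partition without loading every column in it.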


>
> For the record, doing this kind of stuff in a relational database sucks
> horribly.
>

I don't know what that has to do with CQL to be honest. If you're doing
relational with CQL you're doing it wrong. And please note that I'm not
saying CQL is the perfect API for modeling temporal data. But I don't get
how thrift, which is a very crude API, is a much better API at that than CQL
(or, again, how it allows you to do things you can't with CQL).

--
Sylvain

Re: Re: Dynamic Columns

Posted by Peter Lin <wo...@gmail.com>.
I don't remember other people's examples in detail due to my shitty memory,
so I'd rather not misquote.

In my case, I mix static and dynamic columns in a single column family with
primitives and objects. The objects are temporal object graphs with a known
type. Doing this type of stuff is basically transparent for me, since I'm
using thrift and our data modeler generates helper classes. Our tooling
seamlessly converts the bytes back to the target object. We have a few
standard static columns related to temporal metadata. At any time, dynamic
columns can be added and they can be primitives or objects. The framework
we built uses CQL for basic queries and views the user defines.

We model the schema in a GUI modeler and the framework provides a query API
to access a specific version or versions of any record. The design borrows
heavily from temporal logic and active databases.

For the record, doing this kind of stuff in a relational database sucks
horribly. The reason I chose to build a temporal database on Cassandra is
because I've done it on Oracle/SQL Server in the past. Last year I submitted
a talk about our temporal database for the DataStax conference, but it was
rejected since there were too many submissions. I know Spotify also built a
temporal database on Cassandra and they gave a talk on what they did.

peter



Re: Re: Dynamic Columns

Posted by Sylvain Lebresne <sy...@datastax.com>.
> I've chatted with several long time users of Cassandra and there's things
> CQL3 doesn't support.
>

Would you care to elaborate then? Maybe a simple example of something (or
multiple things since you used plural) in thrift that cannot be supported
in CQL?
And please note that I'm *not* saying that all existing thrift tables can be
seamlessly used from CQL: there are indeed a few cases for which that's not
the case. But that does not mean those cases cannot easily be done in CQL from
scratch.

Re: Re: Dynamic Columns

Posted by Peter Lin <wo...@gmail.com>.
I've studied the source code and I don't believe that statement is true.
I've chatted with several long time users of Cassandra and there's things
CQL3 doesn't support.

Like I've said before, Thrift and CQL3 complement each other. I totally
understand some committers don't want the overhead due to time and resource
limitations. On more than one occasion, people have offered to help and
work on thrift, but were rejected. There are logs in JIRA.

For the record, it's great that CQL was created to make life easier for new
users. But here's the thing that annoys me. There are users that just want to
save and query data, but there are people out there like me that are building
tools for Cassandra. For tool builders, having an object API like thrift is
invaluable. If we look at relational databases, we see many of them have two
separate APIs for that reason. Microsoft SQL Server has SQL and an object API.
Having both makes it easier to build tools. It's a shame to ignore all the
lessons RDBMS can teach us and suffer NIH syndrome. I've built several data
modeling tools over the years including ORMs.

We built our own data modeling tool for the temporal database I built on
Cassandra, so this isn't just some hypothetical complaint. This is from
many years of first hand experience. I understand my needs often don't and
won't line up with what's in Cassandra's roadmap. But that's the great
thing about open source. Should thrift go away permanently I'll just fork
Cassandra and do my own thing.



Re: Re: Dynamic Columns

Posted by Sylvain Lebresne <sy...@datastax.com>.
On Wed, Jan 21, 2015 at 3:46 AM, Peter Lin <wo...@gmail.com> wrote:

>
>  I don't understand why people [...] pretend it supports 100% of the use
> cases.
>

Have you considered the possibility that it's actually true and you're just
wrong for lack of knowledge?

--
Sylvain

Re: Re: Dynamic Columns

Posted by Jonathan Lacefield <jl...@datastax.com>.
Hello,

  Peter highlighted the tradeoff between Thrift and CQL3 nicely in this
case, i.e. requiring a different design approach for this solution.
Collections do not sound like a good fit for your current challenge, but is
there a different way to design/solve your challenge using CQL techniques?

  It is recommended to leverage CQL for new projects as this is the
direction that Cassandra is heading and where the majority of effort is
being applied from a development perspective.

  Sounds like you have a decision to make: either leverage Thrift and the
dynamic column approach to solve this problem, or rethink the design approach
and leverage CQL.

  Please let the mailing list know the direction you choose.

Jonathan

Jonathan Lacefield

Solution Architect | (404) 822 3487 | jlacefield@datastax.com


Re: Re: Dynamic Columns

Posted by Peter Lin <wo...@gmail.com>.
The thing is, CQL only handles some types of dynamic column use cases.
There are plenty of examples on datastax.com that show how to do CQL-style
dynamic columns.

Based on what was described by Chetan, I don't feel CQL3 is a perfect fit
for what he wants to do. To use CQL3, he'd have to change his approach.

In my temporal database, I use both Thrift and CQL. They complement each
other very nicely. I don't understand why people have to put down Thrift or
pretend it supports 100% of the use cases. Lots of people started using
Cassandra pre-CQL and had no problems using thrift. Yes, you have to
understand more and the learning curve is steeper, but taking time to learn
the internals of Cassandra is a good thing.

Using CQL3 lists or maps would force the query to load the entire
collection, but that is by design. To get the full power of the old style
of dynamic columns, thrift is a better fit. I hope CQL continues to improve
so that it supports 100% of the existing use cases.




Re:Re: Dynamic Columns

Posted by Xu Zhongxing <xu...@163.com>.
I approximate dynamic columns by data_key and data_value columns.
Is there a better way to get dynamic columns in CQL 3?

At 2015-01-21 09:41:02, "Peter Lin" <wo...@gmail.com> wrote:



I think that table example misses the point of Chetan's functional requirement. He actually needs dynamic columns.



On Tue, Jan 20, 2015 at 8:12 PM, Xu Zhongxing <xu...@163.com> wrote:

Maybe this is the closest thing to "dynamic columns" in CQL 3.


create table review (
    product_id bigint,
    created_at timestamp,
    data_key text,
    data_tvalue text,
    data_ivalue int,
    primary key ((product_id, created_at), data_key)
);


data_tvalue and data_ivalue are optional.
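
As a rough sketch of how such a table might be used, the statements below store one text attribute and one int attribute for the same review and then fetch a single attribute by its key. The table and column names are spelled consistently here (review, product_id), and the product id, timestamp, and attribute names are invented for illustration.

```cql
CREATE TABLE review (
    product_id bigint,
    created_at timestamp,
    data_key text,        -- name of the "dynamic column"
    data_tvalue text,     -- set when the value is text
    data_ivalue int,      -- set when the value is an int
    PRIMARY KEY ((product_id, created_at), data_key)
);

-- A text attribute: only data_tvalue is written, data_ivalue stays null.
INSERT INTO review (product_id, created_at, data_key, data_tvalue)
VALUES (1001, '2015-01-20 19:24:32+0000', 'summary', 'Works as advertised');

-- An int attribute on the same review.
INSERT INTO review (product_id, created_at, data_key, data_ivalue)
VALUES (1001, '2015-01-20 19:24:32+0000', 'battery_rating', 4);

-- Fetch one attribute by key without reading the others.
SELECT data_tvalue, data_ivalue FROM review
WHERE product_id = 1001
  AND created_at = '2015-01-20 19:24:32+0000'
  AND data_key = 'battery_rating';
```

Note that with (product_id, created_at) as the partition key, both values must be supplied on every read; listing all reviews of a product by time would need product_id alone as the partition key.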


At 2015-01-21 04:44:07, "chetan verma" <ch...@gmail.com> wrote:

Hi,


Adding to previous mail. For example: We have a column family named review (with some arbitrary data in map).


CREATE TABLE review(
product_id bigint,
created_at timestamp,
data_int map<text, int>,
data_text map<text, text>,
PRIMARY KEY (product_id, created_at)
);


Assume that I use these 2 maps to store arbitrary data (i.e. data_int and data_text for int and text values).
When we see the output in cassandra-cli, it looks like this within a partition:
<clustering_key>:data_int:map_key as the column name and the map value as the value.
Suppose I need to get this value: I couldn't do that with CQL3, but in thrift it's possible. Any solution?


On Wed, Jan 21, 2015 at 1:06 AM, chetan verma <ch...@gmail.com> wrote:

Hi,


Most of the time I will be querying on product_id and created_at, but for analytics I need to query on almost all columns.
The multiple collections idea is good, but the only issue is that Cassandra reads a collection entirely. What if I need a slice of it, I mean
columns for certain keys, which is possible with thrift? Please suggest.


On Wed, Jan 21, 2015 at 12:36 AM, Jonathan Lacefield <jl...@datastax.com> wrote:

Hello,


There are probably lots of options to this challenge.  The more details around your use case that you can provide, the easier it will be for this group to offer advice.


A few follow-up questions: 
  - How will you query this data?  
  - Do your queries require filtering on specific columns other than product_id and created_at, i.e. the dynamic columns?


Depending on the answers to these questions, you have several options, of which here are a few:

   - Cassandra efficiently stores sparse data, so you could create columns and not populate them, without much of a penalty
   - Could use a clustering column to store a column's type and another col (potentially clustering) to store the value
      - i.e. CREATE TABLE foo (col1 int, attname text, attvalue text, col4...n, PRIMARY KEY (col1, attname, attvalue));
      - where attname stores the name of the attribute/column and attvalue stores the value of that attribute
      - have seen users use this model and create a "main" attribute row within a partition that stores the values associated with col4...n
   - Could store multiple collections
   - Others probably have ideas as well

You may want to look in the archives for a similar discussion topic.  Believe this item was asked a few months ago as well.
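
The attname/attvalue option in the list above can be sketched roughly as follows. The single col4 column stands in for the elided col4...n, and the attribute names, values, and the 'main' marker row are assumptions made up for this example.

```cql
-- Hypothetical concrete form of the "foo" table; col4 stands in for col4...n.
CREATE TABLE foo (
    col1 int,
    attname text,     -- name of the attribute ("dynamic column")
    attvalue text,    -- value of the attribute, also a clustering column
    col4 text,
    PRIMARY KEY (col1, attname, attvalue)
);

-- One row per attribute value.
INSERT INTO foo (col1, attname, attvalue) VALUES (1, 'color', 'red');
INSERT INTO foo (col1, attname, attvalue) VALUES (1, 'size', 'large');

-- A "main" row that carries the regular columns for the partition.
INSERT INTO foo (col1, attname, attvalue, col4)
VALUES (1, 'main', 'main', 'some value');

-- All values recorded for one attribute.
SELECT attvalue FROM foo WHERE col1 = 1 AND attname = 'color';

-- Check whether a specific attribute/value pair exists.
SELECT attname, attvalue FROM foo
WHERE col1 = 1 AND attname = 'color' AND attvalue = 'red';
```

Making attvalue a clustering column lets one attribute hold several values and allows lookups on the value, at the cost of requiring text (or some other single type) for every value.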



Jonathan Lacefield

Solution Architect |(404) 822 3487 | jlacefield@datastax.com





On Tue, Jan 20, 2015 at 1:40 PM, chetan verma <ch...@gmail.com> wrote:

Hi,


I am creating a review system. For instance, let's assume the following are the attributes of the system:


Review{
id bigint,
product_id bigint,
created_at timestamp,
summary text,
description text,
pros set<text>,
cons set<text>,
feature_rating map<text, int>
etc....
}
I created the partition key as product_id (so that all the reviews for a given product will reside on the same node)
and the clustering key as created_at and id (desc) so that reviews will be sorted by time.


I can have more columns, and I want to fulfil that requirement with dynamic columns, but there are the limitations explained above.
Could you please let me know the best way?


On Tue, Jan 20, 2015 at 11:59 PM, Jonathan Lacefield <jl...@datastax.com> wrote:

Hello,


  Have you looked at solving this challenge with clustering columns?  Also, please describe the problem set details for more specific advice from this group.


  Starting new projects on Thrift isn't the recommended approach.  


Jonathan



Jonathan Lacefield

Solution Architect |(404) 822 3487 | jlacefield@datastax.com





On Tue, Jan 20, 2015 at 1:24 PM, chetan verma <ch...@gmail.com> wrote:

Hi,


I am starting a new project with Cassandra as the database.
I have unstructured data so I need dynamic columns.
In CQL3 we can achieve this via collections, but there are some downsides to it:
1. Collections are meant to store small amounts of data.
2. The maximum size of an item in a collection is 64K.
3. Cassandra reads a collection in its entirety.
4. The number of items in a collection is restricted to 64,000.


There is also no support for getting a single column by map key, which is possible via cassandra-cli.
Please suggest whether I should use CQL3 or Thrift, and which driver is best.


--

Regards,
Chetan Verma
+91 99860 86634







--

Regards,
Chetan Verma
+91 99860 86634







--

Regards,
Chetan Verma
+91 99860 86634





--

Regards,
Chetan Verma
+91 99860 86634


Re: Dynamic Columns

Posted by Peter Lin <wo...@gmail.com>.
I think that table example misses the point of Chetan's functional
requirement. He actually needs dynamic columns.

On Tue, Jan 20, 2015 at 8:12 PM, Xu Zhongxing <xu...@163.com> wrote:

> Maybe this is the closest thing to "dynamic columns" in CQL 3.
>
> create table review (
>     product_id bigint,
>     created_at timestamp,
>     data_key text,
>     data_tvalue text,
>     data_ivalue int,
>     primary key ((product_id, created_at), data_key)
> );
>
> data_tvalue and data_ivalue are optional.

Re: Dynamic Columns

Posted by Xu Zhongxing <xu...@163.com>.
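Maybe this is the closest thing to "dynamic columns" in CQL 3.


create table review (
    product_id bigint,
    created_at timestamp,
    data_key text,      -- name of the dynamic attribute
    data_tvalue text,   -- set when the attribute value is text
    data_ivalue int,    -- set when the attribute value is an int
    primary key ((product_id, created_at), data_key)
);


data_tvalue and data_ivalue are optional: each row populates whichever one matches the type of its value.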
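
To make the access pattern of this table concrete, here is a small in-memory Python sketch. It is not Cassandra code, and the class and method names are made up for illustration. It models one (product_id, created_at) partition whose rows are kept sorted by data_key, which is what makes it possible to fetch a single attribute, or a contiguous range of attribute names, without reading the rest.

```python
from bisect import bisect_left, bisect_right, insort


class ReviewPartition:
    """Toy model of one (product_id, created_at) partition.

    Rows are kept sorted by data_key, mirroring a clustering column.
    Each row holds a text value or an int value; the other stays None.
    """

    def __init__(self):
        self._keys = []   # sorted list of data_key values
        self._rows = {}   # data_key -> (data_tvalue, data_ivalue)

    def put(self, data_key, value):
        # Route the value to the typed column that matches it.
        if isinstance(value, bool) or not isinstance(value, (int, str)):
            raise TypeError("only int and text values are modeled here")
        if data_key not in self._rows:
            insort(self._keys, data_key)
        if isinstance(value, int):
            self._rows[data_key] = (None, value)
        else:
            self._rows[data_key] = (value, None)

    def get(self, data_key):
        # Single-attribute lookup, comparable to "... AND data_key = ?".
        tvalue, ivalue = self._rows[data_key]
        return tvalue if tvalue is not None else ivalue

    def slice(self, start, end):
        # Inclusive range over attribute names, comparable to
        # "... AND data_key >= ? AND data_key <= ?".
        lo = bisect_left(self._keys, start)
        hi = bisect_right(self._keys, end)
        return [(k, self.get(k)) for k in self._keys[lo:hi]]


partition = ReviewPartition()
partition.put("summary", "Solid phone")
partition.put("rating_battery", 4)
partition.put("rating_camera", 5)

print(partition.get("rating_camera"))
print(partition.slice("rating_", "rating_~"))
```

The slice call relies on a naming convention (a shared "rating_" prefix) that is an assumption of this example, not something the table enforces.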


At 2015-01-21 04:44:07, "chetan verma" <ch...@gmail.com> wrote:

Hi,


Adding to previous mail. For example: We have a column family named review (with some arbitrary data in map).


CREATE TABLE review(
product_id bigint,
created_at timestamp,
data_int map<text, int>,
data_text map<text, text>,
PRIMARY KEY (product_id, created_at)
);


Assume that I use these two maps to store arbitrary data (i.e. data_int and data_text for int and text values).
When we look at the output in cassandra-cli, each map entry appears within a partition as:
<clustering_key>:data_int:map_key as the column name, with the map value as the column value.
Suppose I need to get just this one value. I couldn't do that with CQL3, but in Thrift it's possible. Any solution?



Re: Dynamic Columns

Posted by chetan verma <ch...@gmail.com>.
Hi,

Adding to my previous mail, here is an example. We have a column family named review
(with some arbitrary data in maps).

CREATE TABLE review(
product_id bigint,
created_at timestamp,
data_int map<text, int>,
data_text map<text, text>,
PRIMARY KEY (product_id, created_at)
);

Assume that I use these two maps to store arbitrary data (i.e. data_int and
data_text for int and text values).
When we look at the output in cassandra-cli, each map entry appears within a partition as:
<clustering_key>:data_int:map_key as the column name, with the map value as the column value.
Suppose I need to get just this one value. I couldn't do that with CQL3, but in
Thrift it's possible. Any solution?
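
The layout described above can be sketched in plain Python. This is only a toy model of the <clustering_key>:data_int:map_key naming as seen in cassandra-cli, not Cassandra's actual storage code; the sample data and function names are invented for the example. It shows why a single map entry, or all entries of one map, can be located as a contiguous run in a sorted partition.

```python
from bisect import bisect_left

# Toy model of one partition (one product_id) as a sorted list of cells.
# Each cell name is the composite (clustering_key, cql_column, map_key),
# following the <clustering_key>:data_int:map_key layout described above.
cells = sorted([
    (("2015-01-20", "data_int", "battery"), 4),
    (("2015-01-20", "data_int", "camera"), 5),
    (("2015-01-20", "data_text", "color"), "black"),
    (("2015-01-21", "data_int", "battery"), 3),
])
names = [name for name, _ in cells]


def get_cell(created_at, column, map_key):
    """Exact lookup of one map entry by its full composite name."""
    target = (created_at, column, map_key)
    i = bisect_left(names, target)
    if i < len(names) and names[i] == target:
        return cells[i][1]
    return None


def slice_prefix(created_at, column):
    """All entries of one map in one row: a contiguous run of cells."""
    prefix = (created_at, column)
    start = bisect_left(names, prefix)
    out = {}
    for name, value in cells[start:]:
        if name[:2] != prefix:
            break
        out[name[2]] = value
    return out


print(get_cell("2015-01-20", "data_int", "camera"))
print(slice_prefix("2015-01-20", "data_int"))
```

In CQL terms, the same kind of lookup is usually obtained by making the map key a clustering column instead of storing it inside a collection.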

On Wed, Jan 21, 2015 at 1:06 AM, chetan verma <ch...@gmail.com>
wrote:

> Hi,
>
> Most of the time I will be querying on product_id and created_at, but for
> analytics I need to query on almost all columns.
> The multiple-collections idea is good, but the only problem is that Cassandra
> reads a collection entirely. What if I need a slice of it, i.e. the
> columns for certain keys, which is possible with Thrift? Please suggest.



-- 
*Regards,*
*Chetan Verma*
*+91 99860 86634*

Re: Dynamic Columns

Posted by chetan verma <ch...@gmail.com>.
Hi,

Most of the time I will be querying on product_id and created_at, but for
analytics I need to query on almost all columns.
The multiple-collections idea is good, but the only problem is that Cassandra
reads a collection entirely. What if I need a slice of it, i.e. the
columns for certain keys, which is possible with Thrift? Please suggest.

On Wed, Jan 21, 2015 at 12:36 AM, Jonathan Lacefield <
jlacefield@datastax.com> wrote:

> Hello,
>
> There are probably lots of options to this challenge.  The more details
> around your use case that you can provide, the easier it will be for this
> group to offer advice.
>
> A few follow-up questions:
>   - How will you query this data?
>   - Do your queries require filtering on specific columns other than
> product_id and created_at, i.e. the dynamic columns?
>
> Depending on the answers to these questions, you have several options, of
> which here are a few:
>
>    - Cassandra efficiently stores sparse data, so you could create
>    columns and not populate them, without much of a penalty
>    - Could use a clustering column to store a column's name and another
>    col (potentially clustering) to store the value
>       - i.e. CREATE TABLE foo (col1 int, attname text, attvalue text,
>       col4...n, PRIMARY KEY (col1, attname, attvalue));
>       - where attname stores the name of the attribute/column and
>       attvalue stores the value of that attribute
>       - have seen users use this model and create a "main" attribute row
>       within a partition that stores the values associated with col4...n
>    - Could store multiple collections
>    - Others probably have ideas as well
>
> You may want to look in the archives for a similar discussion topic.
> Believe this item was asked a few months ago as well.
>
> Jonathan Lacefield
>
> Solution Architect | (404) 822 3487 | jlacefield@datastax.com


-- 
*Regards,*
*Chetan Verma*
*+91 99860 86634*

Re: Dynamic Columns

Posted by Jonathan Lacefield <jl...@datastax.com>.
Hello,

There are probably lots of options to this challenge.  The more details
around your use case that you can provide, the easier it will be for this
group to offer advice.

A few follow-up questions:
  - How will you query this data?
  - Do your queries require filtering on specific columns other than
product_id and created_at, i.e. the dynamic columns?

Depending on the answers to these questions, you have several options, of
which here are a few:

   - Cassandra efficiently stores sparse data, so you could create columns
   and not populate them, without much of a penalty
   - Could use a clustering column to store a column's name and another col
   (potentially clustering) to store the value
      - i.e. CREATE TABLE foo (col1 int, attname text, attvalue text,
      col4...n, PRIMARY KEY (col1, attname, attvalue));
      - where attname stores the name of the attribute/column and attvalue
      stores the value of that attribute
      - have seen users use this model and create a "main" attribute row
      within a partition that stores the values associated with col4...n
   - Could store multiple collections
   - Others probably have ideas as well

You may want to look in the archives for a similar discussion topic.
Believe this item was asked a few months ago as well.
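
As a minimal sketch of the attname/attvalue option above, the following Python models how one record could be flattened into (col1, attname, attvalue) rows and rebuilt from them. The helper names, the use of a dict for col4...n, and the reserved attribute name "main" are assumptions made for this example; only the general shape comes from the table definition above.

```python
def to_rows(col1, fixed, dynamic):
    """Flatten one record into (col1, attname, attvalue, fixed_cols) rows.

    'fixed' stands in for col4...n and is stored only on the "main" row;
    every dynamic attribute becomes its own row with attvalue as text.
    """
    rows = [(col1, "main", "", dict(fixed))]
    for name, value in sorted(dynamic.items()):
        rows.append((col1, name, str(value), {}))
    return rows


def from_rows(rows):
    """Rebuild the record from the rows of a single partition."""
    fixed, dynamic = {}, {}
    for _col1, attname, attvalue, fixed_cols in rows:
        if attname == "main":
            fixed = dict(fixed_cols)
        else:
            dynamic[attname] = attvalue
    return fixed, dynamic


rows = to_rows(42, {"summary": "Solid phone"}, {"battery": 4, "color": "black"})
fixed, dynamic = from_rows(rows)
print(len(rows))
print(fixed)
print(dynamic)
```

Because attvalue is declared as text, every dynamic value comes back as a string (the int 4 becomes "4"), so the application has to track the intended type itself. A dynamic attribute literally named "main" would also clash with the reserved row in this sketch.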

Jonathan Lacefield

Solution Architect | (404) 822 3487 | jlacefield@datastax.com

On Tue, Jan 20, 2015 at 1:40 PM, chetan verma <ch...@gmail.com>
wrote:

> Hi,
>
> I am creating a review system. For instance, let's assume the following are the
> attributes of the system:
>
> Review{
> id bigint,
> product_id bigint,
> created_at timestamp,
> summary text,
> description text,
> pros set<text>,
> cons set<text>,
> feature_rating map<text, int>
> etc....
> }
> I chose product_id as the partition key (so that all the reviews for a given
> product will reside on the same node)
> and created_at and id (descending) as the clustering key, so that reviews will be
> sorted by time.
>
> I may have more columns, and I want to meet that requirement with dynamic
> columns, but they have the limitations explained above.
> Could you please let me know the best way?

Re: Dynamic Columns

Posted by chetan verma <ch...@gmail.com>.
Could you please explain how we can achieve dynamic column behavior with
clustering columns?




-- 
*Regards,*
*Chetan Verma*
*+91 99860 86634*

Re: Dynamic Columns

Posted by chetan verma <ch...@gmail.com>.
Hi,

I am creating a review system. For instance, let's assume the following are the
attributes of the system:

Review{
id bigint,
product_id bigint,
created_at timestamp,
summary text,
description text,
pros set<text>,
cons set<text>,
feature_rating map<text, int>
etc....
}
I chose product_id as the partition key (so that all the reviews for a given
product will reside on the same node)
and created_at and id (descending) as the clustering key, so that reviews will be
sorted by time.

I may have more columns, and I want to meet that requirement with dynamic
columns, but they have the limitations explained above.
Could you please let me know the best way?
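
The key design described above can be illustrated with a short in-memory Python sketch. It is only a model with made-up function names and sample data: reviews are grouped by product_id (the partition key) and kept ordered by created_at and id, newest first. In CQL this ordering would typically be declared with PRIMARY KEY (product_id, created_at, id) and WITH CLUSTERING ORDER BY (created_at DESC, id DESC).

```python
from collections import defaultdict
from datetime import datetime

# partition key (product_id) -> list of reviews in clustering order
partitions = defaultdict(list)


def insert_review(product_id, created_at, review_id, summary):
    reviews = partitions[product_id]
    reviews.append({"created_at": created_at, "id": review_id, "summary": summary})
    # Keep the partition ordered newest first, ties broken by id descending.
    reviews.sort(key=lambda r: (r["created_at"], r["id"]), reverse=True)


def latest_reviews(product_id, limit):
    # Reading the head of one partition returns the most recent reviews.
    return partitions.get(product_id, [])[:limit]


insert_review(1, datetime(2015, 1, 20, 10, 0), 100, "ok")
insert_review(1, datetime(2015, 1, 21, 9, 0), 101, "great")
insert_review(1, datetime(2015, 1, 21, 9, 0), 102, "bad")
insert_review(2, datetime(2015, 1, 19, 8, 0), 200, "fine")

print([r["id"] for r in latest_reviews(1, 2)])
```

Including id in the clustering key keeps two reviews written at the same timestamp from overwriting each other.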

On Tue, Jan 20, 2015 at 11:59 PM, Jonathan Lacefield <
jlacefield@datastax.com> wrote:

> Hello,
>
>   Have you looked at solving this challenge with clustering columns?
> Also, please describe the problem set details for more specific advice from
> this group.
>
>   Starting new projects on Thrift isn't the recommended approach.
>
> Jonathan
>
> Jonathan Lacefield
>
> Solution Architect | (404) 822 3487 | jlacefield@datastax.com


-- 
*Regards,*
*Chetan Verma*
*+91 99860 86634*

Re: Dynamic Columns

Posted by Jonathan Lacefield <jl...@datastax.com>.
Hello,

  Have you looked at solving this challenge with clustering columns?  Also,
please describe the problem set details for more specific advice from this
group.

  Starting new projects on Thrift isn't the recommended approach.

Jonathan

Jonathan Lacefield

Solution Architect | (404) 822 3487 | jlacefield@datastax.com

On Tue, Jan 20, 2015 at 1:24 PM, chetan verma <ch...@gmail.com>
wrote:

> Hi,
>
> I am starting a new project with Cassandra as the database.
> I have unstructured data, so I need dynamic columns.
> In CQL3 we can achieve this via collections, but there are some
> downsides:
> 1. Collections are meant to store small amounts of data.
> 2. The maximum size of an item in a collection is 64K.
> 3. Cassandra reads a collection in its entirety.
> 4. The number of items in a collection is limited to 64,000.
>
> Also, there is no support for getting a single column by map key, which is
> possible via cassandra-cli.
> Please suggest whether I should use CQL3 or Thrift, and which driver is
> best.
>
> --
> *Regards,*
> *Chetan Verma*
> *+91 99860 86634*
>