
Posted to user@cassandra.apache.org by "Hiller, Dean" <De...@nrel.gov> on 2013/09/13 19:47:31 UTC

is there any type of table existing on all nodes (slow to update, fast to read in map/reduce)?

I was just wondering if Cassandra has any special CF (column family) type in which every row exists on every node, for smaller tables that we would want to leverage in map/reduce. The table's row count is less than 500k and we are OK with slow updates to the table, but this would make M/R blazingly fast since, for every row we process, we do a lookup into this table.

Thanks,
Dean

RE: is there any type of table existing on all nodes (slow to update, fast to read in map/reduce)?

Posted by java8964 java8964 <ja...@hotmail.com>.

Or is there some configuration in the Cassandra integration part of Hadoop that tells Cassandra we know this table is small enough, so that it is made a distributed cache in Hadoop for all the MR jobs generated in Cassandra?
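To illustrate the distributed-cache idea above, here is a minimal sketch in plain Python, not tied to any Hadoop or Cassandra API. It assumes the small table fits in memory and has already been read into (key, value) pairs; the names `load_lookup` and `map_with_lookup` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple


def load_lookup(rows: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Load the small table once per task into an in-memory dict."""
    return dict(rows)


def map_with_lookup(
    records: Iterable[Tuple[str, int]], lookup: Dict[str, str]
) -> Iterator[Tuple[str, int, Optional[str]]]:
    """For every input record, join against the in-memory table (no network read)."""
    for key, value in records:
        yield key, value, lookup.get(key)


# Hypothetical data standing in for the small table and the M/R input split.
small_table = load_lookup([("a", "meta-a"), ("b", "meta-b")])
joined = list(map_with_lookup([("a", 1), ("c", 3)], small_table))
print(joined)
```

The point of the design is that the per-row lookup becomes a local dictionary read; how the table gets shipped to each task is left open here.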

Date: Fri, 13 Sep 2013 14:06:50 -0700
Subject: Re: is there any type of table existing on all nodes(slow to up date, fast to read in map/reduce)?
From: rcoli@eventbrite.com
To: user@cassandra.apache.org

On Fri, Sep 13, 2013 at 11:15 AM, Hiller, Dean <De...@nrel.gov> wrote:

When I add nodes though, I would kind of be screwed there, right?  Is there an RF=${nodecount}…that would be neat.

Increasing replication factor is well understood, and in this case you could pre-load the entire dataset onto the new node instead of having it bootstrap.
But the DC-per-node idea is.. kinda interesting..

=Rob

Re: is there any type of table existing on all nodes (slow to update, fast to read in map/reduce)?

Posted by Robert Coli <rc...@eventbrite.com>.

On Fri, Sep 13, 2013 at 11:15 AM, Hiller, Dean <De...@nrel.gov> wrote:

> When I add nodes though, I would kind of be screwed there, right?  Is
> there an RF=${nodecount}…that would be neat.
>

Increasing replication factor is well understood, and in this case you
could pre-load the entire dataset onto the new node instead of having it
bootstrap.
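As a rough sketch of what increasing the replication factor could look like when a node is added, the Python below only builds the statements; it does not run them. It assumes SimpleStrategy and a hypothetical keyspace name `lookup_ks`. The `ALTER KEYSPACE` syntax and `nodetool repair` are standard, but the helper `raise_rf_steps` is made up for illustration.

```python
from typing import List


def raise_rf_steps(keyspace: str, new_node_count: int) -> List[str]:
    """Return the CQL statement and the follow-up repair command for a new RF."""
    if new_node_count < 1:
        raise ValueError("new_node_count must be at least 1")
    alter = (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'SimpleStrategy', 'replication_factor': {new_node_count}}};"
    )
    # After raising RF, existing data must be streamed to the new replicas;
    # repair is run on each node for that keyspace.
    repair = f"nodetool repair {keyspace}"
    return [alter, repair]


steps = raise_rf_steps("lookup_ks", 7)
for step in steps:
    print(step)
```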

But the DC-per-node idea is.. kinda interesting..

=Rob

Re: Revisit with another spin: is there any type of table existing on all nodes?

Posted by Sylvain Lebresne <sy...@datastax.com>.

On Wed, Sep 18, 2013 at 3:09 PM, Hiller, Dean <De...@nrel.gov> wrote:

> The meta information stored on behalf of CQL must exist on all nodes and
> must update all nodes as well.
>
> What table type is that meta information stored in?  And is it possible we
> can use that same type of table?
>

It's not really a special table type, but more a wholly different code path
that is used, and is very specialized for the schema.


>
> Are any of the meta tables writeable by a thrift/CQL client such that we
> could store info in there without creating new CF's?
>

Doing so would break your nodes badly because as said, the code is pretty
specialized to handle schema and not at all expecting random data it
doesn't know about. So no.

--
Sylvain


Revisit with another spin: is there any type of table existing on all nodes?

Posted by "Hiller, Dean" <De...@nrel.gov>.

The meta information stored on behalf of CQL must exist on all nodes and must update all nodes as well.

What table type is that meta information stored in?  And is it possible we can use that same type of table?

After all, this would make M/R blazingly fast, since lookups in the database would be local to each node (and we just have additional meta information that is used by all rows).

Are any of the meta tables writeable by a thrift/CQL client such that we could store info in there without creating new CF's?

Thanks,
Dean

From: <Hiller>, Nrel <De...@nrel.gov>>
Date: Friday, September 13, 2013 3:03 PM
To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Subject: Re: is there any type of table existing on all nodes(slow to up date, fast to read in map/reduce)?

That's an interesting idea…..so that would be an RF=1 in each data center…..very interesting.

Dean

From: Jonathan Haddad <jo...@jonhaddad.com>>
Reply-To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Date: Friday, September 13, 2013 1:50 PM
To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Subject: Re: is there any type of table existing on all nodes(slow to up date, fast to read in map/reduce)?

You could create a bunch of 1 node DCs if you really wanted it.

On Fri, Sep 13, 2013 at 12:29 PM, Hiller, Dean <De...@nrel.gov>> wrote:
Actually, I have been on a few projects where something like that is
useful.  Gemfire(a grid memory cache) had that feature which we used at
another company.  On every project I encounter, there is usually one small
table somewhere….either meta data or something that is infrequently
changing and nice to duplicate on every node.  I bet eventually nosql
stores may start to add it maybe in a few years, but I guess we are not
there yet.

Thanks,
Dean

On 9/13/13 12:24 PM, "Jon Haddad" <jo...@jonhaddad.com>> wrote:

>It sounds some something that's only useful in a really limited use case.
> In an 11 node cluster it would be quorum reads / writes would need to
>come from 6 nodes.  It would probably be much slower for both reads &
>writes.
>
>It sounds like what you want is a database with replication, not
>partitioning.
>
>On Sep 13, 2013, at 11:15 AM, "Hiller, Dean" <De...@nrel.gov>> wrote:
>
>> When I add nodes though, I would kind of be screwed there, right?  Is
>>there an RF=${nodecount}…that would be neat.
>>
>> Dean
>>
>> From: Robert Coli <rc...@eventbrite.com>>>
>> Reply-To: "user@cassandra.apache.org<ma...@cassandra.apache.org>>"
>><us...@cassandra.apache.org>>>
>> Date: Friday, September 13, 2013 12:06 PM
>> To: "user@cassandra.apache.org<ma...@cassandra.apache.org>>"
>><us...@cassandra.apache.org>>>
>> Subject: Re: is there any type of table existing on all nodes(slow to
>>up date, fast to read in map/reduce)?
>>
>> On Fri, Sep 13, 2013 at 10:47 AM, Hiller, Dean
>><De...@nrel.gov>>> wrote:
>> I was just wondering if cassandra had any special CF that every row
>>exists on every node for smaller tables that we would want to leverage
>>in map/reduce.  The table row count is less than 500k and we are ok with
>>slow updates to the table, but this would make M/R blazingly fast since
>>for every row, we read into this table.
>>
>> Create a keyspace with replication configured such that RF=N?
>>
>> =Rob
>

--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade

Re: is there any type of table existing on all nodes (slow to update, fast to read in map/reduce)?

Posted by "Hiller, Dean" <De...@nrel.gov>.

That's an interesting idea…..so that would be an RF=1 in each data center…..very interesting.

Dean

Re: is there any type of table existing on all nodes (slow to update, fast to read in map/reduce)?

Posted by Jonathan Haddad <jo...@jonhaddad.com>.

You could create a bunch of 1 node DCs if you really wanted it.
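A hedged sketch of the one-node-per-DC idea: with NetworkTopologyStrategy, giving every data center a replica count of 1 puts a full copy on each single-node DC. The Python below just assembles the CQL string. The keyspace name and DC names are hypothetical, and the DC names would have to match whatever the cluster's snitch reports.

```python
from typing import List


def one_node_dc_keyspace_cql(keyspace: str, dc_names: List[str]) -> str:
    """Build CREATE KEYSPACE with one replica in each (single-node) data center."""
    if not dc_names:
        raise ValueError("at least one data center name is required")
    options = ["'class': 'NetworkTopologyStrategy'"]
    options += [f"'{dc}': 1" for dc in dc_names]
    body = ", ".join(options)
    return f"CREATE KEYSPACE {keyspace} WITH replication = {{{body}}};"


cql = one_node_dc_keyspace_cql("lookup_ks", ["DC1", "DC2", "DC3"])
print(cql)
```

Adding a node then means adding one more DC entry to the replication map rather than changing a single cluster-wide replication factor.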


On Fri, Sep 13, 2013 at 12:29 PM, Hiller, Dean <De...@nrel.gov> wrote:

> Actually, I have been on a few projects where something like that is
> useful.  Gemfire(a grid memory cache) had that feature which we used at
> another company.  On every project I encounter, there is usually one small
> table somewhere….either meta data or something that is infrequently
> changing and nice to duplicate on every node.  I bet eventually nosql
> stores may start to add it maybe in a few years, but I guess we are not
> there yet.
>
> Thanks,
> Dean
>
> On 9/13/13 12:24 PM, "Jon Haddad" <jo...@jonhaddad.com> wrote:
>
> >It sounds some something that's only useful in a really limited use case.
> > In an 11 node cluster it would be quorum reads / writes would need to
> >come from 6 nodes.  It would probably be much slower for both reads &
> >writes.
> >
> >It sounds like what you want is a database with replication, not
> >partitioning.
> >
> >On Sep 13, 2013, at 11:15 AM, "Hiller, Dean" <De...@nrel.gov>
> wrote:
> >
> >> When I add nodes though, I would kind of be screwed there, right?  Is
> >>there an RF=${nodecount}…that would be neat.
> >>
> >> Dean
> >>
> >> From: Robert Coli <rc...@eventbrite.com>>
> >> Reply-To: "user@cassandra.apache.org<ma...@cassandra.apache.org>"
> >><us...@cassandra.apache.org>>
> >> Date: Friday, September 13, 2013 12:06 PM
> >> To: "user@cassandra.apache.org<ma...@cassandra.apache.org>"
> >><us...@cassandra.apache.org>>
> >> Subject: Re: is there any type of table existing on all nodes(slow to
> >>up date, fast to read in map/reduce)?
> >>
> >> On Fri, Sep 13, 2013 at 10:47 AM, Hiller, Dean
> >><De...@nrel.gov>> wrote:
> >> I was just wondering if cassandra had any special CF that every row
> >>exists on every node for smaller tables that we would want to leverage
> >>in map/reduce.  The table row count is less than 500k and we are ok with
> >>slow updates to the table, but this would make M/R blazingly fast since
> >>for every row, we read into this table.
> >>
> >> Create a keyspace with replication configured such that RF=N?
> >>
> >> =Rob
> >
>
>


-- 
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade

Re: is there any type of table existing on all nodes (slow to update, fast to read in map/reduce)?

Posted by "Hiller, Dean" <De...@nrel.gov>.

Actually, I have been on a few projects where something like that is
useful.  GemFire (a grid memory cache) had that feature, which we used at
another company.  On every project I encounter, there is usually one small
table somewhere…either metadata or something that is infrequently
changing and nice to duplicate on every node.  I bet NoSQL stores may
eventually start to add it, maybe in a few years, but I guess we are not
there yet.

Thanks,
Dean

On 9/13/13 12:24 PM, "Jon Haddad" <jo...@jonhaddad.com> wrote:

>It sounds some something that's only useful in a really limited use case.
> In an 11 node cluster it would be quorum reads / writes would need to
>come from 6 nodes.  It would probably be much slower for both reads &
>writes. 
>
>It sounds like what you want is a database with replication, not
>partitioning.
>
>On Sep 13, 2013, at 11:15 AM, "Hiller, Dean" <De...@nrel.gov> wrote:
>
>> When I add nodes though, I would kind of be screwed there, right?  Is
>>there an RF=${nodecount}…that would be neat.
>> 
>> Dean
>> 
>> From: Robert Coli <rc...@eventbrite.com>>
>> Reply-To: "user@cassandra.apache.org<ma...@cassandra.apache.org>"
>><us...@cassandra.apache.org>>
>> Date: Friday, September 13, 2013 12:06 PM
>> To: "user@cassandra.apache.org<ma...@cassandra.apache.org>"
>><us...@cassandra.apache.org>>
>> Subject: Re: is there any type of table existing on all nodes(slow to
>>up date, fast to read in map/reduce)?
>> 
>> On Fri, Sep 13, 2013 at 10:47 AM, Hiller, Dean
>><De...@nrel.gov>> wrote:
>> I was just wondering if cassandra had any special CF that every row
>>exists on every node for smaller tables that we would want to leverage
>>in map/reduce.  The table row count is less than 500k and we are ok with
>>slow updates to the table, but this would make M/R blazingly fast since
>>for every row, we read into this table.
>> 
>> Create a keyspace with replication configured such that RF=N?
>> 
>> =Rob
>

Re: is there any type of table existing on all nodes (slow to update, fast to read in map/reduce)?

Posted by Jon Haddad <jo...@jonhaddad.com>.

It sounds like something that's only useful in a really limited use case.  In an 11-node cluster with RF=11, quorum reads / writes would need to come from 6 nodes.  It would probably be much slower for both reads & writes.

It sounds like what you want is a database with replication, not partitioning.
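The "6 nodes" figure follows from the usual quorum formula, floor(RF / 2) + 1. A small Python check, assuming the replication factor equals the node count as proposed in this thread:

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for a QUORUM read or write."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return replication_factor // 2 + 1


# RF equal to the node count, as in the 11-node example.
for rf in (3, 11):
    print(f"RF={rf}: quorum needs {quorum(rf)} replicas")
```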

On Sep 13, 2013, at 11:15 AM, "Hiller, Dean" <De...@nrel.gov> wrote:

> When I add nodes though, I would kind of be screwed there, right?  Is there an RF=${nodecount}…that would be neat.
> 
> Dean
> 
> From: Robert Coli <rc...@eventbrite.com>>
> Reply-To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
> Date: Friday, September 13, 2013 12:06 PM
> To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
> Subject: Re: is there any type of table existing on all nodes(slow to up date, fast to read in map/reduce)?
> 
> On Fri, Sep 13, 2013 at 10:47 AM, Hiller, Dean <De...@nrel.gov>> wrote:
> I was just wondering if cassandra had any special CF that every row exists on every node for smaller tables that we would want to leverage in map/reduce.  The table row count is less than 500k and we are ok with slow updates to the table, but this would make M/R blazingly fast since for every row, we read into this table.
> 
> Create a keyspace with replication configured such that RF=N?
> 
> =Rob

Re: is there any type of table existing on all nodes (slow to update, fast to read in map/reduce)?

Posted by "Hiller, Dean" <De...@nrel.gov>.

When I add nodes though, I would kind of be screwed there, right?  Is there an RF=${nodecount}…that would be neat.

Dean

From: Robert Coli <rc...@eventbrite.com>>
Reply-To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Date: Friday, September 13, 2013 12:06 PM
To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Subject: Re: is there any type of table existing on all nodes(slow to up date, fast to read in map/reduce)?

On Fri, Sep 13, 2013 at 10:47 AM, Hiller, Dean <De...@nrel.gov>> wrote:
I was just wondering if cassandra had any special CF that every row exists on every node for smaller tables that we would want to leverage in map/reduce.  The table row count is less than 500k and we are ok with slow updates to the table, but this would make M/R blazingly fast since for every row, we read into this table.

Create a keyspace with replication configured such that RF=N?

=Rob

Re: is there any type of table existing on all nodes (slow to update, fast to read in map/reduce)?

Posted by Robert Coli <rc...@eventbrite.com>.

On Fri, Sep 13, 2013 at 10:47 AM, Hiller, Dean <De...@nrel.gov> wrote:

> I was just wondering if cassandra had any special CF that every row exists
> on every node for smaller tables that we would want to leverage in
> map/reduce.  The table row count is less than 500k and we are ok with slow
> updates to the table, but this would make M/R blazingly fast since for
> every row, we read into this table.
>

Create a keyspace with replication configured such that RF=N?
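One way to read "RF=N" is a keyspace whose replication factor equals the current node count. The Python below is only a sketch that builds the CQL3 statement. It assumes SimpleStrategy and a hypothetical keyspace name `lookup_ks`, and the node count has to be supplied by hand since there is no RF=${nodecount} setting.

```python
def create_keyspace_cql(keyspace: str, node_count: int) -> str:
    """Build a CREATE KEYSPACE statement with RF equal to the node count."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return (
        f"CREATE KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'SimpleStrategy', 'replication_factor': {node_count}}};"
    )


statement = create_keyspace_cql("lookup_ks", 6)
print(statement)
```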

=Rob