Posted to user@cassandra.apache.org by Utku Can Topçu <ut...@topcu.gen.tr> on 2010/10/04 12:12:14 UTC

A proposed use case, any comments and experience is appreciated

Hey All,

I'm planning to run MapReduce over one of my ColumnFamilies. The keys are
formed in such a way that they sort in descending order by time, so I'll
be analyzing the data for each hour iteratively.

Since the current Hadoop integration does not support analyzing only part
of a ColumnFamily, I feel I'll need to dump the last hour's data, push it
to the Hadoop cluster, and run my analysis on the flat text file.
Can you think of any "better" way of getting the data of a key range into
a Hadoop cluster for analysis?
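
For illustration, a key scheme along these lines gives that ordering under
an order-preserving partitioner (a simplified sketch, not my exact format):

    // one row per hour; reversing the hour bucket makes the newest
    // hour sort first under an order-preserving partitioner
    long hourBucket = System.currentTimeMillis() / 3600000L;
    String rowKey = String.format("%019d", Long.MAX_VALUE - hourBucket);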

Regards,

Utku

Re: A proposed use case, any comments and experience is appreciated

Posted by Utku Can Topçu <ut...@topcu.gen.tr>.
What I understand from "behaves like a deleted column" is:
- they'll be there for at most GCGraceSeconds?

On Mon, Oct 4, 2010 at 3:51 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> Expiring columns are 0.7 only.
>
> An expired column behaves like a deleted column until it is compacted away.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>

Re: A proposed use case, any comments and experience is appreciated

Posted by Jonathan Ellis <jb...@gmail.com>.
Expiring columns are 0.7 only.

An expired column behaves like a deleted column until it is compacted away.
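
Roughly, the timeline for a column written with a one-hour TTL looks like
this (a sketch of the behavior, not exact code):

    // t = 0       : insert the column with ttl = 3600
    // t < 3600s   : reads return the column normally
    // t >= 3600s  : reads treat it as deleted (get() throws
    //               NotFoundException), though it still occupies space
    // compaction  : the expired column is rewritten as a tombstone, which
    //               is purged for good once GCGraceSeconds has passed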

On Mon, Oct 4, 2010 at 8:48 AM, Utku Can Topçu <ut...@topcu.gen.tr> wrote:
> Hi Jonathan,
>
> Thank you for mentioning expiring columns; I didn't know the feature
> existed. That's really great news.
> First of all, does the current 0.6 branch support it? If not, is a patch
> available for 0.6.5 somehow?
> And about deletion: if all the columns in a row expire, when will the row
> be deleted? Will I still see the row in my map inputs, and for how long?
>
> Regards,
> Utku
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: A proposed use case, any comments and experience is appreciated

Posted by Utku Can Topçu <ut...@topcu.gen.tr>.
Hi Jonathan,

Thank you for mentioning expiring columns; I didn't know the feature
existed. That's really great news.
First of all, does the current 0.6 branch support it? If not, is a patch
available for 0.6.5 somehow?
And about deletion: if all the columns in a row expire, when will the row
be deleted? Will I still see the row in my map inputs, and for how long?

Regards,
Utku

On Mon, Oct 4, 2010 at 3:30 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> A simpler approach might be to insert expiring columns into a 2nd CF
> with a TTL of one hour.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>

Re: A proposed use case, any comments and experience is appreciated

Posted by Jonathan Ellis <jb...@gmail.com>.
A simpler approach might be to insert expiring columns into a 2nd CF
with a TTL of one hour.
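
Roughly, with the 0.7 Thrift API the write looks like this (a sketch; the
"LastHour" CF, the client setup, and the variable names are made up):

    // assumes org.apache.cassandra.thrift.* plus ByteBufferUtil, and an
    // open Cassandra.Client connected to the cluster
    Column col = new Column();
    col.setName(ByteBufferUtil.bytes("event"));
    col.setValue(ByteBufferUtil.bytes(payload));  // payload: serialized data
    col.setTimestamp(System.currentTimeMillis() * 1000);
    col.setTtl(3600);                             // expire one hour after the write
    client.insert(ByteBufferUtil.bytes(rowKey),
                  new ColumnParent("LastHour"), col,
                  ConsistencyLevel.QUORUM);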

On Mon, Oct 4, 2010 at 5:12 AM, Utku Can Topçu <ut...@topcu.gen.tr> wrote:
> Hey All,
>
> I'm planning to run MapReduce over one of my ColumnFamilies. The keys are
> formed in such a way that they sort in descending order by time, so I'll
> be analyzing the data for each hour iteratively.
>
> Since the current Hadoop integration does not support analyzing only part
> of a ColumnFamily, I feel I'll need to dump the last hour's data, push it
> to the Hadoop cluster, and run my analysis on the flat text file.
> Can you think of any "better" way of getting the data of a key range into
> a Hadoop cluster for analysis?
>
> Regards,
>
> Utku
>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com