Posted to user@cassandra.apache.org by Mayank Mishra <ma...@gmail.com> on 2011/02/28 11:10:58 UTC

ColumnFamilyRecordWriter

Hi all,

While integrating Hadoop with Cassandra, I needed to serialize
mutations, so I used Thrift mutations in my M/R jobs.
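
For reference, the mutations I build in the reducer look roughly like
this (a minimal sketch against the Thrift API; the class name, column
name, and value are just illustrative):

    import java.nio.ByteBuffer;
    import java.nio.charset.Charset;
    import org.apache.cassandra.thrift.Column;
    import org.apache.cassandra.thrift.ColumnOrSuperColumn;
    import org.apache.cassandra.thrift.Mutation;

    public class ThriftMutationSketch {
        private static final Charset UTF8 = Charset.forName("UTF-8");

        // Build a Thrift Mutation that inserts a single column.
        public static Mutation insertColumn(String name, String value) {
            Column col = new Column();
            col.setName(ByteBuffer.wrap(name.getBytes(UTF8)));
            col.setValue(ByteBuffer.wrap(value.getBytes(UTF8)));
            // Cassandra convention: timestamps in microseconds.
            col.setTimestamp(System.currentTimeMillis() * 1000);

            ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
            cosc.setColumn(col);

            Mutation m = new Mutation();
            m.setColumn_or_supercolumn(cosc);
            return m;
        }
    }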

Along the way, I found that CFRW accepts only Avro mutations. Can
someone please explain why CFRW supports only Avro, rather than both
Thrift and Avro mutations?

Please let me know if I have missed some important point.

With regards,
Mayank

Re: ColumnFamilyRecordWriter

Posted by Mayank Mishra <ma...@gmail.com>.
Thanks Jeremy,

It makes sense to abstract out CFOF and CFRW (right now they are
tightly bound to Avro) so that one can plug in a custom serializer
(Avro, Thrift, and going forward perhaps CQL). I will create a JIRA
and submit a patch with the needed changes. I will certainly ping you
if I need help.
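
Just to sketch the plugin point I have in mind, something like a small
interface could let each serializer be swapped in (the names here are
entirely hypothetical, not existing code):

    import java.nio.ByteBuffer;

    // Hypothetical plugin point: an implementation (Avro, Thrift, ...)
    // turns one (column name, value, timestamp) change for a row into
    // its own mutation type M, which the record writer then batches.
    public interface MutationSerializer<M> {
        M toMutation(ByteBuffer rowKey, ByteBuffer columnName,
                     ByteBuffer value, long timestamp);
    }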

With regards,
Mayank


Re: ColumnFamilyRecordWriter

Posted by Jeremy Hanna <je...@gmail.com>.
One thing that could be done is to abstract CFRW further so that it is easier to extend and only the serialization mechanism needs to be supplied. That is, all of the core functionality relating to Cassandra would live in an abstract class or something like that. Then the Avro-based writer could extend it with the Avro-specific parts. That way, people could write their own CFRW extension with whatever serialization they chose. Anyway, that seems reasonable, but it would take some work - if you'd like to look at it, I could help as I have time.
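
Sketching the shape of it (all names here are hypothetical):

    import java.io.IOException;
    import org.apache.hadoop.mapreduce.RecordWriter;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;

    // Hypothetical sketch: the base class owns everything
    // Cassandra-specific (connections, batching, flushing), and a
    // subclass only decides how a (key, value) record becomes mutations.
    public abstract class AbstractColumnFamilyRecordWriter<K, V>
            extends RecordWriter<K, V> {

        // Serialization-specific: an Avro or Thrift subclass implements
        // this by translating the record and queueing the mutations.
        public abstract void write(K key, V value)
                throws IOException, InterruptedException;

        @Override
        public void close(TaskAttemptContext context)
                throws IOException, InterruptedException {
            // Cassandra-specific: flush queued mutations and close
            // connections here, shared by all subclasses.
        }
    }

An Avro-based subclass would then carry only the Avro-specific parts, and a Thrift-based one could sit alongside it.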


Re: ColumnFamilyRecordWriter

Posted by Jeremy Hanna <je...@gmail.com>.
There certainly could be a Thrift-based record writer. However, if I remember correctly, Avro was the easier choice for enabling Hadoop output streaming, since the schema is included with the records. There could also have been a Thrift version of the record writer, but it is simpler to have just one. That was the decision process, at least.
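
To illustrate the "schema is included" point: an Avro data file embeds
its schema in the file header, so any consumer in a streaming pipeline
can decode the records without out-of-band information. A generic
example (not the CFRW code itself; the record layout is made up):

    import java.io.File;
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;

    public class AvroSchemaIncluded {
        public static void main(String[] args) throws IOException {
            Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Cell\",\"fields\":["
                + "{\"name\":\"column\",\"type\":\"string\"},"
                + "{\"name\":\"value\",\"type\":\"bytes\"}]}");

            GenericRecord rec = new GenericData.Record(schema);
            rec.put("column", "count");
            rec.put("value", ByteBuffer.wrap(new byte[] { 42 }));

            DataFileWriter<GenericRecord> writer =
                new DataFileWriter<GenericRecord>(
                    new GenericDatumWriter<GenericRecord>(schema));
            // The schema is written into the file header here, ahead of
            // the data blocks that follow.
            writer.create(schema, new File("cells.avro"));
            writer.append(rec);
            writer.close();
        }
    }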

If there is a compelling reason or a lot of demand for a Thrift-based one, maybe the decision could be revisited - though I'm not the one making it.
