Posted to commits@cassandra.apache.org by "Sam Tunnicliffe (JIRA)" <ji...@apache.org> on 2015/06/01 13:37:17 UTC

[jira] [Commented] (CASSANDRA-8609) Remove depency of hadoop to internals (Cell/CellName)

    [ https://issues.apache.org/jira/browse/CASSANDRA-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567188#comment-14567188 ] 

Sam Tunnicliffe commented on CASSANDRA-8609:
--------------------------------------------

I don't think we can completely remove the dependency on internal classes in this way, because that would remove the ability to write M/R jobs that use each column's timestamp and TTL. It doesn't break any of the bundled Pig or Hadoop examples, but jobs out in the wild may well rely on this.

I think the right thing to do is to create a new simple class in the {{org.apache.cassandra.hadoop}} package to represent a column (much like the old {{org.apache.cassandra.db.Column}} from 2.0) and use that throughout the Thrift side of the Hadoop integration. The {{ColumnFamilyRecordReader#unthriftifyX}} methods should then translate from the Thrift classes into these new simple columns.
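A minimal sketch of the proposal, under stated assumptions: {{SimpleColumn}}, {{ColumnTranslation}} and {{toSimpleColumn}} are hypothetical names, not existing Cassandra classes, and {{ThriftColumnStub}} is a stand-in for the Thrift-generated column struct so the example compiles without Cassandra on the classpath. The point it shows is that the simple class carries plain client-side data (name, value, timestamp, TTL) with no reference to internal {{Cell}}/{{CellName}}, and that the translation step copies the timestamp and TTL across rather than dropping them.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical stand-in for the Thrift-generated column struct, redeclared
 * here so the sketch compiles on its own. Only the four fields the
 * translation needs are kept.
 */
final class ThriftColumnStub {
    ByteBuffer name;
    ByteBuffer value;
    long timestamp;
    int ttl; // assumption for this sketch: 0 means "no TTL"
}

/**
 * Hypothetical simple column for the hadoop package: plain client-side data,
 * no dependency on internal Cell/CellName. Timestamp and TTL are kept so that
 * M/R jobs can still read them.
 */
final class SimpleColumn {
    final ByteBuffer name;
    final ByteBuffer value;
    final long timestamp; // client-supplied write timestamp (by convention microseconds since the epoch)
    final int ttl;        // seconds; 0 means "no TTL" in this sketch

    SimpleColumn(ByteBuffer name, ByteBuffer value, long timestamp, int ttl) {
        this.name = name;
        this.value = value;
        this.timestamp = timestamp;
        this.ttl = ttl;
    }

    boolean hasTtl() {
        return ttl > 0;
    }

    @Override
    public String toString() {
        // Decode a duplicate so printing does not move the name buffer's position.
        return StandardCharsets.UTF_8.decode(name.duplicate()) + "@" + timestamp
                + (hasTtl() ? " ttl=" + ttl : "");
    }
}

/** Hypothetical translation step, in the spirit of the unthriftifyX methods. */
final class ColumnTranslation {
    private ColumnTranslation() {}

    static SimpleColumn toSimpleColumn(ThriftColumnStub c) {
        // duplicate() gives the SimpleColumn its own position/limit over the
        // same bytes, so consumers reading it don't disturb the source buffer.
        ByteBuffer value = c.value == null ? null : c.value.duplicate();
        return new SimpleColumn(c.name.duplicate(), value, c.timestamp, c.ttl);
    }
}
```

By contrast, a bare {{Pair<ByteBuffer, ByteBuffer>}} of name and value would be just as decoupled from the internals but would lose the timestamp and TTL that existing jobs may depend on.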

Also, the purpose of {{AbstractCassandraStorage}} isn't clear to me. {{CassandraStorage}} doesn't extend it, and I can't find any reference to it anywhere in the project (i.e. as far as I can tell, none of the demos test or exercise it). Is there any reason why users writing their own {{LoadStoreFunc}} would choose to extend {{ACS}} rather than {{CS}}? At the very least, shouldn't it be marked deprecated like {{CS}}?
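For reference, marking a class deprecated in Java means applying the {{@Deprecated}} annotation together with a {{@deprecated}} Javadoc tag. The sketch below uses a hypothetical {{AbstractStorageStub}} in place of {{AbstractCassandraStorage}} (the real class's Pig load/store methods are not reproduced) and a hypothetical user subclass. Its practical effect is that users extending the class still compile but get a deprecation warning from javac, which steers them away from it.

```java
/**
 * Hypothetical stub standing in for AbstractCassandraStorage; the real class's
 * Pig load/store methods are not reproduced here.
 *
 * @deprecated Not referenced elsewhere in the project or exercised by the demos;
 *             retained only so existing subclasses keep compiling.
 */
@Deprecated
abstract class AbstractStorageStub {
    abstract String describe();
}

// A user's own storage class extending the deprecated base still compiles,
// but javac reports a deprecation warning for it (details with -Xlint:deprecation).
class UserStorageStub extends AbstractStorageStub {
    @Override
    String describe() {
        return "user storage";
    }
}
```

Because {{@Deprecated}} has runtime retention, tooling can also detect the marking reflectively via {{Class#isAnnotationPresent(Deprecated.class)}}.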

> Remove depency of hadoop to internals (Cell/CellName)
> -----------------------------------------------------
>
>                 Key: CASSANDRA-8609
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8609
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>            Assignee: Philip Thompson
>             Fix For: 2.2.0 rc1
>
>         Attachments: 8609-2.2-2.txt, 8609-2.2.txt, CASSANDRA-8609-3.0-branch.txt
>
>
> For some reason most of the Hadoop code (ColumnFamilyRecordReader, CqlStorage, ...) uses the {{Cell}} and {{CellName}} classes. That dependency is entirely artificial: all this code is really client code that communicates with Cassandra over thrift/native protocol, so there is no reason for it to use internal classes. And in fact, those classes are used in a very crude way, really just as a {{Pair<ByteBuffer, ByteBuffer>}}.
> But this dependency is really painful when we make changes to the internals. Further, every time we do so, I believe we break some of those APIs due to the change. This has been painful for CASSANDRA-5417 and is now painful for CASSANDRA-8099. But while I somewhat hacked around it in CASSANDRA-5417, that was a mistake and we should have removed the dependency back then. So let's do that now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)