Posted to commits@cassandra.apache.org by "Sam Tunnicliffe (JIRA)" <ji...@apache.org> on 2015/06/01 13:37:17 UTC
[jira] [Commented] (CASSANDRA-8609) Remove dependency of hadoop to
internals (Cell/CellName)
[ https://issues.apache.org/jira/browse/CASSANDRA-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567188#comment-14567188 ]
Sam Tunnicliffe commented on CASSANDRA-8609:
--------------------------------------------
I don't think we can completely remove the dependency on internal classes in this way as it would remove the ability to write M/R jobs which use timestamp and ttl. While it doesn't break any of the bundled pig or hadoop examples, it's feasible for jobs out in the wild to be doing this.
I think the right thing to do is to create a new simple class in the {{org.apache.cassandra.hadoop}} package to represent a column (much like the old {{org.apache.cassandra.db.Column}} from 2.0) and use that throughout the thrift side of the hadoop integration. The {{ColumnFamilyRecordReader#unthriftifyX}} methods should then be translating from the thrift classes into these new simple columns.
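As a rough illustration of the idea above, such a class might look like the sketch below. The name {{SimpleColumn}}, its field names, the {{isExpiring}} helper and the convention that a ttl of 0 means "no TTL" are all assumptions made for this example, not part of any attached patch.

```java
import java.nio.ByteBuffer;

// Hypothetical client-side column holder for the hadoop integration.
// The class name, fields and helper are illustrative assumptions only.
final class SimpleColumn {
    final ByteBuffer name;
    final ByteBuffer value;
    final long timestamp; // write timestamp as reported by the server
    final int ttl;        // assumption: 0 means the column has no TTL

    SimpleColumn(ByteBuffer name, ByteBuffer value, long timestamp, int ttl) {
        this.name = name;
        this.value = value;
        this.timestamp = timestamp;
        this.ttl = ttl;
    }

    // True when the column carries a TTL under the assumption above.
    boolean isExpiring() {
        return ttl > 0;
    }
}
```

Under this sketch, the {{unthriftifyX}} methods would build one of these from the name, value, timestamp and ttl of each thrift column, so M/R jobs could keep reading timestamp and ttl without touching {{Cell}} or {{CellName}}.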
Also, the utility of {{AbstractCassandraStorage}} isn't clear to me. {{CassandraStorage}} doesn't extend it and I can't find any reference to it in the project at all (i.e. it isn't being tested/exercised by any of the demos as far as I can tell). Is there any reason why users writing their own {{LoadStoreFunc}} would choose to extend {{ACS}} rather than {{CS}}? At the very least, shouldn't it be marked deprecated like {{CS}}?
> Remove dependency of hadoop to internals (Cell/CellName)
> --------------------------------------------------------
>
> Key: CASSANDRA-8609
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8609
> Project: Cassandra
> Issue Type: Bug
> Reporter: Sylvain Lebresne
> Assignee: Philip Thompson
> Fix For: 2.2.0 rc1
>
> Attachments: 8609-2.2-2.txt, 8609-2.2.txt, CASSANDRA-8609-3.0-branch.txt
>
>
> For some reason most of the Hadoop code (ColumnFamilyRecordReader, CqlStorage, ...) uses the {{Cell}} and {{CellName}} classes. That dependency is entirely artificial: all this code is really client code that communicates with Cassandra over thrift/native protocol and there is thus no reason for it to use internal classes. And in fact, those classes are used in a very crude way, as a {{Pair<ByteBuffer, ByteBuffer>}} really.
> But this dependency is really painful when we make changes to the internals. Further, every time we do so, I believe we break some of those APIs due to the change. This has been painful for CASSANDRA-5417 and this is now painful for CASSANDRA-8099. But while I somewhat hacked over it in CASSANDRA-5417, this was a mistake and we should have removed the dependency back then. So let's do that now.