Posted to commits@cassandra.apache.org by "Karthick Sankarachary (JIRA)" <ji...@apache.org> on 2010/06/26 00:21:50 UTC

[jira] Commented: (CASSANDRA-1227) Input and Output column families should be configured independently

    [ https://issues.apache.org/jira/browse/CASSANDRA-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882742#action_12882742 ] 

Karthick Sankarachary commented on CASSANDRA-1227:
--------------------------------------------------

Hi Bryan,

It makes sense to keep the input and output configuration properties separate. Not separating them for the two formats to begin with was an oversight on my part.

To build on your suggestion, how about adding methods to ColumnFamilyInputFormat and ColumnFamilyOutputFormat that let users set those properties programmatically, purely as a convenience (a la SequenceFileInputFormat and SequenceFileOutputFormat)? For example, to set the column family for the input format, one could provide a ColumnFamilyInputFormat.setColumnFamily(job, columnFamily) method, which would simply delegate to job.getConfiguration().set(ConfigHelper.INPUT_COLUMNFAMILY_CONFIG, columnFamily).
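
A minimal sketch of what such setters might look like. To keep it self-contained, Job and Configuration here are tiny stand-ins for the Hadoop classes (only getConfiguration(), set() and get() are mimicked), and the classes named ColumnFamilyInputFormat, ColumnFamilyOutputFormat and ConfigHelper are illustrative shells, not the real Cassandra code. OUTPUT_COLUMNFAMILY_CONFIG and the output-side setter are assumed by symmetry with the input side; the property strings are the ones mentioned in the issue description, and the column family names are made up.

```java
import java.util.HashMap;
import java.util.Map;

public class ColumnFamilyConfigSketch {

    /** Stand-in for Hadoop's Configuration: string properties only. */
    static final class Configuration {
        private final Map<String, String> props = new HashMap<>();

        void set(String name, String value) { props.put(name, value); }

        String get(String name) { return props.get(name); }
    }

    /** Stand-in for Hadoop's Job: just exposes its configuration. */
    static final class Job {
        private final Configuration conf = new Configuration();

        Configuration getConfiguration() { return conf; }
    }

    /** Illustrative shell: separate keys for the input and output sides. */
    static final class ConfigHelper {
        static final String INPUT_COLUMNFAMILY_CONFIG = "cassandra.input.columnfamily";
        // Assumed name, mirroring the input constant.
        static final String OUTPUT_COLUMNFAMILY_CONFIG = "cassandra.output.columnfamily";
    }

    /** Illustrative shell: only the proposed convenience setter. */
    static final class ColumnFamilyInputFormat {
        static void setColumnFamily(Job job, String columnFamily) {
            job.getConfiguration().set(ConfigHelper.INPUT_COLUMNFAMILY_CONFIG, columnFamily);
        }
    }

    /** Illustrative shell: the assumed output-side counterpart. */
    static final class ColumnFamilyOutputFormat {
        static void setColumnFamily(Job job, String columnFamily) {
            job.getConfiguration().set(ConfigHelper.OUTPUT_COLUMNFAMILY_CONFIG, columnFamily);
        }
    }

    public static void main(String[] args) {
        Job job = new Job();
        // Hypothetical column family names: read from one, write to another.
        ColumnFamilyInputFormat.setColumnFamily(job, "Events");
        ColumnFamilyOutputFormat.setColumnFamily(job, "EventSummaries");

        Configuration conf = job.getConfiguration();
        System.out.println("input  = " + conf.get(ConfigHelper.INPUT_COLUMNFAMILY_CONFIG));
        System.out.println("output = " + conf.get(ConfigHelper.OUTPUT_COLUMNFAMILY_CONFIG));
    }
}
```

Because each setter writes to its own key, a single job can name one column family to read from and a different one to write to without either call overwriting the other.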

In addition, if you don't mind, could you add a test case for the use case described in the issue, provided it doesn't involve too much configuration?

Regards,
Karthick

> Input and Output column families should be configured independently
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-1227
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1227
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>    Affects Versions: 0.7
>            Reporter: Bryan Tower
>             Fix For: 0.7
>
>         Attachments: trunk-1227.txt
>
>
> I would like to use a ColumnFamilyInputFormat and a ColumnFamilyRecordReader to map a bunch of data from Cassandra to a job, then do some operations on the data and, in the Reducer, write out a summary of the work that I have done.  Both the ColumnFamilyInputFormat and the ColumnFamilyOutputFormat read the column family from the same configuration property in the job configuration object (they both use the ConfigHelper.COLUMNFAMILY_CONFIG property).  This means that I cannot read from one Cassandra column family and write out to a different one in the same job with the existing code.
> I changed the ColumnFamilyOutputFormat to read from "cassandra.output.columnfamily" instead of the "cassandra.input.columnfamily" that it was using before.
> I changed the COLUMNFAMILY_CONFIG property and related methods to include the word input.  I also added corresponding Output versions of each of the relevant properties that should be configured for the ColumnFamilyOutputFormat.
