Posted to commits@cassandra.apache.org by "Bryan Tower (JIRA)" <ji...@apache.org> on 2010/06/25 08:44:50 UTC

[jira] Created: (CASSANDRA-1227) Input and Output column families should be configured independently

Input and Output column families should be configured independently
-------------------------------------------------------------------

Key: CASSANDRA-1227
URL: https://issues.apache.org/jira/browse/CASSANDRA-1227
Project: Cassandra
Issue Type: Improvement
Components: Hadoop
Affects Versions: 0.7
Reporter: Bryan Tower
Fix For: 0.7

I would like to use a ColumnFamilyInputFormat and a ColumnFamilyRecordReader to map a bunch of data from Cassandra into a job, do some operations on that data, and then have the Reducer write out a summary of the work. Both ColumnFamilyInputFormat and ColumnFamilyOutputFormat read the column family from the same property in the job configuration (both use ConfigHelper.COLUMNFAMILY_CONFIG). This means that, with the existing code, I cannot read from one Cassandra column family and write to a different one in the same job.

I changed ColumnFamilyOutputFormat to read its column family from "cassandra.output.columnfamily" instead of "cassandra.input.columnfamily", which it was using before.

I renamed the COLUMNFAMILY_CONFIG property and its related methods to include the word "input". I also added a corresponding output version of each relevant property that ColumnFamilyOutputFormat needs.
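A minimal, self-contained Java sketch of the collision described above. It assumes a plain HashMap as a stand-in for Hadoop's job Configuration (it is not the real class), and the column family names "Events" and "Summaries" are hypothetical; only the two property names come from this issue.

```java
import java.util.HashMap;
import java.util.Map;

public class ColumnFamilyConfigSketch {
    // Property names taken from the issue description.
    static final String INPUT_CF_KEY = "cassandra.input.columnfamily";
    static final String OUTPUT_CF_KEY = "cassandra.output.columnfamily";

    /** Before the patch: both formats resolve the column family from one shared key. */
    static String[] sharedKey(String inputCf, String outputCf) {
        Map<String, String> conf = new HashMap<>(); // stand-in for the job configuration
        conf.put(INPUT_CF_KEY, inputCf);  // set for the input format
        conf.put(INPUT_CF_KEY, outputCf); // setting the output CF overwrites the input CF
        // Input and output both read the same key, so they see the same value.
        return new String[] { conf.get(INPUT_CF_KEY), conf.get(INPUT_CF_KEY) };
    }

    /** After the patch: each format reads its own key, so the values stay independent. */
    static String[] separateKeys(String inputCf, String outputCf) {
        Map<String, String> conf = new HashMap<>();
        conf.put(INPUT_CF_KEY, inputCf);
        conf.put(OUTPUT_CF_KEY, outputCf);
        return new String[] { conf.get(INPUT_CF_KEY), conf.get(OUTPUT_CF_KEY) };
    }

    public static void main(String[] args) {
        String[] before = sharedKey("Events", "Summaries");
        String[] after = separateKeys("Events", "Summaries");
        System.out.println("shared:   input=" + before[0] + " output=" + before[1]);
        System.out.println("separate: input=" + after[0] + " output=" + after[1]);
    }
}
```

With the shared key, both sides end up with "Summaries"; with separate keys, the input side keeps "Events" while the output side gets "Summaries".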

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Assigned: (CASSANDRA-1227) Input and Output column families should be configured independently

Posted by "Jeremy Hanna (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Hanna reassigned CASSANDRA-1227:
---------------------------------------

    Assignee: Jeremy Hanna

[jira] Updated: (CASSANDRA-1227) Input and Output column families should be configured independently

Posted by "Bryan Tower (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Tower updated CASSANDRA-1227:
-----------------------------------

    Attachment: trunk-1227.txt

This patch file includes all of the changes needed to allow ColumnFamilyInputFormat and ColumnFamilyOutputFormat to be configured independently.

[jira] Commented: (CASSANDRA-1227) Input and Output column families should be configured independently

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884919#action_12884919 ] 

Hudson commented on CASSANDRA-1227:
-----------------------------------

Integrated in Cassandra #484 (See [http://hudson.zones.apache.org/hudson/job/Cassandra/484/])
    allow Hadoop output to go to a different KS + CF than input.  patch by Bryan Tower; reviewed by jbellis for CASSANDRA-1227


> Input and Output column families should be configured independently
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-1227
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1227
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>    Affects Versions: 0.7
>            Reporter: Bryan Tower
>            Assignee: Bryan Tower
>             Fix For: 0.7
>
>         Attachments: trunk-1227.txt
>
>

[jira] Commented: (CASSANDRA-1227) Input and Output column families should be configured independently

Posted by "Karthick Sankarachary (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882742#action_12882742 ] 

Karthick Sankarachary commented on CASSANDRA-1227:
--------------------------------------------------

Hi Bryan,

It totally makes sense for the input and output configuration properties to be disjoint. It was an oversight on my part not to separate the input and output format properties to begin with.

To build on your suggestion, how about adding methods to ColumnFamilyInputFormat and ColumnFamilyOutputFormat that let users set those properties programmatically, purely for convenience (à la SequenceFileInputFormat and SequenceFileOutputFormat)? For example, to set the column family for the input format, one could provide a ColumnFamilyInputFormat.setColumnFamily(job, columnFamily) method that simply translates the call into job.getConfiguration().set(ConfigHelper.INPUT_COLUMNFAMILY_CONFIG, columnFamily).
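The convenience setters suggested above could look roughly like the following sketch. Every Sketch* class here is a hypothetical stand-in: SketchJob and SketchConf mimic only Hadoop's Job.getConfiguration() and Configuration.set(String, String) with a plain map, and the constants are local copies of the property names from this issue rather than the real ConfigHelper fields. It is an illustration of the idea, not the committed API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for Hadoop's Configuration: a plain string map. */
class SketchConf {
    private final Map<String, String> props = new HashMap<>();
    void set(String key, String value) { props.put(key, value); }
    String get(String key) { return props.get(key); }
}

/** Hypothetical stand-in for a Hadoop Job that only exposes its configuration. */
class SketchJob {
    private final SketchConf conf = new SketchConf();
    SketchConf getConfiguration() { return conf; }
}

/** Hypothetical input-side convenience setter, as proposed in the comment. */
class SketchColumnFamilyInputFormat {
    static final String INPUT_COLUMNFAMILY_CONFIG = "cassandra.input.columnfamily";

    static void setColumnFamily(SketchJob job, String columnFamily) {
        // Just translates the call into a configuration write.
        job.getConfiguration().set(INPUT_COLUMNFAMILY_CONFIG, columnFamily);
    }
}

/** Hypothetical output-side counterpart writing to its own key. */
class SketchColumnFamilyOutputFormat {
    static final String OUTPUT_COLUMNFAMILY_CONFIG = "cassandra.output.columnfamily";

    static void setColumnFamily(SketchJob job, String columnFamily) {
        job.getConfiguration().set(OUTPUT_COLUMNFAMILY_CONFIG, columnFamily);
    }
}

public class ConvenienceSetterSketch {
    public static void main(String[] args) {
        SketchJob job = new SketchJob();
        // Hypothetical column family names.
        SketchColumnFamilyInputFormat.setColumnFamily(job, "Events");
        SketchColumnFamilyOutputFormat.setColumnFamily(job, "Summaries");
        System.out.println(job.getConfiguration().get("cassandra.input.columnfamily"));
        System.out.println(job.getConfiguration().get("cassandra.output.columnfamily"));
    }
}
```

Keeping the setters static and keyed on the job mirrors the SequenceFile-style helpers mentioned above: callers never need to know the raw property names, and each format owns its own key.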

In addition, if you don't mind, could you add a test case for the use case described above, provided it doesn't require too much configuration?

Regards,
Karthick

[jira] Updated: (CASSANDRA-1227) Input and Output column families should be configured independently

Posted by "Bryan Tower (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Tower updated CASSANDRA-1227:
-----------------------------------

    Comment: was deleted

(was: This patch file includes all of the changes needed to allow the ColumnInputFormat and the ColumnOutputFormat to be configured independently.)
