
Posted to commits@cassandra.apache.org by "Jeremy Hanna (JIRA)" <ji...@apache.org> on 2011/06/20 20:16:48 UTC

[jira] [Created] (CASSANDRA-2799) Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader

Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader
----------------------------------------------------------------------------------------

                 Key: CASSANDRA-2799
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2799
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Jeremy Hanna
            Assignee: Jeremy Hanna
            Priority: Minor
             Fix For: 0.7.7, 0.8.2


For better compatibility with Hadoop, I would like to add old-style Hadoop API support (mapred) to ColumnFamilyInputFormat and ColumnFamilyRecordReader.  We already have it on the output side.  Oozie in particular handles the old-style API better.  That is the motivation for us.  I already did this as part of my patch for CASSANDRA-1497, so it should be trivial.  We are just on a tight schedule right now, and I'll come back to this once we have a bit of breathing room.

I think it would help with compatibility with other systems that rely on hadoop as well.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2799) Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader

Posted by "Steeve Morin (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131530#comment-13131530 ] 

Steeve Morin commented on CASSANDRA-2799:
-----------------------------------------

Please also note that you can still use ColumnFamilyInputFormat2 with the new Hadoop API, as it extends ColumnFamilyInputFormat.
                
> Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2799
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2799
>             Project: Cassandra
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>              Labels: hadoop
>         Attachments: ColumnFamilyInputFormat2.java, ColumnFamilyRecordReader2.java, ColumnFamilySplit2.java
>
>
> For better compatibility with hadoop, I would like to add old style hadoop support (mapred) to the ColumnFamilyInputFormat and ColumnFamilyRecordReader.  We already have it in the output.  Oozie in particular handles the old style api better.  That is the motivation for us.  I already did this as part of my patch for CASSANDRA-1497 so it should be trivial.  We are just in a tight schedule right now and I'll come back to this once we have a bit of breathing room.
> I think it would help with compatibility with other systems that rely on hadoop as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-2799) Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2799:
--------------------------------------

    Fix Version/s:     (was: 0.7.9)

> Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2799
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2799
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>              Labels: hadoop
>
> For better compatibility with hadoop, I would like to add old style hadoop support (mapred) to the ColumnFamilyInputFormat and ColumnFamilyRecordReader.  We already have it in the output.  Oozie in particular handles the old style api better.  That is the motivation for us.  I already did this as part of my patch for CASSANDRA-1497 so it should be trivial.  We are just in a tight schedule right now and I'll come back to this once we have a bit of breathing room.
> I think it would help with compatibility with other systems that rely on hadoop as well.


[jira] [Updated] (CASSANDRA-2799) Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader

Posted by "Steeve Morin (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steeve Morin updated CASSANDRA-2799:
------------------------------------

    Comment: was deleted

(was: Please also note that you can still use ColumnFamilyInputFormat2 with the new Hadoop API, as it extends ColumnFamilyInputFormat.)
    
> Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2799
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2799
>             Project: Cassandra
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>              Labels: hadoop
>         Attachments: ColumnFamilyInputFormat2.java, ColumnFamilyRecordReader2.java, ColumnFamilySplit2.java
>
>
> For better compatibility with hadoop, I would like to add old style hadoop support (mapred) to the ColumnFamilyInputFormat and ColumnFamilyRecordReader.  We already have it in the output.  Oozie in particular handles the old style api better.  That is the motivation for us.  I already did this as part of my patch for CASSANDRA-1497 so it should be trivial.  We are just in a tight schedule right now and I'll come back to this once we have a bit of breathing room.
> I think it would help with compatibility with other systems that rely on hadoop as well.


[jira] [Updated] (CASSANDRA-2799) Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader

Posted by "Steeve Morin (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steeve Morin updated CASSANDRA-2799:
------------------------------------

    Comment: was deleted

(was: Patch submitted.)
    
> Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2799
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2799
>             Project: Cassandra
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>              Labels: hadoop
>         Attachments: ColumnFamilyInputFormat2.java, ColumnFamilyRecordReader2.java, ColumnFamilySplit2.java
>
>
> For better compatibility with hadoop, I would like to add old style hadoop support (mapred) to the ColumnFamilyInputFormat and ColumnFamilyRecordReader.  We already have it in the output.  Oozie in particular handles the old style api better.  That is the motivation for us.  I already did this as part of my patch for CASSANDRA-1497 so it should be trivial.  We are just in a tight schedule right now and I'll come back to this once we have a bit of breathing room.
> I think it would help with compatibility with other systems that rely on hadoop as well.


[jira] [Updated] (CASSANDRA-2799) Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2799:
--------------------------------------

    Reviewer: brandon.williams  (was: jeromatron)
    Assignee: Steeve Morin  (was: Jeremy Hanna)
    
> Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2799
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2799
>             Project: Cassandra
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Jeremy Hanna
>            Assignee: Steeve Morin
>            Priority: Minor
>              Labels: hadoop
>         Attachments: ColumnFamilyInputFormat2.java, ColumnFamilyRecordReader2.java, ColumnFamilySplit2.java
>
>
> For better compatibility with hadoop, I would like to add old style hadoop support (mapred) to the ColumnFamilyInputFormat and ColumnFamilyRecordReader.  We already have it in the output.  Oozie in particular handles the old style api better.  That is the motivation for us.  I already did this as part of my patch for CASSANDRA-1497 so it should be trivial.  We are just in a tight schedule right now and I'll come back to this once we have a bit of breathing room.
> I think it would help with compatibility with other systems that rely on hadoop as well.


[jira] [Commented] (CASSANDRA-2799) Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader

Posted by "Steeve Morin (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131527#comment-13131527 ] 

Steeve Morin commented on CASSANDRA-2799:
-----------------------------------------

Also, I'm not sure whether the TaskAttemptID() handling in ColumnFamilyInputFormat2.getSplits() is correct. Feedback welcome!
                
> Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2799
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2799
>             Project: Cassandra
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>              Labels: hadoop
>         Attachments: ColumnFamilyInputFormat2.java, ColumnFamilyRecordReader2.java, ColumnFamilySplit2.java
>
>
> For better compatibility with hadoop, I would like to add old style hadoop support (mapred) to the ColumnFamilyInputFormat and ColumnFamilyRecordReader.  We already have it in the output.  Oozie in particular handles the old style api better.  That is the motivation for us.  I already did this as part of my patch for CASSANDRA-1497 so it should be trivial.  We are just in a tight schedule right now and I'll come back to this once we have a bit of breathing room.
> I think it would help with compatibility with other systems that rely on hadoop as well.


[jira] [Issue Comment Edited] (CASSANDRA-2799) Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader

Posted by "Steeve Morin (Issue Comment Edited) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131532#comment-13131532 ] 

Steeve Morin edited comment on CASSANDRA-2799 at 10/20/11 11:43 AM:
--------------------------------------------------------------------

I do think it would actually be better to just patch the original ColumnFamilyInputFormat, though... In the same way that ColumnFamilyOutputFormat already is!
                
      was (Author: steeve):
    I do think it would actually be better to just patch the original ColumnFamilyInputFormat, though...
                  
> Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2799
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2799
>             Project: Cassandra
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>              Labels: hadoop
>         Attachments: ColumnFamilyInputFormat2.java, ColumnFamilyRecordReader2.java, ColumnFamilySplit2.java
>
>
> For better compatibility with hadoop, I would like to add old style hadoop support (mapred) to the ColumnFamilyInputFormat and ColumnFamilyRecordReader.  We already have it in the output.  Oozie in particular handles the old style api better.  That is the motivation for us.  I already did this as part of my patch for CASSANDRA-1497 so it should be trivial.  We are just in a tight schedule right now and I'll come back to this once we have a bit of breathing room.
> I think it would help with compatibility with other systems that rely on hadoop as well.


[jira] [Commented] (CASSANDRA-2799) Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader

Posted by "Brandon Williams (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132065#comment-13132065 ] 

Brandon Williams commented on CASSANDRA-2799:
---------------------------------------------

bq. I do think it would actually be better to just patch the original ColumnFamilyInputFormat, though... In the same way that ColumnFamilyOutputFormat already is!

If you want to do it, +1
                
> Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2799
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2799
>             Project: Cassandra
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Jeremy Hanna
>            Assignee: Steeve Morin
>            Priority: Minor
>              Labels: hadoop
>         Attachments: ColumnFamilyInputFormat2.java, ColumnFamilyRecordReader2.java, ColumnFamilySplit2.java
>
>
> For better compatibility with hadoop, I would like to add old style hadoop support (mapred) to the ColumnFamilyInputFormat and ColumnFamilyRecordReader.  We already have it in the output.  Oozie in particular handles the old style api better.  That is the motivation for us.  I already did this as part of my patch for CASSANDRA-1497 so it should be trivial.  We are just in a tight schedule right now and I'll come back to this once we have a bit of breathing room.
> I think it would help with compatibility with other systems that rely on hadoop as well.


[jira] [Commented] (CASSANDRA-2799) Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader

Posted by "Steeve Morin (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131532#comment-13131532 ] 

Steeve Morin commented on CASSANDRA-2799:
-----------------------------------------

I do think it would actually be better to just patch the original ColumnFamilyInputFormat, though...
                
> Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2799
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2799
>             Project: Cassandra
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>              Labels: hadoop
>         Attachments: ColumnFamilyInputFormat2.java, ColumnFamilyRecordReader2.java, ColumnFamilySplit2.java
>
>
> For better compatibility with hadoop, I would like to add old style hadoop support (mapred) to the ColumnFamilyInputFormat and ColumnFamilyRecordReader.  We already have it in the output.  Oozie in particular handles the old style api better.  That is the motivation for us.  I already did this as part of my patch for CASSANDRA-1497 so it should be trivial.  We are just in a tight schedule right now and I'll come back to this once we have a bit of breathing room.
> I think it would help with compatibility with other systems that rely on hadoop as well.


[jira] [Updated] (CASSANDRA-2799) Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader

Posted by "Steeve Morin (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steeve Morin updated CASSANDRA-2799:
------------------------------------

    Attachment: ColumnFamilySplit2.java
                ColumnFamilyRecordReader2.java
                ColumnFamilyInputFormat2.java

This is a version of the old Hadoop API that basically just "wraps" the new one. Please note, however, that the row key has a fixed size.

This is because the old Hadoop API hands the reader key and value objects and expects it to fill them in by "writing" to them. The size can, however, be changed in the job conf by setting cassandra.hadoop.max_key_size or ColumnFamilyInputFormat2.CASSANDRA_HADOOP_MAX_KEY_SIZE.
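
To illustrate why a wrapper like this ends up with a fixed-size key, here is a minimal, self-contained sketch. It is not the attached code: NewStyleReader, OldStyleReader, ListReader and WrappingReader are hypothetical stand-ins that only mimic the shape of the two Hadoop reader contracts (the new one returns a key per record, the old one fills a key object the caller created up front), and maxKeySize plays the role of the cassandra.hadoop.max_key_size setting.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class OldApiWrapperSketch {

    /** Hypothetical stand-in for a new-style reader: returns a key sized for each record. */
    interface NewStyleReader {
        boolean nextKeyValue();
        ByteBuffer getCurrentKey();
    }

    /** Hypothetical stand-in for an old-style reader: fills a key the caller allocated once. */
    interface OldStyleReader {
        ByteBuffer createKey();
        boolean next(ByteBuffer key);
    }

    /** Toy new-style reader over an in-memory list of row keys. */
    static final class ListReader implements NewStyleReader {
        private final Iterator<byte[]> rows;
        private byte[] current;

        ListReader(List<byte[]> rows) { this.rows = rows.iterator(); }

        @Override public boolean nextKeyValue() {
            if (!rows.hasNext()) return false;
            current = rows.next();
            return true;
        }

        @Override public ByteBuffer getCurrentKey() { return ByteBuffer.wrap(current); }
    }

    /** Old-style facade over a new-style reader; the key buffer is sized once, up front. */
    static final class WrappingReader implements OldStyleReader {
        private final NewStyleReader delegate;
        private final int maxKeySize; // assumed to come from the job configuration

        WrappingReader(NewStyleReader delegate, int maxKeySize) {
            this.delegate = delegate;
            this.maxKeySize = maxKeySize;
        }

        @Override public ByteBuffer createKey() { return ByteBuffer.allocate(maxKeySize); }

        @Override public boolean next(ByteBuffer key) {
            if (!delegate.nextKeyValue()) return false;
            ByteBuffer src = delegate.getCurrentKey();
            if (src.remaining() > key.capacity()) {
                throw new IllegalStateException("row key longer than " + key.capacity() + " bytes");
            }
            key.clear();   // reuse the caller's buffer
            key.put(src);  // "write" the new key into it (this copy is the small extra cost)
            key.flip();    // make it readable from position 0
            return true;
        }
    }

    /** Drives the old-style loop: one key object, refilled for every row. */
    public static String readAll(List<byte[]> rows, int maxKeySize) {
        OldStyleReader reader = new WrappingReader(new ListReader(rows), maxKeySize);
        ByteBuffer key = reader.createKey();
        StringBuilder out = new StringBuilder();
        while (reader.next(key)) {
            byte[] bytes = new byte[key.remaining()];
            key.get(bytes);
            out.append(new String(bytes, StandardCharsets.UTF_8)).append(';');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<byte[]> rows = Arrays.asList(
            "a".getBytes(StandardCharsets.UTF_8),
            "bcd".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(rows, 8));
    }
}
```

In this sketch a row key longer than maxKeySize makes next() throw, which is the kind of limit the configurable maximum key size is there to raise.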
                
> Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2799
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2799
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>              Labels: hadoop
>         Attachments: ColumnFamilyInputFormat2.java, ColumnFamilyRecordReader2.java, ColumnFamilySplit2.java
>
>
> For better compatibility with hadoop, I would like to add old style hadoop support (mapred) to the ColumnFamilyInputFormat and ColumnFamilyRecordReader.  We already have it in the output.  Oozie in particular handles the old style api better.  That is the motivation for us.  I already did this as part of my patch for CASSANDRA-1497 so it should be trivial.  We are just in a tight schedule right now and I'll come back to this once we have a bit of breathing room.
> I think it would help with compatibility with other systems that rely on hadoop as well.


[jira] [Issue Comment Edited] (CASSANDRA-2799) Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader

Posted by "Steeve Morin (Issue Comment Edited) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131525#comment-13131525 ] 

Steeve Morin edited comment on CASSANDRA-2799 at 10/20/11 11:40 AM:
--------------------------------------------------------------------

This is a version of the old Hadoop API that basically just "wraps" the new one. Please note, however, that the row key has a fixed size.

This is because the old Hadoop API hands the reader key and value objects and expects it to fill them in by "writing" to them. The size can, however, be changed in the job conf by setting cassandra.hadoop.max_key_size or ColumnFamilyInputFormat2.CASSANDRA_HADOOP_MAX_KEY_SIZE.

Also, because of that, expect a small performance penalty, albeit a minimal one.

Also, I'm not sure whether the TaskAttemptID() handling in ColumnFamilyInputFormat2.getSplits() is correct. Feedback welcome!


                
      was (Author: steeve):
    This is a version of the old Hadoop API that basically just "wraps" the new one. Please note, however, that the row key has a fixed size.

This is because the old Hadoop API hands the reader key and value objects and expects it to fill them in by "writing" to them. The size can, however, be changed in the job conf by setting cassandra.hadoop.max_key_size or ColumnFamilyInputFormat2.CASSANDRA_HADOOP_MAX_KEY_SIZE.

Also, because of that, expect a small performance penalty, albeit a minimal one.
                  
> Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2799
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2799
>             Project: Cassandra
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>              Labels: hadoop
>         Attachments: ColumnFamilyInputFormat2.java, ColumnFamilyRecordReader2.java, ColumnFamilySplit2.java
>
>
> For better compatibility with hadoop, I would like to add old style hadoop support (mapred) to the ColumnFamilyInputFormat and ColumnFamilyRecordReader.  We already have it in the output.  Oozie in particular handles the old style api better.  That is the motivation for us.  I already did this as part of my patch for CASSANDRA-1497 so it should be trivial.  We are just in a tight schedule right now and I'll come back to this once we have a bit of breathing room.
> I think it would help with compatibility with other systems that rely on hadoop as well.


[jira] [Issue Comment Edited] (CASSANDRA-2799) Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader

Posted by "Steeve Morin (Issue Comment Edited) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131525#comment-13131525 ] 

Steeve Morin edited comment on CASSANDRA-2799 at 10/20/11 11:41 AM:
--------------------------------------------------------------------

This is a version of the old Hadoop API that basically just "wraps" the new one. Please note, however, that the row key has a fixed size.

This is because the old Hadoop API hands the reader key and value objects and expects it to fill them in by "writing" to them. The size can, however, be changed in the job conf by setting cassandra.hadoop.max_key_size or ColumnFamilyInputFormat2.CASSANDRA_HADOOP_MAX_KEY_SIZE.

Also, because of that, expect a small performance penalty, albeit a minimal one.

Also, I'm not sure whether the TaskAttemptID() handling in ColumnFamilyInputFormat2.getSplits() is correct. Feedback welcome!

Please also note that ColumnFamilyInputFormat2 only extends ColumnFamilyInputFormat, which means you can also use it with the new Hadoop API.
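
As a rough illustration of that last point, the sketch below shows how one class can serve both API styles by extending a new-style base class and adding the old-style entry point on top. All names here (NewApiFormat, OldApiFormat, BaseFormat, BaseFormat2) are hypothetical stand-ins, and plain strings stand in for job contexts and splits; it only mirrors the general shape of the two getSplits() contracts, not the attached classes.

```java
import java.util.Arrays;
import java.util.List;

public class DualApiSketch {

    /** Hypothetical stand-in for the new-style contract: splits come back as a list. */
    public abstract static class NewApiFormat {
        public abstract List<String> getSplits(String context);
    }

    /** Hypothetical stand-in for the old-style contract: an array plus a split-count hint. */
    public interface OldApiFormat {
        String[] getSplits(String jobConf, int numSplits);
    }

    /** Plays the role of the existing new-API-only input format. */
    public static class BaseFormat extends NewApiFormat {
        @Override public List<String> getSplits(String context) {
            return Arrays.asList(context + "-0", context + "-1");
        }
    }

    /** Extends the base class, so it is still usable wherever the new-style type is expected. */
    public static class BaseFormat2 extends BaseFormat implements OldApiFormat {
        @Override public String[] getSplits(String jobConf, int numSplits) {
            // numSplits is treated as a hint only; the work is delegated to the new-style method.
            return getSplits(jobConf).toArray(new String[0]);
        }
    }

    public static void main(String[] args) {
        BaseFormat2 format = new BaseFormat2();
        NewApiFormat asNew = format; // new-style callers see the inherited behaviour
        OldApiFormat asOld = format; // old-style callers go through the added overload
        System.out.println(asNew.getSplits("job"));
        System.out.println(Arrays.toString(asOld.getSplits("job", 4)));
    }
}
```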
                
      was (Author: steeve):
    This is a version of the old Hadoop API that basically just "wraps" the new one. Please note, however, that the row key has a fixed size.

This is because the old Hadoop API hands the reader key and value objects and expects it to fill them in by "writing" to them. The size can, however, be changed in the job conf by setting cassandra.hadoop.max_key_size or ColumnFamilyInputFormat2.CASSANDRA_HADOOP_MAX_KEY_SIZE.

Also, because of that, expect a small performance penalty, albeit a minimal one.

Also, I'm not sure whether the TaskAttemptID() handling in ColumnFamilyInputFormat2.getSplits() is correct. Feedback welcome!


                  
> Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2799
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2799
>             Project: Cassandra
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>              Labels: hadoop
>         Attachments: ColumnFamilyInputFormat2.java, ColumnFamilyRecordReader2.java, ColumnFamilySplit2.java
>
>
> For better compatibility with hadoop, I would like to add old style hadoop support (mapred) to the ColumnFamilyInputFormat and ColumnFamilyRecordReader.  We already have it in the output.  Oozie in particular handles the old style api better.  That is the motivation for us.  I already did this as part of my patch for CASSANDRA-1497 so it should be trivial.  We are just in a tight schedule right now and I'll come back to this once we have a bit of breathing room.
> I think it would help with compatibility with other systems that rely on hadoop as well.


[jira] [Issue Comment Edited] (CASSANDRA-2799) Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader

Posted by "Steeve Morin (Issue Comment Edited) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131525#comment-13131525 ] 

Steeve Morin edited comment on CASSANDRA-2799 at 10/20/11 11:32 AM:
--------------------------------------------------------------------

This is a version of the old Hadoop API that basically just "wraps" the new one. Please note, however, that the row key has a fixed size.

This is because the old Hadoop API hands the reader key and value objects and expects it to fill them in by "writing" to them. The size can, however, be changed in the job conf by setting cassandra.hadoop.max_key_size or ColumnFamilyInputFormat2.CASSANDRA_HADOOP_MAX_KEY_SIZE.

Also, because of that, expect a small performance penalty, albeit a minimal one.
                
      was (Author: steeve):
    This is a version of the old Hadoop API that basically just "wraps" the new one. Please note, however, that the row key has a fixed size.

This is because the old Hadoop API hands the reader key and value objects and expects it to fill them in by "writing" to them. The size can, however, be changed in the job conf by setting cassandra.hadoop.max_key_size or ColumnFamilyInputFormat2.CASSANDRA_HADOOP_MAX_KEY_SIZE.
                  
> Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2799
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2799
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>              Labels: hadoop
>         Attachments: ColumnFamilyInputFormat2.java, ColumnFamilyRecordReader2.java, ColumnFamilySplit2.java
>
>
> For better compatibility with hadoop, I would like to add old style hadoop support (mapred) to the ColumnFamilyInputFormat and ColumnFamilyRecordReader.  We already have it in the output.  Oozie in particular handles the old style api better.  That is the motivation for us.  I already did this as part of my patch for CASSANDRA-1497 so it should be trivial.  We are just in a tight schedule right now and I'll come back to this once we have a bit of breathing room.
> I think it would help with compatibility with other systems that rely on hadoop as well.


[jira] [Updated] (CASSANDRA-2799) Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader

Posted by "Steeve Morin (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steeve Morin updated CASSANDRA-2799:
------------------------------------

    Comment: was deleted

(was: Also, I'm not sure whether the TaskAttemptID() handling in ColumnFamilyInputFormat2.getSplits() is correct. Feedback welcome!)
    
> Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2799
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2799
>             Project: Cassandra
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>              Labels: hadoop
>         Attachments: ColumnFamilyInputFormat2.java, ColumnFamilyRecordReader2.java, ColumnFamilySplit2.java
>
>
> For better compatibility with hadoop, I would like to add old style hadoop support (mapred) to the ColumnFamilyInputFormat and ColumnFamilyRecordReader.  We already have it in the output.  Oozie in particular handles the old style api better.  That is the motivation for us.  I already did this as part of my patch for CASSANDRA-1497 so it should be trivial.  We are just in a tight schedule right now and I'll come back to this once we have a bit of breathing room.
> I think it would help with compatibility with other systems that rely on hadoop as well.


[jira] [Updated] (CASSANDRA-2799) Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader

Posted by "Steeve Morin (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steeve Morin updated CASSANDRA-2799:
------------------------------------

    Attachment: old_hadoop_v1.patch

Support the old Hadoop API by patching ColumnFamilyInputFormat, ColumnFamilyRecordReader and ColumnFamilyInputSplit.

PLEASE TEST!
                
> Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2799
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2799
>             Project: Cassandra
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Jeremy Hanna
>            Assignee: Steeve Morin
>            Priority: Minor
>              Labels: hadoop
>         Attachments: ColumnFamilyInputFormat2.java, ColumnFamilyRecordReader2.java, ColumnFamilySplit2.java, old_hadoop_v1.patch
>
>
> For better compatibility with hadoop, I would like to add old style hadoop support (mapred) to the ColumnFamilyInputFormat and ColumnFamilyRecordReader.  We already have it in the output.  Oozie in particular handles the old style api better.  That is the motivation for us.  I already did this as part of my patch for CASSANDRA-1497 so it should be trivial.  We are just in a tight schedule right now and I'll come back to this once we have a bit of breathing room.
> I think it would help with compatibility with other systems that rely on hadoop as well.
