Posted to commits@cassandra.apache.org by "Samarth Gahire (Created) (JIRA)" <ji...@apache.org> on 2012/02/17 14:53:59 UTC

[jira] [Created] (CASSANDRA-3928) Bulk loading to cassandra with Python Hadoop Job.

Bulk loading to cassandra with Python Hadoop Job.
-------------------------------------------------

                 Key: CASSANDRA-3928
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3928
             Project: Cassandra
          Issue Type: New Feature
          Components: Hadoop, Tools
    Affects Versions: 1.2
            Reporter: Samarth Gahire
            Assignee: Brandon Williams
            Priority: Minor
             Fix For: 1.2


I was wondering whether we could have an OutputFormat for loading data into Cassandra from a Hadoop job written in Python.
I have a very complex Hadoop job written in Python that processes test data and generates structured data in a sequential file. I then read this data and stream it to Cassandra using BulkOutputFormat.
Is there any way to avoid writing the sequential file and instead process the data and stream it directly to Cassandra (with the Hadoop job written in Python)?
What would be a possible solution?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (CASSANDRA-3928) Bulk loading to cassandra with Python Hadoop Job.

Posted by "Brandon Williams (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams resolved CASSANDRA-3928.
-----------------------------------------

    Resolution: Won't Fix
      Reviewer:   (was: jbellis)

The only way to do this, short of reimplementing everything in Python, would be to use Jython to write the sstables via BulkOutputFormat (BOF) and stream them in. Alternatively, you could insert the data via Thrift from CPython.
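
As a rough illustration of the second suggestion (inserting from CPython rather than writing sstables), the sketch below shows how a Hadoop Streaming reducer written in Python might group its input into row batches before handing them to a client. Everything here is an assumption made for illustration: the tab-separated "row_key, column, value" record layout, the function names, and the send_batch callable, which stands in for whatever Thrift-based client call is actually used. It does not talk to Cassandra itself.

```python
from collections import OrderedDict


def parse_line(line):
    """Split one 'row_key<TAB>column<TAB>value' record (hypothetical layout)."""
    row_key, column, value = line.rstrip("\n").split("\t", 2)
    return row_key, column, value


def batched_rows(lines, batch_size=100):
    """Yield {row_key: {column: value}} dicts holding at most batch_size rows.

    Reducer input is assumed to arrive sorted by key, as Hadoop Streaming
    delivers it, so all columns of one row land in the same batch.
    """
    batch = OrderedDict()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        row_key, column, value = parse_line(line)
        # Flush before starting a new row once the batch is full.
        if row_key not in batch and len(batch) >= batch_size:
            yield batch
            batch = OrderedDict()
        batch.setdefault(row_key, OrderedDict())[column] = value
    if batch:
        yield batch


def run_reducer(lines, send_batch, batch_size=100):
    """Pass every batch to send_batch; return the number of row entries sent.

    send_batch is a hypothetical callable wrapping the real client insert.
    """
    sent = 0
    for batch in batched_rows(lines, batch_size):
        send_batch(batch)
        sent += len(batch)
    return sent
```

In a real streaming job, run_reducer would be called with sys.stdin as the line source and with a send_batch function that performs the batched insert through the chosen Thrift client. Batching keeps the number of round trips down, but it remains a regular write path and not a bulk load of sstables.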
                
> Bulk loading to cassandra with Python Hadoop Job.
> -------------------------------------------------
>
>                 Key: CASSANDRA-3928
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3928
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop, Tools
>    Affects Versions: 1.2
>            Reporter: Samarth Gahire
>            Assignee: Brandon Williams
>            Priority: Minor
>              Labels: bulkloader, hadoop, python, sstableloader
>             Fix For: 1.2
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I was wondering whether we could have an OutputFormat for bulk-loading data into Cassandra from a Hadoop job written in Python.
> I have a very complex Hadoop job written in Python that processes test data and generates structured data in a sequential file. I then read this data and stream it to Cassandra using BulkOutputFormat.
> Is there any way to avoid writing the sequential file and instead process the data and stream it directly to Cassandra (with the Hadoop job written in Python)?
> What would be a possible solution?


        

[jira] [Updated] (CASSANDRA-3928) Bulk loading to cassandra with Python Hadoop Job.

Posted by "Samarth Gahire (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samarth Gahire updated CASSANDRA-3928:
--------------------------------------

    Description: 
I was wondering whether we could have an OutputFormat for bulk-loading data into Cassandra from a Hadoop job written in Python.
I have a very complex Hadoop job written in Python that processes test data and generates structured data in a sequential file. I then read this data and stream it to Cassandra using BulkOutputFormat.
Is there any way to avoid writing the sequential file and instead process the data and stream it directly to Cassandra (with the Hadoop job written in Python)?
What would be a possible solution?

  was:
I was wondering whether we could have an OutputFormat for loading data into Cassandra from a Hadoop job written in Python.
I have a very complex Hadoop job written in Python that processes test data and generates structured data in a sequential file. I then read this data and stream it to Cassandra using BulkOutputFormat.
Is there any way to avoid writing the sequential file and instead process the data and stream it directly to Cassandra (with the Hadoop job written in Python)?
What would be a possible solution?

    
