Posted to commits@cassandra.apache.org by "Jeremy Hanna (JIRA)" <ji...@apache.org> on 2010/07/30 01:37:16 UTC

[jira] Created: (CASSANDRA-1342) Have the word_count contrib example use the new baked in hadoop outputwriter and such

Have the word_count contrib example use the new baked in hadoop outputwriter and such
-------------------------------------------------------------------------------------

                 Key: CASSANDRA-1342
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1342
             Project: Cassandra
          Issue Type: Improvement
          Components: Contrib, Hadoop
            Reporter: Jeremy Hanna
            Assignee: Jeremy Hanna
            Priority: Minor


The contrib/word_count example currently outputs results to the /tmp directory.  It would be nice to give an example of writing back to Cassandra with the new baked in outputwriter and such, based on CASSANDRA-1101.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (CASSANDRA-1342) Have the word_count contrib example use the new baked in hadoop outputformat

Posted by "Jeremy Hanna (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Hanna updated CASSANDRA-1342:
------------------------------------

    Attachment: 0001-wordcount-output-to-cassandra.patch

Made the word count example output to Cassandra - in the Standard2 column family.

Removed o.a.c.hadoop.ColumnWritable because it was a relic that is no longer used.
Removed o.a.c.hadoop.ColumnFamilyOutputReducer because it's also no longer used.
Removed hadoop column family output unit tests as they exercise the removed classes.

Removed the clock struct in contrib/hadoop_streaming_output/bin/reducer.py
Removed the clock struct in contrib/py_stress/avro_stress.py

> Have the word_count contrib example use the new baked in hadoop outputformat
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1342
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Contrib, Hadoop
>    Affects Versions: 0.7 beta 1
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: 0001-wordcount-output-to-cassandra.patch
>
>
> The contrib/word_count example currently outputs results to the /tmp directory.  It would be nice to give an example of writing back to Cassandra with the new baked in output format, based on CASSANDRA-1101.



[jira] Issue Comment Edited: (CASSANDRA-1342) Have the word_count contrib example use the new baked in hadoop outputformat

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898409#action_12898409 ] 

Stu Hood edited comment on CASSANDRA-1342 at 8/13/10 4:49 PM:
--------------------------------------------------------------

Actually, I don't think we should encourage direct use of ColumnFamilyRecordWriter: it's really intended to be a package protected class that the OutputFormat owns and uses.

EDIT: Never mind me.

      was (Author: stuhood):
    Actually, I don't think we should encourage direct use of ColumnFamilyRecordWriter: it's really intended to be a package protected class that the OutputFormat owns and uses.
  
> Have the word_count contrib example use the new baked in hadoop outputformat
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1342
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Contrib, Hadoop
>            Reporter: Jeremy Hanna
>            Priority: Minor
>             Fix For: 0.7.0
>
>
> The contrib/word_count example currently outputs results to the /tmp directory.  It would be nice to give an example of writing back to Cassandra with the new baked in output format, based on CASSANDRA-1101.



[jira] Commented: (CASSANDRA-1342) Have the word_count contrib example use the new baked in hadoop outputformat

Posted by "Jeremy Hanna (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12911102#action_12911102 ] 

Jeremy Hanna commented on CASSANDRA-1342:
-----------------------------------------

yep - I'll get it working this next week.  Stu had already added a separate contrib module to demonstrate it in python (with streaming), but good to have this use it in Java as well.

> Have the word_count contrib example use the new baked in hadoop outputformat
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1342
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Contrib, Hadoop
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>             Fix For: 0.7.0
>
>
> The contrib/word_count example currently outputs results to the /tmp directory.  It would be nice to give an example of writing back to Cassandra with the new baked in output format, based on CASSANDRA-1101.



[jira] Assigned: (CASSANDRA-1342) Have the word_count contrib example use the new baked in hadoop outputformat

Posted by "Jeremy Hanna (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Hanna reassigned CASSANDRA-1342:
---------------------------------------

    Assignee: Jeremy Hanna

> Have the word_count contrib example use the new baked in hadoop outputformat
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1342
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Contrib, Hadoop
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>             Fix For: 0.7.0
>
>
> The contrib/word_count example currently outputs results to the /tmp directory.  It would be nice to give an example of writing back to Cassandra with the new baked in output format, based on CASSANDRA-1101.



[jira] Updated: (CASSANDRA-1342) Have the word_count contrib example use the new baked in hadoop outputformat

Posted by "Jeremy Hanna (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Hanna updated CASSANDRA-1342:
------------------------------------

    Attachment:     (was: 0001-wordcount-output-to-cassandra.patch)

> Have the word_count contrib example use the new baked in hadoop outputformat
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1342
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Contrib, Hadoop
>    Affects Versions: 0.7 beta 1
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>             Fix For: 0.7.0
>
>
> The contrib/word_count example currently outputs results to the /tmp directory.  It would be nice to give an example of writing back to Cassandra with the new baked in output format, based on CASSANDRA-1101.



[jira] Updated: (CASSANDRA-1342) Have the word_count contrib example use the new baked in hadoop outputformat

Posted by "Jeremy Hanna (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Hanna updated CASSANDRA-1342:
------------------------------------

    Attachment: 0001-Added-option-for-filesystem-cassandra-output.patch

Made changes to give users the option of outputting to the filesystem or to Cassandra.
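
A minimal sketch of how such a switch might look, assuming a hypothetical `output_reducer=<target>` command-line argument. The names `OutputTargetSketch`, `OutputTarget`, `ARG_PREFIX`, and `parseTarget` are illustrative and are not taken from the attached patch, and defaulting to the filesystem is likewise an assumption. In a real driver this choice would presumably decide which reducer and output format the Hadoop job is configured with; that wiring is omitted here.

```java
import java.util.Locale;

public class OutputTargetSketch {
    /** Where the reducer's results go. Hypothetical names, not from the patch. */
    public enum OutputTarget { FILESYSTEM, CASSANDRA }

    /** Assumed argument form: output_reducer=filesystem or output_reducer=cassandra. */
    public static final String ARG_PREFIX = "output_reducer=";

    /** Returns the requested target, or FILESYSTEM when the argument is absent. */
    public static OutputTarget parseTarget(String[] args) {
        for (String arg : args) {
            if (arg.startsWith(ARG_PREFIX)) {
                String value = arg.substring(ARG_PREFIX.length()).toUpperCase(Locale.ROOT);
                // valueOf throws IllegalArgumentException for an unknown target.
                return OutputTarget.valueOf(value);
            }
        }
        return OutputTarget.FILESYSTEM; // assumed default, handy for debugging
    }

    public static void main(String[] args) {
        OutputTarget target = parseTarget(args);
        System.out.println("Writing word counts to: " + target);
    }
}
```

Running it with `output_reducer=cassandra` selects the Cassandra target; running it with no arguments falls back to the filesystem.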

> Have the word_count contrib example use the new baked in hadoop outputformat
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1342
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Contrib, Hadoop
>    Affects Versions: 0.7 beta 1
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: 0001-Added-option-for-filesystem-cassandra-output.patch, 0002-cleaned-up-clock-struct.patch, 0003-removed-unnecessary-files.patch
>
>
> The contrib/word_count example currently outputs results to the /tmp directory.  It would be nice to give an example of writing back to Cassandra with the new baked in output format, based on CASSANDRA-1101.



[jira] Commented: (CASSANDRA-1342) Have the word_count contrib example use the new baked in hadoop outputformat

Posted by "Jeremy Hanna (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914252#action_12914252 ] 

Jeremy Hanna commented on CASSANDRA-1342:
-----------------------------------------

bq. 01: can we have an option to output to local FS still? handy for debugging.
Sure, I'll create the changes and re-post patch 1.

bq. 03: is the removed code basically a workaround for not having CFOF, is that why it's unnecessary?
From what I understand, ColumnWritable was no longer being used at all after the changes for output streaming were added.  ColumnWritable was used by the other files that were deleted, including tests.  The tests appeared to be testing that code, so I removed those as well.  Stu may have a better answer, since his code rendered ColumnWritable unused.

> Have the word_count contrib example use the new baked in hadoop outputformat
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1342
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Contrib, Hadoop
>    Affects Versions: 0.7 beta 1
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: 0001-made-wordcount-output-to-cassandra.patch, 0002-cleaned-up-clock-struct.patch, 0003-removed-unnecessary-files.patch
>
>
> The contrib/word_count example currently outputs results to the /tmp directory.  It would be nice to give an example of writing back to Cassandra with the new baked in output format, based on CASSANDRA-1101.



[jira] Commented: (CASSANDRA-1342) Have the word_count contrib example use the new baked in hadoop outputformat

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12911089#action_12911089 ] 

Jonathan Ellis commented on CASSANDRA-1342:
-------------------------------------------

Green light on this now?

> Have the word_count contrib example use the new baked in hadoop outputformat
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1342
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Contrib, Hadoop
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>             Fix For: 0.7.0
>
>
> The contrib/word_count example currently outputs results to the /tmp directory.  It would be nice to give an example of writing back to Cassandra with the new baked in output format, based on CASSANDRA-1101.



[jira] Commented: (CASSANDRA-1342) Have the word_count contrib example use the new baked in hadoop outputformat

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915739#action_12915739 ] 

Hudson commented on CASSANDRA-1342:
-----------------------------------

Integrated in Cassandra #549 (See [https://hudson.apache.org/hudson/job/Cassandra/549/])
    removed unnecessary files.
patch by Jeremy Hanna; reviewed by Stu Hood for CASSANDRA-1342
Added option for filesystem/cassandra output.
patch by Jeremy Hanna; reviewed by Stu Hood for CASSANDRA-1342


> Have the word_count contrib example use the new baked in hadoop outputformat
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1342
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Contrib, Hadoop
>    Affects Versions: 0.7 beta 1
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: 0001-Added-option-for-filesystem-cassandra-output.patch, 0002-cleaned-up-clock-struct.patch, 0003-removed-unnecessary-files.patch
>
>
> The contrib/word_count example currently outputs results to the /tmp directory.  It would be nice to give an example of writing back to Cassandra with the new baked in output format, based on CASSANDRA-1101.



[jira] Commented: (CASSANDRA-1342) Have the word_count contrib example use the new baked in hadoop ColumnFamilyRecordWriter and such

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898409#action_12898409 ] 

Stu Hood commented on CASSANDRA-1342:
-------------------------------------

Actually, I don't think we should encourage direct use of ColumnFamilyRecordWriter: it's really intended to be a package protected class that the OutputFormat owns and uses.

> Have the word_count contrib example use the new baked in hadoop ColumnFamilyRecordWriter and such
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1342
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Contrib, Hadoop
>            Reporter: Jeremy Hanna
>            Priority: Minor
>             Fix For: 0.7.0
>
>
> The contrib/word_count example currently outputs results to the /tmp directory.  It would be nice to give an example of writing back to Cassandra with the new baked in ColumnFamilyRecordWriter and such, based on CASSANDRA-1101.



[jira] Resolved: (CASSANDRA-1342) Have the word_count contrib example use the new baked in hadoop outputformat

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-1342.
---------------------------------------

      Reviewer: stuhood  (was: jbellis)
    Resolution: Fixed

committed

> Have the word_count contrib example use the new baked in hadoop outputformat
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1342
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Contrib, Hadoop
>    Affects Versions: 0.7 beta 1
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: 0001-Added-option-for-filesystem-cassandra-output.patch, 0002-cleaned-up-clock-struct.patch, 0003-removed-unnecessary-files.patch
>
>
> The contrib/word_count example currently outputs results to the /tmp directory.  It would be nice to give an example of writing back to Cassandra with the new baked in output format, based on CASSANDRA-1101.



[jira] Commented: (CASSANDRA-1342) Have the word_count contrib example use the new baked in hadoop outputformat

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914242#action_12914242 ] 

Jonathan Ellis commented on CASSANDRA-1342:
-------------------------------------------

committed 02.

01: can we have an option to output to local FS still?  handy for debugging.
03: is the removed code basically a workaround for not having CFOF, is that why it's unnecessary?

> Have the word_count contrib example use the new baked in hadoop outputformat
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1342
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Contrib, Hadoop
>    Affects Versions: 0.7 beta 1
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: 0001-made-wordcount-output-to-cassandra.patch, 0002-cleaned-up-clock-struct.patch, 0003-removed-unnecessary-files.patch
>
>
> The contrib/word_count example currently outputs results to the /tmp directory.  It would be nice to give an example of writing back to Cassandra with the new baked in output format, based on CASSANDRA-1101.



[jira] Updated: (CASSANDRA-1342) Have the word_count contrib example use the new baked in hadoop outputformat

Posted by "Jeremy Hanna (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Hanna updated CASSANDRA-1342:
------------------------------------

    Attachment:     (was: 0001-made-wordcount-output-to-cassandra.patch)

> Have the word_count contrib example use the new baked in hadoop outputformat
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1342
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Contrib, Hadoop
>    Affects Versions: 0.7 beta 1
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: 0002-cleaned-up-clock-struct.patch, 0003-removed-unnecessary-files.patch
>
>
> The contrib/word_count example currently outputs results to the /tmp directory.  It would be nice to give an example of writing back to Cassandra with the new baked in output format, based on CASSANDRA-1101.



[jira] Assigned: (CASSANDRA-1342) Have the word_count contrib example use the new baked in hadoop ColumnFamilyRecordWriter and such

Posted by "Jeremy Hanna (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Hanna reassigned CASSANDRA-1342:
---------------------------------------

    Assignee:     (was: Jeremy Hanna)

> Have the word_count contrib example use the new baked in hadoop ColumnFamilyRecordWriter and such
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1342
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Contrib, Hadoop
>            Reporter: Jeremy Hanna
>            Priority: Minor
>             Fix For: 0.7.0
>
>
> The contrib/word_count example currently outputs results to the /tmp directory.  It would be nice to give an example of writing back to Cassandra with the new baked in ColumnFamilyRecordWriter and such, based on CASSANDRA-1101.



[jira] Updated: (CASSANDRA-1342) Have the word_count contrib example use the new baked in hadoop outputformat

Posted by "Jeremy Hanna (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Hanna updated CASSANDRA-1342:
------------------------------------

        Summary: Have the word_count contrib example use the new baked in hadoop outputformat  (was: Have the word_count contrib example use the new baked in hadoop ColumnFamilyRecordWriter and such)
    Description: The contrib/word_count example currently outputs results to the /tmp directory.  It would be nice to give an example of writing back to Cassandra with the new baked in output format, based on CASSANDRA-1101.  (was: The contrib/word_count example currently outputs results to the /tmp directory.  It would be nice to give an example of writing back to Cassandra with the new baked in ColumnFamilyRecordWriter and such, based on CASSANDRA-1101.)

> Have the word_count contrib example use the new baked in hadoop outputformat
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1342
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Contrib, Hadoop
>            Reporter: Jeremy Hanna
>            Priority: Minor
>             Fix For: 0.7.0
>
>
> The contrib/word_count example currently outputs results to the /tmp directory.  It would be nice to give an example of writing back to Cassandra with the new baked in output format, based on CASSANDRA-1101.



[jira] Updated: (CASSANDRA-1342) Have the word_count contrib example use the new baked in hadoop outputwriter and such

Posted by "Jeremy Hanna (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Hanna updated CASSANDRA-1342:
------------------------------------

    Fix Version/s: 0.7.0

Targeting 0.7.0, as it would be nice to showcase this new integration.

> Have the word_count contrib example use the new baked in hadoop outputwriter and such
> -------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1342
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Contrib, Hadoop
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>             Fix For: 0.7.0
>
>
> The contrib/word_count example currently outputs results to the /tmp directory.  It would be nice to give an example of writing back to Cassandra with the new baked in outputwriter and such, based on CASSANDRA-1101.



[jira] Commented: (CASSANDRA-1342) Have the word_count contrib example use the new baked in hadoop outputformat

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914449#action_12914449 ] 

Hudson commented on CASSANDRA-1342:
-----------------------------------

Integrated in Cassandra #545 (See [https://hudson.apache.org/hudson/job/Cassandra/545/])
    r/m clock struct from Streaming example.  patch by Jeremy Hanna; reviewed by jbellis for CASSANDRA-1342


> Have the word_count contrib example use the new baked in hadoop outputformat
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1342
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Contrib, Hadoop
>    Affects Versions: 0.7 beta 1
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: 0001-made-wordcount-output-to-cassandra.patch, 0002-cleaned-up-clock-struct.patch, 0003-removed-unnecessary-files.patch
>
>
> The contrib/word_count example currently outputs results to the /tmp directory.  It would be nice to give an example of writing back to Cassandra with the new baked in output format, based on CASSANDRA-1101.



[jira] Commented: (CASSANDRA-1342) Have the word_count contrib example use the new baked in hadoop outputformat

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915495#action_12915495 ] 

Stu Hood commented on CASSANDRA-1342:
-------------------------------------

+1

> Have the word_count contrib example use the new baked in hadoop outputformat
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1342
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Contrib, Hadoop
>    Affects Versions: 0.7 beta 1
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: 0001-Added-option-for-filesystem-cassandra-output.patch, 0002-cleaned-up-clock-struct.patch, 0003-removed-unnecessary-files.patch
>
>
> The contrib/word_count example currently outputs results to the /tmp directory.  It would be nice to give an example of writing back to Cassandra with the new baked in output format, based on CASSANDRA-1101.



[jira] Commented: (CASSANDRA-1342) Have the word_count contrib example use the new baked in hadoop outputformat

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914258#action_12914258 ] 

Stu Hood commented on CASSANDRA-1342:
-------------------------------------

I don't think the files in 0003 were ever really necessary: they were apparently intended for use between the Mapper/Reducer and the OutputFormat, but Hadoop doesn't serialize data there.
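
To illustrate the point about serialization, here is a toy model in plain Java. It assumes a simplified stand-in for the writer side of an output format: `ToyRecordWriter`, `CapturingWriter`, and `reduce` are hypothetical names, not Hadoop or Cassandra classes. It only shows that when the reduce step hands its value straight to the writer, the writer sees the very same in-memory object, so a wrapper type that exists purely to serialize that value has nothing to do.

```java
import java.util.ArrayList;
import java.util.List;

public class DirectHandoffSketch {
    /** Toy stand-in for the writer owned by an output format (hypothetical). */
    public interface ToyRecordWriter<K, V> {
        void write(K key, V value);
    }

    /** Keeps a reference to every value it receives so callers can inspect them. */
    public static class CapturingWriter implements ToyRecordWriter<String, List<String>> {
        public final List<List<String>> received = new ArrayList<>();

        @Override
        public void write(String key, List<String> value) {
            received.add(value); // stored by reference: no copy, no byte encoding
        }
    }

    /** Toy reduce step: builds a value object and passes it directly to the writer. */
    public static List<String> reduce(String word, int count,
                                      ToyRecordWriter<String, List<String>> out) {
        List<String> mutations = new ArrayList<>();
        mutations.add(word + "=" + count);
        out.write(word, mutations);
        return mutations;
    }

    public static void main(String[] args) {
        CapturingWriter writer = new CapturingWriter();
        List<String> sent = reduce("cassandra", 3, writer);
        // Same object on both sides: nothing was serialized in between.
        System.out.println(sent == writer.received.get(0)); // prints "true"
    }
}
```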

> Have the word_count contrib example use the new baked in hadoop outputformat
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1342
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Contrib, Hadoop
>    Affects Versions: 0.7 beta 1
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: 0001-made-wordcount-output-to-cassandra.patch, 0002-cleaned-up-clock-struct.patch, 0003-removed-unnecessary-files.patch
>
>
> The contrib/word_count example currently outputs results to the /tmp directory.  It would be nice to give an example of writing back to Cassandra with the new baked in output format, based on CASSANDRA-1101.



[jira] Updated: (CASSANDRA-1342) Have the word_count contrib example use the new baked in hadoop outputformat

Posted by "Jeremy Hanna (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Hanna updated CASSANDRA-1342:
------------------------------------

    Attachment: 0001-made-wordcount-output-to-cassandra.patch
                0002-cleaned-up-clock-struct.patch
                0003-removed-unnecessary-files.patch

separated into three patches:

0001 - core changes to output word count results to Cassandra, in the Standard2 column family
0002 - cleaned up clock structs in contrib/hadoop_streaming_output/bin/reducer.py and contrib/py_stress/avro_stress.py
0003 - removed unnecessary classes having to do with the hadoop output

> Have the word_count contrib example use the new baked in hadoop outputformat
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1342
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Contrib, Hadoop
>    Affects Versions: 0.7 beta 1
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: 0001-made-wordcount-output-to-cassandra.patch, 0002-cleaned-up-clock-struct.patch, 0003-removed-unnecessary-files.patch
>
>
> The contrib/word_count example currently outputs results to the /tmp directory.  It would be nice to give an example of writing back to Cassandra with the new baked in output format, based on CASSANDRA-1101.



[jira] Updated: (CASSANDRA-1342) Have the word_count contrib example use the new baked in hadoop ColumnFamilyRecordWriter and such

Posted by "Jeremy Hanna (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Hanna updated CASSANDRA-1342:
------------------------------------

        Summary: Have the word_count contrib example use the new baked in hadoop ColumnFamilyRecordWriter and such  (was: Have the word_count contrib example use the new baked in hadoop outputwriter and such)
    Description: The contrib/word_count example currently outputs results to the /tmp directory.  It would be nice to give an example of writing back to Cassandra with the new baked in ColumnFamilyRecordWriter and such, based on CASSANDRA-1101.  (was: The contrib/word_count example currently outputs results to the /tmp directory.  It would be nice to give an example of writing back to Cassandra with the new baked in outputwriter and such, based on CASSANDRA-1101.)

> Have the word_count contrib example use the new baked in hadoop ColumnFamilyRecordWriter and such
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1342
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Contrib, Hadoop
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>             Fix For: 0.7.0
>
>
> The contrib/word_count example currently outputs results to the /tmp directory.  It would be nice to give an example of writing back to Cassandra with the new baked in ColumnFamilyRecordWriter and such, based on CASSANDRA-1101.
