Posted to commits@cassandra.apache.org by "Jesse Shieh (JIRA)" <ji...@apache.org> on 2011/01/16 02:38:46 UTC

[jira] Created: (CASSANDRA-1993) Word count example doesn't output the words correctly to cassandra. It outputs spurious data past the length of the byte array.

Word count example doesn't output the words correctly to cassandra.  It outputs spurious data past the length of the byte array.
--------------------------------------------------------------------------------------------------------------------------------

                 Key: CASSANDRA-1993
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1993
             Project: Cassandra
          Issue Type: Bug
          Components: Hadoop
    Affects Versions: 0.7.0
         Environment: All
            Reporter: Jesse Shieh
            Priority: Minor


To reproduce:
# start a local cassandra server e.g. sudo bin/cassandra -f
cd contrib/word_count
ant
bin/word_count_setup
bin/word_count

# Check the data in Cassandra. It all looks fine, because the sample words are all the same length.
# Change the input data in Cassandra to real words of varying lengths and rerun the MapReduce job. Some words now have spurious characters written past their length.
# This happens because the word's bytes are written out without being bounded at the word's length.
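
A minimal sketch of the failure mode described above, for illustration only. It does not use Hadoop or the word count code: ReusableWord is a hypothetical stand-in for a reusable text buffer whose backing array only grows, which is an assumption about why stale bytes appear. The names readWholeArray and readValidBytes are likewise made up for this example.

```java
import java.nio.charset.StandardCharsets;

public class ReusedBufferSketch {
    // Hypothetical stand-in for a reusable text buffer (NOT the Hadoop class).
    // The backing array only grows, so a shorter value leaves older bytes
    // sitting behind the current length.
    public static final class ReusableWord {
        private byte[] bytes = new byte[0];
        private int length = 0;

        public void set(String s) {
            byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
            if (utf8.length > bytes.length) {
                bytes = new byte[utf8.length];
            }
            System.arraycopy(utf8, 0, bytes, 0, utf8.length);
            length = utf8.length;
        }

        public byte[] getBytes() { return bytes; }  // whole backing array
        public int getLength() { return length; }   // number of valid bytes
    }

    // Buggy read: treats the whole backing array as the word.
    public static String readWholeArray(ReusableWord w) {
        return new String(w.getBytes(), StandardCharsets.UTF_8);
    }

    // Bounded read: uses only the first getLength() bytes.
    public static String readValidBytes(ReusableWord w) {
        return new String(w.getBytes(), 0, w.getLength(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ReusableWord word = new ReusableWord();
        word.set("cassandra");
        word.set("word"); // shorter value reuses the 9-byte array
        System.out.println(readWholeArray(word)); // prints "wordandra"
        System.out.println(readValidBytes(word)); // prints "word"
    }
}
```

With equal-length words every new value overwrites the previous one completely, which would explain why the default sample data looks fine.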

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (CASSANDRA-1993) Word count example doesn't output the words correctly to cassandra. It outputs spurious data past the length of the byte array.

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1993:
--------------------------------------

    Attachment: 1993-v2.txt

Thanks for the report!

v2 should provide a similar fix while avoiding an unnecessary intermediate copy.
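
The attached patches are not reproduced in this thread, so the following is only a sketch of the general trade-off, not the contents of 1993-v2.txt. It assumes the word is available as a backing array plus a valid length, and contrasts copying the valid bytes into a new array with wrapping the existing array in a bounded view. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BoundedBytesSketch {
    // Copying variant: allocates a new array holding only the valid bytes.
    public static ByteBuffer copyThenWrap(byte[] backing, int length) {
        return ByteBuffer.wrap(Arrays.copyOf(backing, length));
    }

    // Non-copying variant: a view over the same array, with position 0 and
    // limit set to the number of valid bytes.
    public static ByteBuffer wrapInPlace(byte[] backing, int length) {
        return ByteBuffer.wrap(backing, 0, length);
    }

    // Decodes the bytes between position and limit without disturbing buf.
    public static String decode(ByteBuffer buf) {
        return StandardCharsets.UTF_8.decode(buf.duplicate()).toString();
    }

    public static void main(String[] args) {
        // Nine bytes in the array, of which only the first four are valid.
        byte[] backing = "wordandra".getBytes(StandardCharsets.UTF_8);
        int validLength = 4;
        System.out.println(decode(copyThenWrap(backing, validLength))); // prints "word"
        System.out.println(decode(wrapInPlace(backing, validLength)));  // prints "word"
    }
}
```

The bounded view saves an allocation, but it shares the backing array, so it is only safe if the array is not overwritten before the buffer has been consumed.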

> Word count example doesn't output the words correctly to cassandra.  It outputs spurious data past the length of the byte array.
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1993
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1993
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Contrib
>    Affects Versions: 0.7.0
>         Environment: All
>            Reporter: Jesse Shieh
>            Priority: Minor
>             Fix For: 0.7.1
>
>         Attachments: 1993-v2.txt, trunk-1993.txt
>
>   Original Estimate: 0.08h
>  Remaining Estimate: 0.08h


[jira] Commented: (CASSANDRA-1993) Word count example doesn't output the words correctly to cassandra. It outputs spurious data past the length of the byte array.

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982418#action_12982418 ] 

Hudson commented on CASSANDRA-1993:
-----------------------------------

Integrated in Cassandra-0.7 #164 (See [https://hudson.apache.org/hudson/job/Cassandra-0.7/164/])
    fix copy bounds for word Text in wordcount demo
patch by Jesse Shieh and jbellis for CASSANDRA-1993


> Word count example doesn't output the words correctly to cassandra.  It outputs spurious data past the length of the byte array.
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1993
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1993
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Contrib
>    Affects Versions: 0.7.0
>         Environment: All
>            Reporter: Jesse Shieh
>            Assignee: Jesse Shieh
>            Priority: Minor
>             Fix For: 0.7.1
>
>         Attachments: 1993-v2.txt, trunk-1993.txt
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h


[jira] Updated: (CASSANDRA-1993) Word count example doesn't output the words correctly to cassandra. It outputs spurious data past the length of the byte array.

Posted by "Jesse Shieh (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesse Shieh updated CASSANDRA-1993:
-----------------------------------

    Attachment: trunk-1993.txt

> Word count example doesn't output the words correctly to cassandra.  It outputs spurious data past the length of the byte array.
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1993
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1993
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>    Affects Versions: 0.7.0
>         Environment: All
>            Reporter: Jesse Shieh
>            Priority: Minor
>         Attachments: trunk-1993.txt
>
>   Original Estimate: 0.08h
>  Remaining Estimate: 0.08h


[jira] Commented: (CASSANDRA-1993) Word count example doesn't output the words correctly to cassandra. It outputs spurious data past the length of the byte array.

Posted by "Jesse Shieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982223#action_12982223 ] 

Jesse Shieh commented on CASSANDRA-1993:
----------------------------------------

nice improvement =)

You might also be interested to know that the latest version of Hadoop adds a copyBytes method to the Text object. It returns a copy containing only the valid bytes, so it should be able to replace getBytes and take care of this automatically for us.

see: http://svn.apache.org/viewvc/hadoop/common/trunk/src/java/org/apache/hadoop/io/Text.java?r1=953881&r2=1050070

> Word count example doesn't output the words correctly to cassandra.  It outputs spurious data past the length of the byte array.
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1993
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1993
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Contrib
>    Affects Versions: 0.7.0
>         Environment: All
>            Reporter: Jesse Shieh
>            Assignee: Jonathan Ellis
>            Priority: Minor
>             Fix For: 0.7.1
>
>         Attachments: 1993-v2.txt, trunk-1993.txt
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h


[jira] Updated: (CASSANDRA-1993) Word count example doesn't output the words correctly to cassandra. It outputs spurious data past the length of the byte array.

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1993:
--------------------------------------

           Component/s:     (was: Hadoop)
                        Contrib
         Fix Version/s: 0.7.1
              Assignee: Jonathan Ellis
    Remaining Estimate: 1h  (was: 0.08h)
     Original Estimate: 1h  (was: 0.08h)

> Word count example doesn't output the words correctly to cassandra.  It outputs spurious data past the length of the byte array.
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1993
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1993
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Contrib
>    Affects Versions: 0.7.0
>         Environment: All
>            Reporter: Jesse Shieh
>            Assignee: Jonathan Ellis
>            Priority: Minor
>             Fix For: 0.7.1
>
>         Attachments: 1993-v2.txt, trunk-1993.txt
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h