Posted to common-dev@hadoop.apache.org by "Hadoop QA (JIRA)" <ji...@apache.org> on 2008/03/15 11:46:24 UTC

[jira] Commented: (HADOOP-2806) Streaming has no way to force entire record (or null) as key

    [ https://issues.apache.org/jira/browse/HADOOP-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579027#action_12579027 ] 

Hadoop QA commented on HADOOP-2806:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12377872/patch-2806.txt
against trunk revision 619744.

    @author +1.  The patch does not contain any @author tags.

    tests included -1.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new javac compiler warnings.

    release audit +1.  The applied patch does not generate any new release audit warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1971/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1971/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1971/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1971/console

This message is automatically generated.

> Streaming has no way to force entire record (or null) as key
> ------------------------------------------------------------
>
>                 Key: HADOOP-2806
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2806
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/streaming
>            Reporter: Marco Nicosia
>            Assignee: Amareshwari Sriramadasu
>            Priority: Minor
>             Fix For: 0.17.0
>
>         Attachments: patch-2806.txt
>
>
> I think perhaps streaming needs a "-allkey" or "-nullkey" option? Otherwise, I'm concerned there is a subtle streaming documentation problem.
> These two docs:
> http://hadoop.apache.org/core/docs/current/streaming.html
> http://wiki.apache.org/hadoop/HadoopStreaming (Should be merged with above?)
> ... seem to ignore that streaming, by default, splits key/value on TAB. Sure, they mention it, but the simple (no separator) examples don't take into account that streaming decides, line by line, whether the key is the whole line or only the text up to the first tab, should one occur. This means that some records might be sorted differently from others depending on whether or not they contain a tab.
> Here's a very simple pair of examples that, to the naive user, should produce output in the same order, but do not:
> > [hod] (marco) >> run dfs -fs local -cat str-tabs
> > a       1
> > b       3
> > a       4
> > 
> > [hod] (marco) >> run dfs -put str-tabs str-tabs
> > 
> > [hod] (marco) >> run jar hadoop-streaming.jar -input str-tabs -output str-tabs.out -mapper /bin/cat -reducer /bin/cat     
> > [blah blah blah]
> > 
> > [hod] (marco) >> run dfs -cat str-tabs.out/part-00000
> > a       4
> > a       1
> > b       3
> Compare with this negative test:
> > [hod] (marco) >> run dfs -fs local -cat str-notabs
> > a 1
> > b 3
> > a 4
> > 
> > [hod] (marco) >> run dfs -put str-notabs str-notabs
> > 
> > [hod] (marco) >> run jar hadoop-streaming.jar -input str-notabs -output str-notabs.out -mapper /bin/cat -reducer /bin/cat
> > [blah blah blah]
> > 
> > [hod] (marco) >> run dfs -cat str-notabs.out/part-00000
> > a 1
> > a 4
> > b 3
> > 
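
The difference between the two runs quoted above can be illustrated with a minimal Python sketch. It is not the streaming implementation; split_record and sort_by_key are hypothetical names, and the only behaviour assumed is the documented default: the key is the text up to the first tab, and a line with no tab becomes the key in full, with a null value.

```python
def split_record(line, sep="\t"):
    """Split a line into (key, value) on the first separator.

    Assumption: mirrors the default described in the streaming docs.
    A line without the separator is treated as all key, with no value.
    """
    key, found, value = line.partition(sep)
    return (key, value if found else None)


def sort_by_key(lines):
    """Sort records by key only, as the shuffle does.

    Python's sort is stable, so records with equal keys keep their
    input order here. The framework makes no such promise, which is
    why "a 4" can come out before "a 1" in the tab-separated run.
    """
    return sorted((split_record(l) for l in lines), key=lambda kv: kv[0])


if __name__ == "__main__":
    # Tab-separated input: keys are "a", "b", "a"; values are not compared.
    print(sort_by_key(["a\t1", "b\t3", "a\t4"]))
    # Space-separated input: each whole line is the key, so the full
    # text of the line takes part in the sort.
    print(sort_by_key(["a 1", "b 3", "a 4"]))
```

With tabs, the two "a" records compare as equal, so their relative order in the output is not determined by the sort. Without tabs, "a 1" and "a 4" are distinct keys and always come out in that order.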

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.