Posted to common-dev@hadoop.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2008/04/22 04:03:21 UTC

[jira] Created: (HADOOP-3295) Allow TextOutputFormat to use configurable separators

Allow TextOutputFormat to use configurable separators
-----------------------------------------------------

                 Key: HADOOP-3295
                 URL: https://issues.apache.org/jira/browse/HADOOP-3295
             Project: Hadoop Core
          Issue Type: Improvement
          Components: io
            Reporter: Zheng Shao
            Assignee: Runping Qi
            Priority: Minor


TextOutputFormat uses a hardcoded tab as the key-value separator. We should allow configurable separators such as ^A.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-3295) Allow TextOutputFormat to use configurable separators

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HADOOP-3295:
-------------------------------

    Status: Patch Available  (was: Open)

> Allow TextOutputFormat to use configurable separators
> -----------------------------------------------------
>
>                 Key: HADOOP-3295
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3295
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: io
>            Reporter: Zheng Shao
>            Assignee: Runping Qi
>            Priority: Minor
>         Attachments: 3295.patch
>
>
> TextOutputFormat use hardcoded tab as key-value separator. We should allow configurable separators like ^A, etc.



[jira] Commented: (HADOOP-3295) Allow TextOutputFormat to use configurable separators

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591518#action_12591518 ] 

Hadoop QA commented on HADOOP-3295:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12380659/3295.patch
against trunk revision 645773.

    @author +1.  The patch does not contain any @author tags.

    tests included -1.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new javac compiler warnings.

    release audit +1.  The applied patch does not generate any new release audit warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2299/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2299/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2299/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2299/console

This message is automatically generated.

> Allow TextOutputFormat to use configurable separators
> -----------------------------------------------------
>
>                 Key: HADOOP-3295
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3295
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: io
>            Reporter: Zheng Shao
>            Assignee: Runping Qi
>            Priority: Minor
>         Attachments: 3295.patch
>
>
> TextOutputFormat use hardcoded tab as key-value separator. We should allow configurable separators like ^A, etc.



[jira] Commented: (HADOOP-3295) Allow TextOutputFormat to use configurable separators

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591875#action_12591875 ] 

Hadoop QA commented on HADOOP-3295:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12380797/3295-2.patch
against trunk revision 645773.

    @author +1.  The patch does not contain any @author tags.

    tests included +1.  The patch appears to include 3 new or modified tests.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new javac compiler warnings.

    release audit +1.  The applied patch does not generate any new release audit warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2312/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2312/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2312/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2312/console

This message is automatically generated.

> Allow TextOutputFormat to use configurable separators
> -----------------------------------------------------
>
>                 Key: HADOOP-3295
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3295
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: io
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>            Priority: Minor
>         Attachments: 3295-2.patch, 3295.patch
>
>
> TextOutputFormat use hardcoded tab as key-value separator. We should allow configurable separators like ^A, etc.



[jira] Updated: (HADOOP-3295) Allow TextOutputFormat to use configurable separators

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HADOOP-3295:
-------------------------------

    Attachment: 3295.patch

This patch adds the configuration parameter.


> Allow TextOutputFormat to use configurable separators
> -----------------------------------------------------
>
>                 Key: HADOOP-3295
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3295
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: io
>            Reporter: Zheng Shao
>            Assignee: Runping Qi
>            Priority: Minor
>         Attachments: 3295.patch
>
>
> TextOutputFormat use hardcoded tab as key-value separator. We should allow configurable separators like ^A, etc.



[jira] Updated: (HADOOP-3295) Allow TextOutputFormat to use configurable separators

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HADOOP-3295:
-------------------------------

    Attachment: 3295-2.patch

Added a test for customized separator.

Added a constructor with the old prototype to make sure user code does not break because of the patch.
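
A minimal sketch of the compatibility approach described above, using only JDK classes. SimpleLineWriter, render, and SeparatorSketch are hypothetical names made up for this illustration; this is not the code in 3295-2.patch and not Hadoop's LineRecordWriter. It only shows the old one-argument constructor delegating to the new two-argument one with a tab default.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class SeparatorSketch {

    /** Hypothetical stand-in for a line record writer; not the Hadoop class. */
    static class SimpleLineWriter {
        private final DataOutputStream out;
        private final byte[] separator;

        // New-style constructor: the caller chooses the key-value separator.
        SimpleLineWriter(DataOutputStream out, String keyValueSeparator) {
            this.out = out;
            this.separator = keyValueSeparator.getBytes(StandardCharsets.UTF_8);
        }

        // Old prototype kept so existing callers compile; delegates with a tab.
        SimpleLineWriter(DataOutputStream out) {
            this(out, "\t");
        }

        void write(String key, String value) throws IOException {
            out.write(key.getBytes(StandardCharsets.UTF_8));
            out.write(separator);
            out.write(value.getBytes(StandardCharsets.UTF_8));
            out.write('\n');
        }
    }

    /** Writes one record and returns it as text; a null separator selects the old constructor. */
    static String render(String separatorOrNull, String key, String value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            SimpleLineWriter writer = (separatorOrNull == null)
                    ? new SimpleLineWriter(out)
                    : new SimpleLineWriter(out, separatorOrNull);
            writer.write(key, value);
            out.flush();
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Make the separators visible when printing.
        System.out.print(render(null, "key", "value").replace("\t", "<TAB>"));
        System.out.print(render("\u0001", "key", "value").replace("\u0001", "<^A>"));
    }
}
```

Callers that never pass a separator keep getting tab-separated output, which is the behavior the old prototype is meant to preserve.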


> Allow TextOutputFormat to use configurable separators
> -----------------------------------------------------
>
>                 Key: HADOOP-3295
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3295
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: io
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>            Priority: Minor
>         Attachments: 3295-2.patch, 3295.patch
>
>
> TextOutputFormat use hardcoded tab as key-value separator. We should allow configurable separators like ^A, etc.



[jira] Assigned: (HADOOP-3295) Allow TextOutputFormat to use configurable separators

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley reassigned HADOOP-3295:
-------------------------------------

    Assignee: Zheng Shao  (was: Runping Qi)

> Allow TextOutputFormat to use configurable separators
> -----------------------------------------------------
>
>                 Key: HADOOP-3295
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3295
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: io
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>            Priority: Minor
>         Attachments: 3295.patch
>
>
> TextOutputFormat use hardcoded tab as key-value separator. We should allow configurable separators like ^A, etc.



[jira] Updated: (HADOOP-3295) Allow TextOutputFormat to use configurable separators

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HADOOP-3295:
-------------------------------

    Status: Patch Available  (was: Open)

> Allow TextOutputFormat to use configurable separators
> -----------------------------------------------------
>
>                 Key: HADOOP-3295
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3295
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: io
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>            Priority: Minor
>         Attachments: 3295-2.patch, 3295.patch
>
>
> TextOutputFormat use hardcoded tab as key-value separator. We should allow configurable separators like ^A, etc.



[jira] Commented: (HADOOP-3295) Allow TextOutputFormat to use configurable separators

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591624#action_12591624 ] 

Runping Qi commented on HADOOP-3295:
------------------------------------


Note that you have made a public API change, from
{code}
public LineRecordWriter(DataOutputStream out)
{code}
to
{code}
public LineRecordWriter(DataOutputStream out, String keyValueSeparator) {
{code}
It is better to keep the original one as an overloaded constructor:
{code}
public LineRecordWriter(DataOutputStream out) {
    // Delegate to the new constructor with the default tab separator.
    this(out, "\t");
}
{code}



> Allow TextOutputFormat to use configurable separators
> -----------------------------------------------------
>
>                 Key: HADOOP-3295
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3295
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: io
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>            Priority: Minor
>         Attachments: 3295.patch
>
>
> TextOutputFormat use hardcoded tab as key-value separator. We should allow configurable separators like ^A, etc.



[jira] Commented: (HADOOP-3295) Allow TextOutputFormat to use configurable separators

Posted by "Suhas Gogate (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655049#action_12655049 ] 

Suhas Gogate commented on HADOOP-3295:
--------------------------------------

The feature added by this JIRA fails when the separator is a character that is invalid in XML, such as ctrl-A, i.e. mapred.textoutputformat.separator = "\u0001".

For example:
String delim = "\u0001";
conf.set("mapred.textoutputformat.separator", delim);

The job client serializes the JobConf with mapred.textoutputformat.separator set to "\u0001" (ctrl-A). The problem occurs when the JobTracker deserializes (reads back) the JobConf and encounters the invalid XML character.

The test for this feature, testFormatWithCustomSeparator(), does not serialize the JobConf after setting the separator to ctrl-A, and hence does not detect this problem.

Here is an exception:

   08/12/06 01:40:50 INFO mapred.FileInputFormat: Total input paths to process : 1
org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.RuntimeException: org.xml.sax.SAXParseException: Character reference "&#1" is an invalid XML character.
       at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:961)
       at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:864)
       at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:832)
       at org.apache.hadoop.conf.Configuration.get(Configuration.java:291)
       at org.apache.hadoop.mapred.JobConf.getJobPriority(JobConf.java:1163)
       at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:179)
       at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1783)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
       at java.lang.reflect.Method.invoke(Method.java:597)
       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)

       at org.apache.hadoop.ipc.Client.call(Client.java:715)
       at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
       at org.apache.hadoop.mapred.$Proxy1.submitJob(Unknown Source)
       at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:788)
       at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1026)
       at 
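
The failure mode can be reproduced outside Hadoop with a rough sketch that uses only the JDK's XML parser. It assumes, based on the "&#1" in the exception message, that the serialized job configuration carries the separator as a numeric character reference; the document layout, propertyXml, parses, and InvalidXmlCharSketch are made up for this illustration and are not Hadoop's Configuration code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class InvalidXmlCharSketch {

    /** Builds a small configuration-style document whose value is a character reference. */
    static String propertyXml(String name, int codePoint) {
        return "<?xml version=\"1.0\"?><configuration><property><name>" + name
                + "</name><value>&#" + codePoint + ";</value></property></configuration>";
    }

    /** Returns true if the JDK parser accepts the document, false if it reports a parse error. */
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXParseException e) {
            // XML 1.0 does not allow U+0001, even as a character reference.
            return false;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String key = "mapred.textoutputformat.separator";
        // Tab (U+0009) is a legal XML 1.0 character; ctrl-A (U+0001) is not.
        System.out.println("tab parses:    " + parses(propertyXml(key, 9)));
        System.out.println("ctrl-A parses: " + parses(propertyXml(key, 1)));
    }
}
```

A test that round-trips the configuration through XML in this way would exercise the path that testFormatWithCustomSeparator() skips.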

> Allow TextOutputFormat to use configurable separators
> -----------------------------------------------------
>
>                 Key: HADOOP-3295
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3295
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: io
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>            Priority: Minor
>             Fix For: 0.18.0
>
>         Attachments: 3295-2.patch, 3295.patch
>
>
> TextOutputFormat use hardcoded tab as key-value separator. We should allow configurable separators like ^A, etc.



[jira] Commented: (HADOOP-3295) Allow TextOutputFormat to use configurable separators

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655076#action_12655076 ] 

Zheng Shao commented on HADOOP-3295:
------------------------------------

Can you open a separate jira and mark this one as related? Then we can discuss from there and produce a fix.



> Allow TextOutputFormat to use configurable separators
> -----------------------------------------------------
>
>                 Key: HADOOP-3295
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3295
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: io
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>            Priority: Minor
>             Fix For: 0.18.0
>
>         Attachments: 3295-2.patch, 3295.patch
>
>
> TextOutputFormat use hardcoded tab as key-value separator. We should allow configurable separators like ^A, etc.



[jira] Commented: (HADOOP-3295) Allow TextOutputFormat to use configurable separators

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592565#action_12592565 ] 

Hudson commented on HADOOP-3295:
--------------------------------

Integrated in Hadoop-trunk #471 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/471/])

> Allow TextOutputFormat to use configurable separators
> -----------------------------------------------------
>
>                 Key: HADOOP-3295
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3295
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: io
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>            Priority: Minor
>             Fix For: 0.18.0
>
>         Attachments: 3295-2.patch, 3295.patch
>
>
> TextOutputFormat use hardcoded tab as key-value separator. We should allow configurable separators like ^A, etc.



[jira] Updated: (HADOOP-3295) Allow TextOutputFormat to use configurable separators

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-3295:
----------------------------------

    Status: Open  (was: Patch Available)

Zheng, please include a test for the new functionality.

> Allow TextOutputFormat to use configurable separators
> -----------------------------------------------------
>
>                 Key: HADOOP-3295
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3295
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: io
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>            Priority: Minor
>         Attachments: 3295.patch
>
>
> TextOutputFormat use hardcoded tab as key-value separator. We should allow configurable separators like ^A, etc.



[jira] Updated: (HADOOP-3295) Allow TextOutputFormat to use configurable separators

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-3295:
----------------------------------

       Resolution: Fixed
    Fix Version/s: 0.18.0
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

I just committed this. Thanks, Zheng

> Allow TextOutputFormat to use configurable separators
> -----------------------------------------------------
>
>                 Key: HADOOP-3295
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3295
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: io
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>            Priority: Minor
>             Fix For: 0.18.0
>
>         Attachments: 3295-2.patch, 3295.patch
>
>
> TextOutputFormat use hardcoded tab as key-value separator. We should allow configurable separators like ^A, etc.



[jira] Commented: (HADOOP-3295) Allow TextOutputFormat to use configurable separators

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591185#action_12591185 ] 

Milind Bhandarkar commented on HADOOP-3295:
-------------------------------------------

This is great !!!

I have been requesting this for a long time !!!!

Thanks Zheng !

Committers, please please please take a serious look at this !

> Allow TextOutputFormat to use configurable separators
> -----------------------------------------------------
>
>                 Key: HADOOP-3295
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3295
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: io
>            Reporter: Zheng Shao
>            Assignee: Runping Qi
>            Priority: Minor
>         Attachments: 3295.patch
>
>
> TextOutputFormat use hardcoded tab as key-value separator. We should allow configurable separators like ^A, etc.
