Posted to common-dev@hadoop.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2008/04/22 04:03:21 UTC
[jira] Created: (HADOOP-3295) Allow TextOutputFormat to use
configurable separators
Allow TextOutputFormat to use configurable separators
-----------------------------------------------------
Key: HADOOP-3295
URL: https://issues.apache.org/jira/browse/HADOOP-3295
Project: Hadoop Core
Issue Type: Improvement
Components: io
Reporter: Zheng Shao
Assignee: Runping Qi
Priority: Minor
TextOutputFormat uses a hardcoded tab as the key-value separator. We should allow configurable separators such as ^A (Ctrl-A).
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-3295) Allow TextOutputFormat to use
configurable separators
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HADOOP-3295:
-------------------------------
Status: Patch Available (was: Open)
[jira] Commented: (HADOOP-3295) Allow TextOutputFormat to use
configurable separators
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591518#action_12591518 ]
Hadoop QA commented on HADOOP-3295:
-----------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12380659/3295.patch
against trunk revision 645773.
@author +1. The patch does not contain any @author tags.
tests included -1. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new javac compiler warnings.
release audit +1. The applied patch does not generate any new release audit warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests +1. The patch passed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2299/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2299/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2299/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2299/console
This message is automatically generated.
[jira] Commented: (HADOOP-3295) Allow TextOutputFormat to use
configurable separators
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591875#action_12591875 ]
Hadoop QA commented on HADOOP-3295:
-----------------------------------
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12380797/3295-2.patch
against trunk revision 645773.
@author +1. The patch does not contain any @author tags.
tests included +1. The patch appears to include 3 new or modified tests.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new javac compiler warnings.
release audit +1. The applied patch does not generate any new release audit warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests +1. The patch passed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2312/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2312/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2312/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2312/console
This message is automatically generated.
[jira] Updated: (HADOOP-3295) Allow TextOutputFormat to use
configurable separators
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HADOOP-3295:
-------------------------------
Attachment: 3295.patch
This patch adds the configuration parameter.
[jira] Updated: (HADOOP-3295) Allow TextOutputFormat to use
configurable separators
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HADOOP-3295:
-------------------------------
Attachment: 3295-2.patch
Added a test for customized separator.
Added a constructor with the old prototype to make sure user code does not break because of the patch.
[jira] Assigned: (HADOOP-3295) Allow TextOutputFormat to use
configurable separators
Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen O'Malley reassigned HADOOP-3295:
-------------------------------------
Assignee: Zheng Shao (was: Runping Qi)
[jira] Updated: (HADOOP-3295) Allow TextOutputFormat to use
configurable separators
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HADOOP-3295:
-------------------------------
Status: Patch Available (was: Open)
[jira] Commented: (HADOOP-3295) Allow TextOutputFormat to use
configurable separators
Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591624#action_12591624 ]
Runping Qi commented on HADOOP-3295:
------------------------------------
Note that you have changed a public API, from
{code}
public LineRecordWriter(DataOutputStream out)
{code}
to
{code}
public LineRecordWriter(DataOutputStream out, String keyValueSeparator) {
{code}
It is better to keep the original one as an overloaded constructor:
{code}
public LineRecordWriter(DataOutputStream out) {
  this(out, "\t");
}
{code}
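The overloaded-constructor suggestion above can be sketched in isolation. This is a minimal illustration, not the code from the patch: SimpleLineWriter and formatOne are hypothetical names, and the class only mimics the shape of a line writer. The point it shows is that in Java the old one-argument constructor delegates to the new one with this(...), so existing callers keep the tab default.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for illustration only; not the Hadoop LineRecordWriter.
public class SimpleLineWriter {
    private final DataOutputStream out;
    private final byte[] separator;

    // New constructor: the caller chooses the key-value separator.
    public SimpleLineWriter(DataOutputStream out, String keyValueSeparator) {
        this.out = out;
        this.separator = keyValueSeparator.getBytes(StandardCharsets.UTF_8);
    }

    // Old prototype kept for existing callers; delegates with the tab default.
    public SimpleLineWriter(DataOutputStream out) {
        this(out, "\t");
    }

    // Writes one record as: key, separator, value, newline.
    public void write(String key, String value) throws IOException {
        out.write(key.getBytes(StandardCharsets.UTF_8));
        out.write(separator);
        out.write(value.getBytes(StandardCharsets.UTF_8));
        out.write('\n');
    }

    // Demo helper: formats one record; a null separator selects the old constructor.
    public static String formatOne(String separatorOrNull, String key, String value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream data = new DataOutputStream(buffer);
        SimpleLineWriter writer = (separatorOrNull == null)
            ? new SimpleLineWriter(data)
            : new SimpleLineWriter(data, separatorOrNull);
        try {
            writer.write(key, value);
            data.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.print(formatOne(null, "key", "value"));     // tab-separated
        System.out.print(formatOne("\u0001", "key", "value")); // Ctrl-A-separated
    }
}
```

Delegating rather than duplicating the constructor body keeps the default separator defined in exactly one place.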
[jira] Commented: (HADOOP-3295) Allow TextOutputFormat to use
configurable separators
Posted by "Suhas Gogate (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655049#action_12655049 ]
Suhas Gogate commented on HADOOP-3295:
--------------------------------------
The feature added by this Jira has a problem when the separator is set to a character that is invalid in XML, e.g. ctrl-A: mapred.textoutputformat.separator = "\u0001"
For example:
String delim = "\u0001";
conf.set("mapred.textoutputformat.separator", delim);
The job client serializes the jobconf with mapred.textoutputformat.separator set to "\u0001" (ctrl-A), and the problem occurs when the jobconf is de-serialized (read back) by the job tracker, which then encounters an invalid XML character.
The test for this feature, testFormatWithCustomSeparator(), does not serialize the jobconf after setting the separator to ctrl-A and hence does not detect this specific problem.
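The failure on read-back can be reproduced without Hadoop. The sketch below assumes the serialized configuration is XML 1.0 and that the control character ends up as a numeric character reference, which is what the error message suggests; SeparatorXmlCheck, parses and the property name "sep" are hypothetical names for illustration. It uses only the JDK's built-in XML parser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; it does not use any Hadoop classes.
public class SeparatorXmlCheck {

    // Returns true if a <value> containing the given character reference
    // is accepted by the JDK's XML 1.0 parser, false if parsing is rejected.
    public static boolean parses(String charReference) {
        String xml = "<?xml version=\"1.0\"?>"
            + "<property><name>sep</name><value>" + charReference + "</value></property>";
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // Includes SAXParseException, the exception type seen in the report.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("tab (&#9;) parses: " + parses("&#9;"));
        System.out.println("ctrl-A (&#1;) parses: " + parses("&#1;"));
    }
}
```

XML 1.0 permits tab, line feed and carriage return among the control characters but not U+0001, so a tab separator survives the round trip while ctrl-A does not. A test that wrote the configuration out and read it back would have caught this.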
Here is an exception:
08/12/06 01:40:50 INFO mapred.FileInputFormat: Total input paths to process : 1
org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.RuntimeException: org.xml.sax.SAXParseException: Character reference "" is an invalid XML character.
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:961)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:864)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:832)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:291)
at org.apache.hadoop.mapred.JobConf.getJobPriority(JobConf.java:1163)
at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:179)
at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1783)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
at org.apache.hadoop.ipc.Client.call(Client.java:715)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
at org.apache.hadoop.mapred.$Proxy1.submitJob(Unknown Source)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:788)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1026)
at
[jira] Commented: (HADOOP-3295) Allow TextOutputFormat to use
configurable separators
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655076#action_12655076 ]
Zheng Shao commented on HADOOP-3295:
------------------------------------
Can you open a separate jira and mark this one as related? Then we can discuss from there and produce a fix.
[jira] Commented: (HADOOP-3295) Allow TextOutputFormat to use
configurable separators
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592565#action_12592565 ]
Hudson commented on HADOOP-3295:
--------------------------------
Integrated in Hadoop-trunk #471 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/471/])
[jira] Updated: (HADOOP-3295) Allow TextOutputFormat to use
configurable separators
Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen O'Malley updated HADOOP-3295:
----------------------------------
Status: Open (was: Patch Available)
Zheng, please include a test for the new functionality.
[jira] Updated: (HADOOP-3295) Allow TextOutputFormat to use
configurable separators
Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Douglas updated HADOOP-3295:
----------------------------------
Resolution: Fixed
Fix Version/s: 0.18.0
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)
I just committed this. Thanks, Zheng
[jira] Commented: (HADOOP-3295) Allow TextOutputFormat to use
configurable separators
Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591185#action_12591185 ]
Milind Bhandarkar commented on HADOOP-3295:
-------------------------------------------
This is great !!!
I have been requesting this for a long time !!!!
Thanks Zheng !
Committers, please please please take a serious look at this !