Posted to dev@hive.apache.org by "Namit Jain (JIRA)" <ji...@apache.org> on 2012/05/09 03:33:48 UTC
[jira] [Created] (HIVE-3012) hive custom scripts do not work well
if the data contains new lines
Namit Jain created HIVE-3012:
--------------------------------
Summary: hive custom scripts do not work well if the data contains new lines
Key: HIVE-3012
URL: https://issues.apache.org/jira/browse/HIVE-3012
Project: Hive
Issue Type: Improvement
Reporter: Namit Jain
Assignee: Namit Jain
If the data contains newlines, they are passed as is to the script.
The script then has no way of splitting the data into rows on newlines, since a newline inside a value looks the same as the one that ends a row.
An option should be added to Hive to escape/unescape the newlines.
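A minimal sketch of the escape/unescape idea follows. It assumes each row is sent to the script as a single line, and it also escapes a literal backslash so that unescaping is unambiguous. `NewlineEscaping`, `escape` and `unescape` are hypothetical names, and this is not the code from the committed patch.

```java
// Hypothetical helper illustrating newline escaping; not Hive's actual implementation.
public class NewlineEscaping {

    // Escape '\\' as well as '\n' so that unescape() can invert escape() exactly.
    static String escape(String field) {
        StringBuilder sb = new StringBuilder(field.length());
        for (int i = 0; i < field.length(); i++) {
            char c = field.charAt(i);
            if (c == '\\') {
                sb.append("\\\\");   // backslash -> two backslashes
            } else if (c == '\n') {
                sb.append("\\n");    // newline -> backslash + 'n'
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Reverse of escape(): turn "\\n" back into a newline and "\\\\" back into one backslash.
    static String unescape(String field) {
        StringBuilder sb = new StringBuilder(field.length());
        for (int i = 0; i < field.length(); i++) {
            char c = field.charAt(i);
            if (c == '\\' && i + 1 < field.length()) {
                char next = field.charAt(i + 1);
                if (next == 'n') { sb.append('\n'); i++; continue; }
                if (next == '\\') { sb.append('\\'); i++; continue; }
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String value = "line one\nline two";
        // Unescaped: a script splitting stdin on '\n' sees two "rows".
        System.out.println(value.split("\n", -1).length);      // 2
        // Escaped: the value stays on a single line.
        String escaped = escape(value);
        System.out.println(escaped.split("\n", -1).length);    // 1
        // The round trip restores the original value.
        System.out.println(unescape(escaped).equals(value));   // true
    }
}
```

A script reading escaped input would apply the inverse of `escape` to each field after splitting on newlines.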
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3012) hive custom scripts do not work well
if the data contains new lines
Posted by "Kevin Wilfong (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kevin Wilfong updated HIVE-3012:
--------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
Committed, thanks Namit.
[jira] [Updated] (HIVE-3012) hive custom scripts do not work well
if the data contains new lines
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Namit Jain updated HIVE-3012:
-----------------------------
Status: Patch Available (was: Open)
[jira] [Commented] (HIVE-3012) hive custom scripts do not work well
if the data contains new lines
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271829#comment-13271829 ]
Namit Jain commented on HIVE-3012:
----------------------------------
comments addressed
[jira] [Commented] (HIVE-3012) hive custom scripts do not work well
if the data contains new lines
Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271759#comment-13271759 ]
Ashutosh Chauhan commented on HIVE-3012:
----------------------------------------
This has been fixed upstream: https://issues.apache.org/jira/browse/HADOOP-7096
If that's there, do we still need this?
[jira] [Commented] (HIVE-3012) hive custom scripts do not work well
if the data contains new lines
Posted by "Phabricator (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271846#comment-13271846 ]
Phabricator commented on HIVE-3012:
-----------------------------------
kevinwilfong has accepted the revision "HIVE-3012 [jira] hive custom scripts do not work well if the data contains new lines".
+1 Running tests.
REVISION DETAIL
https://reviews.facebook.net/D3099
BRANCH
svn
To: JIRA, kevinwilfong, njain
Cc: kevinwilfong
[jira] [Updated] (HIVE-3012) hive custom scripts do not work well
if the data contains new lines
Posted by "Phabricator (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HIVE-3012:
------------------------------
Attachment: HIVE-3012.D3099.2.patch
njain updated the revision "HIVE-3012 [jira] hive custom scripts do not work well if the data contains new lines".
Reviewers: JIRA
Comments
REVISION DETAIL
https://reviews.facebook.net/D3099
AFFECTED FILES
data/scripts/newline.py
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
ql/src/test/results/clientpositive/newline.q.out
ql/src/test/queries/clientpositive/newline.q
ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java
ql/src/java/org/apache/hadoop/hive/ql/exec/TextRecordReader.java
ql/src/java/org/apache/hadoop/hive/ql/exec/TextRecordWriter.java
To: JIRA, njain
Cc: kevinwilfong
[jira] [Commented] (HIVE-3012) hive custom scripts do not work well
if the data contains new lines
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271677#comment-13271677 ]
Namit Jain commented on HIVE-3012:
----------------------------------
https://reviews.facebook.net/D3099
[jira] [Updated] (HIVE-3012) hive custom scripts do not work well
if the data contains new lines
Posted by "Phabricator (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HIVE-3012:
------------------------------
Attachment: HIVE-3012.D3099.1.patch
njain requested code review of "HIVE-3012 [jira] hive custom scripts do not work well if the data contains new lines".
Reviewers: JIRA
https://issues.apache.org/jira/browse/HIVE-3012
HIVE-3012 hive custom scripts do not work well if the data contains new lines
If the data contains newlines, they are passed as is to the script.
The script then has no way of splitting the data into rows on newlines, since a newline inside a value looks the same as the one that ends a row.
An option should be added to Hive to escape/unescape the newlines.
TEST PLAN
EMPTY
REVISION DETAIL
https://reviews.facebook.net/D3099
AFFECTED FILES
data/scripts/newline.py
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
ql/src/test/results/clientpositive/newline.q.out
ql/src/test/queries/clientpositive/newline.q
ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java
ql/src/java/org/apache/hadoop/hive/ql/exec/TextRecordReader.java
ql/src/java/org/apache/hadoop/hive/ql/exec/TextRecordWriter.java
[jira] [Commented] (HIVE-3012) hive custom scripts do not work well
if the data contains new lines
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273125#comment-13273125 ]
Hudson commented on HIVE-3012:
------------------------------
Integrated in Hive-trunk-h0.21 #1424 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1424/])
HIVE-3012 hive custom scripts do not work well if the data contains new lines (njain via kevinwilfong) (Revision 1336986)
Result = FAILURE
kevinwilfong : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1336986
Files :
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/data/scripts/newline.py
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TextRecordReader.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TextRecordWriter.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java
* /hive/trunk/ql/src/test/queries/clientpositive/newline.q
* /hive/trunk/ql/src/test/results/clientpositive/newline.q.out
[jira] [Issue Comment Edited] (HIVE-3012) hive custom scripts do
not work well if the data contains new lines
Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271759#comment-13271759 ]
Ashutosh Chauhan edited comment on HIVE-3012 at 5/9/12 8:18 PM:
----------------------------------------------------------------
This has been fixed upstream: HADOOP-7096. If that's there, do we still need this?
was (Author: ashutoshc):
This has been fixed upstream: https://issues.apache.org/jira/browse/HADOOP-7096
If that's there, do we still need this?
[jira] [Commented] (HIVE-3012) hive custom scripts do not work well
if the data contains new lines
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271827#comment-13271827 ]
Namit Jain commented on HIVE-3012:
----------------------------------
We still need it: a lot of Hive clients are using an older Hadoop version.
[jira] [Commented] (HIVE-3012) hive custom scripts do not work well
if the data contains new lines
Posted by "Phabricator (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271703#comment-13271703 ]
Phabricator commented on HIVE-3012:
-----------------------------------
kevinwilfong has commented on the revision "HIVE-3012 [jira] hive custom scripts do not work well if the data contains new lines".
INLINE COMMENTS
ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java:100-119 Rather than use methods here, wouldn't it be easier to just use a final static variable?
E.g.
static final byte[] newLineEscapeBytes = "\\n".getBytes();
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:393 Could you add this to conf/hive-default.xml.template with a description of what it does?
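The reviewer's suggestion can be sketched as follows: the escape-sequence bytes are computed once in `static final` fields instead of by a method on every call, and the writer copies runs of plain bytes between escapes. `EscapeBytes`, `writeEscaped` and `escape` are hypothetical names, escaping the backslash as well is an assumption not stated in this thread, and the explicit UTF-8 charset is a choice of this sketch (the snippet above uses the platform default).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the "final static variable" suggestion; not the committed HiveUtils code.
public class EscapeBytes {

    // Computed once at class-load time rather than on every call.
    static final byte[] NEWLINE_ESCAPE_BYTES = "\\n".getBytes(StandardCharsets.UTF_8);
    static final byte[] BACKSLASH_ESCAPE_BYTES = "\\\\".getBytes(StandardCharsets.UTF_8);

    // Writes bytes[0..len) to out, replacing '\n' and '\\' with their escape sequences.
    // UTF-8 continuation bytes are >= 0x80, so they never match these ASCII values.
    static void writeEscaped(byte[] bytes, int len, OutputStream out) throws IOException {
        int start = 0; // start of the current run of bytes that need no escaping
        for (int i = 0; i < len; i++) {
            byte[] escape = null;
            if (bytes[i] == '\n') {
                escape = NEWLINE_ESCAPE_BYTES;
            } else if (bytes[i] == '\\') {
                escape = BACKSLASH_ESCAPE_BYTES;
            }
            if (escape != null) {
                out.write(bytes, start, i - start); // flush the plain run
                out.write(escape);
                start = i + 1;
            }
        }
        out.write(bytes, start, len - start); // trailing plain run
    }

    // Convenience wrapper that escapes into a new in-memory array.
    static byte[] escape(byte[] bytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            writeEscaped(bytes, bytes.length, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] row = "a\nb".getBytes(StandardCharsets.UTF_8);
        // Prints the row on one line, with a literal backslash before the 'n'.
        System.out.println(new String(escape(row), StandardCharsets.UTF_8));
    }
}
```

Writing plain runs in bulk avoids a per-byte `write` call, which matters when the writer sits on the path of every row sent to the script.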
REVISION DETAIL
https://reviews.facebook.net/D3099
[jira] [Updated] (HIVE-3012) hive custom scripts do not work well
if the data contains new lines
Posted by "Phabricator (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HIVE-3012:
------------------------------
Attachment: HIVE-3012.D3099.3.patch
njain updated the revision "HIVE-3012 [jira] hive custom scripts do not work well if the data contains new lines".
Reviewers: JIRA
comments
REVISION DETAIL
https://reviews.facebook.net/D3099
AFFECTED FILES
conf/hive-default.xml.template
data/scripts/newline.py
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
ql/src/test/results/clientpositive/newline.q.out
ql/src/test/queries/clientpositive/newline.q
ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java
ql/src/java/org/apache/hadoop/hive/ql/exec/TextRecordReader.java
ql/src/java/org/apache/hadoop/hive/ql/exec/TextRecordWriter.java
To: JIRA, njain
Cc: kevinwilfong