Posted to dev@hive.apache.org by "Namit Jain (JIRA)" <ji...@apache.org> on 2012/05/09 03:33:48 UTC

[jira] [Created] (HIVE-3012) hive custom scripts do not work well if the data contains new lines

Namit Jain created HIVE-3012:
--------------------------------

             Summary: hive custom scripts do not work well if the data contains new lines
                 Key: HIVE-3012
                 URL: https://issues.apache.org/jira/browse/HIVE-3012
             Project: Hive
          Issue Type: Improvement
            Reporter: Namit Jain
            Assignee: Namit Jain


If the data contains a newline, it is passed as is to the script.
The script then has no way of splitting its input into rows on the newline character.

An option should be added to Hive to escape/unescape newlines.
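
To illustrate the request, here is a minimal Python sketch of one possible escaping scheme. The function names, and the choice to escape the backslash as well so that the encoding stays reversible, are assumptions made for this example; they are not taken from the HIVE-3012 patch.

```python
def escape_newlines(value: str) -> str:
    """Replace each newline with the two characters backslash + 'n'.

    Backslashes are doubled first so a literal backslash followed by 'n'
    in the data cannot be confused with an escaped newline.
    """
    return value.replace("\\", "\\\\").replace("\n", "\\n")


def unescape_newlines(value: str) -> str:
    """Reverse escape_newlines by scanning left to right."""
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            nxt = value[i + 1]
            if nxt == "n":
                out.append("\n")
            elif nxt == "\\":
                out.append("\\")
            else:
                # Unknown escape: keep both characters unchanged.
                out.append(ch + nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# A field with an embedded newline becomes a single physical line,
# so a script that reads one row per line sees exactly one row.
field = "first line\nsecond line"
wire = escape_newlines(field)
rows_seen_by_script = wire.split("\n")
restored = unescape_newlines(wire)
```

With escaping on the way in and unescaping on the way out, the script can keep treating the newline as the row terminator while the original field value survives the round trip.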

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HIVE-3012) hive custom scripts do not work well if the data contains new lines

Posted by "Kevin Wilfong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Wilfong updated HIVE-3012:
--------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed, thanks Namit.
                

[jira] [Updated] (HIVE-3012) hive custom scripts do not work well if the data contains new lines

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-3012:
-----------------------------

    Status: Patch Available  (was: Open)
    

[jira] [Commented] (HIVE-3012) hive custom scripts do not work well if the data contains new lines

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271829#comment-13271829 ] 

Namit Jain commented on HIVE-3012:
----------------------------------

comments addressed
                

[jira] [Commented] (HIVE-3012) hive custom scripts do not work well if the data contains new lines

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271759#comment-13271759 ] 

Ashutosh Chauhan commented on HIVE-3012:
----------------------------------------

This has been fixed upstream in HADOOP-7096 (https://issues.apache.org/jira/browse/HADOOP-7096). If that fix is present, do we still need this?
                

[jira] [Commented] (HIVE-3012) hive custom scripts do not work well if the data contains new lines

Posted by "Phabricator (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271846#comment-13271846 ] 

Phabricator commented on HIVE-3012:
-----------------------------------

kevinwilfong has accepted the revision "HIVE-3012 [jira] hive custom scripts do not work well if the data contains new lines".

  +1 Running tests.

REVISION DETAIL
  https://reviews.facebook.net/D3099

BRANCH
  svn

To: JIRA, kevinwilfong, njain
Cc: kevinwilfong

                

[jira] [Updated] (HIVE-3012) hive custom scripts do not work well if the data contains new lines

Posted by "Phabricator (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-3012:
------------------------------

    Attachment: HIVE-3012.D3099.2.patch

njain updated the revision "HIVE-3012 [jira] hive custom scripts do not work well if the data contains new lines".
Reviewers: JIRA

  Comments


REVISION DETAIL
  https://reviews.facebook.net/D3099

AFFECTED FILES
  data/scripts/newline.py
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  ql/src/test/results/clientpositive/newline.q.out
  ql/src/test/queries/clientpositive/newline.q
  ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/TextRecordReader.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/TextRecordWriter.java

To: JIRA, njain
Cc: kevinwilfong

                

[jira] [Commented] (HIVE-3012) hive custom scripts do not work well if the data contains new lines

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271677#comment-13271677 ] 

Namit Jain commented on HIVE-3012:
----------------------------------

https://reviews.facebook.net/D3099
                

[jira] [Updated] (HIVE-3012) hive custom scripts do not work well if the data contains new lines

Posted by "Phabricator (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-3012:
------------------------------

    Attachment: HIVE-3012.D3099.1.patch

njain requested code review of "HIVE-3012 [jira] hive custom scripts do not work well if the data contains new lines".
Reviewers: JIRA

  https://issues.apache.org/jira/browse/HIVE-3012

  HIVE-3012 hive custom scripts do not work well if the data contains new lines

  If the data contain newline, it will be passed as is to the script.
  The script has no way of splitting the data based on the new line.

  An option should be added to hive to escape/unescape the new lines.

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D3099

AFFECTED FILES
  data/scripts/newline.py
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  ql/src/test/results/clientpositive/newline.q.out
  ql/src/test/queries/clientpositive/newline.q
  ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/TextRecordReader.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/TextRecordWriter.java


[jira] [Commented] (HIVE-3012) hive custom scripts do not work well if the data contains new lines

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273125#comment-13273125 ] 

Hudson commented on HIVE-3012:
------------------------------

Integrated in Hive-trunk-h0.21 #1424 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1424/])
    HIVE-3012 hive custom scripts do not work well if the data contains new lines (njain via kevinwilfong) (Revision 1336986)

     Result = FAILURE
kevinwilfong : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1336986
Files : 
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/data/scripts/newline.py
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TextRecordReader.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TextRecordWriter.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java
* /hive/trunk/ql/src/test/queries/clientpositive/newline.q
* /hive/trunk/ql/src/test/results/clientpositive/newline.q.out

                

[jira] [Issue Comment Edited] (HIVE-3012) hive custom scripts do not work well if the data contains new lines

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271759#comment-13271759 ] 

Ashutosh Chauhan edited comment on HIVE-3012 at 5/9/12 8:18 PM:
----------------------------------------------------------------

This has been fixed upstream: HADOOP-7096. If that fix is present, do we still need this?
                
      was (Author: ashutoshc):
    This has been fixed upstream: https://issues.apache.org/jira/browse/HADOOP-7096 If thats there, do we still need this?
                  

[jira] [Commented] (HIVE-3012) hive custom scripts do not work well if the data contains new lines

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271827#comment-13271827 ] 

Namit Jain commented on HIVE-3012:
----------------------------------

We still need it: a lot of Hive clients are using an older Hadoop version.

                

[jira] [Commented] (HIVE-3012) hive custom scripts do not work well if the data contains new lines

Posted by "Phabricator (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271703#comment-13271703 ] 

Phabricator commented on HIVE-3012:
-----------------------------------

kevinwilfong has commented on the revision "HIVE-3012 [jira] hive custom scripts do not work well if the data contains new lines".

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java:100-119 Rather than use methods here, wouldn't it be easier to just use a final static variable?

  E.g.
  static final byte[] newLineEscapeBytes = "\\n".getBytes();

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:393 Could you add this to conf/hive-default.xml.template with a description of what it does?
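
The first inline comment above suggests holding the escape sequence in a constant rather than returning it from a method. A rough Python analogue of that idea, working on bytes, might look like the sketch below. All names here are hypothetical and chosen for illustration; this is not the code from the patch.

```python
# Hypothetical module-level constants, built once at import time,
# playing the role of the `static final byte[]` suggested in the review.
NEWLINE = b"\n"
NEWLINE_ESCAPE = b"\\n"      # two bytes: backslash, 'n'
BACKSLASH = b"\\"
BACKSLASH_ESCAPE = b"\\\\"   # two bytes: backslash, backslash


def escape_bytes(raw: bytes) -> bytes:
    """Escape backslashes first, then newlines, so the result is reversible."""
    return raw.replace(BACKSLASH, BACKSLASH_ESCAPE).replace(NEWLINE, NEWLINE_ESCAPE)


escaped = escape_bytes(b"row with\nembedded newline")
```

Keeping the byte sequences as shared constants avoids rebuilding them on every call, which appears to be the point of the review comment.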

REVISION DETAIL
  https://reviews.facebook.net/D3099

                

[jira] [Updated] (HIVE-3012) hive custom scripts do not work well if the data contains new lines

Posted by "Phabricator (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-3012:
------------------------------

    Attachment: HIVE-3012.D3099.3.patch

njain updated the revision "HIVE-3012 [jira] hive custom scripts do not work well if the data contains new lines".
Reviewers: JIRA

  comments


REVISION DETAIL
  https://reviews.facebook.net/D3099

AFFECTED FILES
  conf/hive-default.xml.template
  data/scripts/newline.py
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  ql/src/test/results/clientpositive/newline.q.out
  ql/src/test/queries/clientpositive/newline.q
  ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/TextRecordReader.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/TextRecordWriter.java

To: JIRA, njain
Cc: kevinwilfong

                