Posted to dev@hive.apache.org by "Ning Zhang (JIRA)" <ji...@apache.org> on 2010/08/26 02:37:17 UTC

[jira] Created: (HIVE-1598) use SequenceFile rather than TextFile format for hive query results

use SequenceFile rather than TextFile format for hive query results
-------------------------------------------------------------------

                 Key: HIVE-1598
                 URL: https://issues.apache.org/jira/browse/HIVE-1598
             Project: Hadoop Hive
          Issue Type: Bug
            Reporter: Ning Zhang
            Assignee: Ning Zhang


A Hive query's result is first written to a temporary directory, and FetchTask then reads the files and displays them to the user. Currently the result files are written in TextFile format. This can cause results to be displayed incorrectly if a string-typed column contains newlines, which TextInputFormat uses as record delimiters. Switching to SequenceFile format will solve this problem.
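
The failure mode described above can be illustrated with a minimal sketch. It is not Hive or Hadoop code: the rows, function names, and the 4-byte length prefix are assumptions made for illustration, and the framing only mimics the general idea of storing an explicit record length (as SequenceFile does) rather than its actual on-disk layout.

```python
import struct

# Hypothetical result rows; the second row's string column contains a newline.
rows = ["1\tplain value", "2\tline one\nline two", "3\tlast"]


def write_text(records):
    """Newline-delimited framing, as in a plain text file."""
    return "".join(r + "\n" for r in records).encode("utf-8")


def read_text(data):
    """Split on newlines, the way a line-oriented reader would."""
    return data.decode("utf-8").split("\n")[:-1]


def write_framed(records):
    """Length-prefixed framing: 4-byte big-endian length, then the payload."""
    out = bytearray()
    for r in records:
        payload = r.encode("utf-8")
        out += struct.pack(">I", len(payload)) + payload
    return bytes(out)


def read_framed(data):
    """Read each record by its stored length, ignoring embedded newlines."""
    records, pos = [], 0
    while pos < len(data):
        (n,) = struct.unpack_from(">I", data, pos)
        pos += 4
        records.append(data[pos:pos + n].decode("utf-8"))
        pos += n
    return records


text_records = read_text(write_text(rows))
framed_records = read_framed(write_framed(rows))
print(len(text_records))    # 4: the embedded newline split one row in two
print(len(framed_records))  # 3: record boundaries are preserved
```

With newline framing, three rows come back as four records, which is the incorrect display the issue describes. With explicit lengths, the reader never inspects the payload for delimiters, so the embedded newline is harmless.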

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-1598) use SequenceFile rather than TextFile format for hive query results

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1598:
-----------------------------

    Status: Open  (was: Patch Available)

[jira] Resolved: (HIVE-1598) use SequenceFile rather than TextFile format for hive query results

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain resolved HIVE-1598.
------------------------------

     Hadoop Flags: [Reviewed]
    Fix Version/s: 0.7.0
       Resolution: Fixed

Committed. Thanks, Ning.

[jira] Commented: (HIVE-1598) use SequenceFile rather than TextFile format for hive query results

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904693#action_12904693 ] 

Namit Jain commented on HIVE-1598:
----------------------------------

Ning, can you add the test that was failing with TextFile ('\n' in the data)?

[jira] Updated: (HIVE-1598) use SequenceFile rather than TextFile format for hive query results

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1598:
-----------------------------

    Attachment: HIVE-1598.patch

This patch only adds support for using SequenceFile as the query result format. There are still open questions about whether we should also use it for the script operator. I will open another JIRA if needed.

[jira] Commented: (HIVE-1598) use SequenceFile rather than TextFile format for hive query results

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902742#action_12902742 ] 

Ning Zhang commented on HIVE-1598:
----------------------------------

Also make sure the script operator handles newlines correctly.

[jira] Updated: (HIVE-1598) use SequenceFile rather than TextFile format for hive query results

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1598:
-----------------------------

    Attachment: HIVE-1598.2.patch

Attached the test case and also removed some debugging info. These are the only changes. 

[jira] Updated: (HIVE-1598) use SequenceFile rather than TextFile format for hive query results

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1598:
-----------------------------

    Status: Patch Available  (was: Open)

All 0.17 & 0.20 tests passed.
