Posted to dev@hive.apache.org by "Adam Kramer (JIRA)" <ji...@apache.org> on 2011/07/27 01:09:10 UTC

[jira] [Updated] (HIVE-1466) Add NULL DEFINED AS to ROW FORMAT specification

     [ https://issues.apache.org/jira/browse/HIVE-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Kramer updated HIVE-1466:
------------------------------

    Description: 
NULL values are passed to TRANSFORM scripts as a literal backslash followed by a literal N (\N). With INSERT OVERWRITE LOCAL DIRECTORY, however, NULL values are saved as the string "NULL". This is inconsistent.

The ROW FORMAT specification of a table should be able to specify how a NULL value is represented. A clause such as ROW FORMAT NULL DEFINED AS '\N' or '\003' (or any other string) should then apply to all instances of table export and saving.

  was:
I just updated the Hive wiki to clarify what some would consider an oddity: When NULL values are exported to a script via TRANSFORM, they are converted to the string "\N", and then when the script's output is read, any cell that contains only \N is treated as a NULL value.

I believe that there are very VERY few reasons why anyone would need cells that contain only a backslash and then a capital N to be distinguished from NULL cells, but for complete generality, we should allow this.

The way to do that is probably by adding a specification in the ROW FORMAT for a table that would allow any string to be treated as a NULL if it is the only string in a cell. Some may prefer the empty string, others the word NULL in caps, etc. I vote for keeping \N as the default because I am used to it, but also for allowing this to be customized.
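
To illustrate the behavior described above, here is a minimal Python sketch of how a TRANSFORM-style script might treat a configurable NULL marker. It is not Hive code: the function names decode_row and encode_row, the null_marker parameter, and the tab separator are assumptions made for illustration only. A cell is treated as NULL only when the marker is the entire content of the cell, and \N stays the default.

```python
from typing import List, Optional

# Default marker: a literal backslash followed by a capital N.
DEFAULT_NULL = r"\N"


def decode_row(line: str, null_marker: str = DEFAULT_NULL,
               sep: str = "\t") -> List[Optional[str]]:
    """Split one delimited line into cells.

    A cell becomes None only if it consists of exactly the marker;
    a cell that merely contains the marker is left unchanged.
    """
    cells = line.rstrip("\n").split(sep)
    return [None if cell == null_marker else cell for cell in cells]


def encode_row(cells: List[Optional[str]], null_marker: str = DEFAULT_NULL,
               sep: str = "\t") -> str:
    """Join cells into one delimited line, writing None as the marker."""
    return sep.join(null_marker if cell is None else str(cell) for cell in cells)


if __name__ == "__main__":
    row = decode_row("a\t\\N\tb\n")
    print(row)
    # Re-encode the same row with a different (hypothetical) marker.
    print(encode_row(row, null_marker="NULL"))
```

Passing the same marker to both functions keeps reading and writing consistent, which is the property the requested ROW FORMAT option is meant to provide.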


> Add NULL DEFINED AS to ROW FORMAT specification
> -----------------------------------------------
>
>                 Key: HIVE-1466
>                 URL: https://issues.apache.org/jira/browse/HIVE-1466
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Adam Kramer
>
> NULL values are passed to TRANSFORM scripts as a literal backslash followed by a literal N (\N). With INSERT OVERWRITE LOCAL DIRECTORY, however, NULL values are saved as the string "NULL". This is inconsistent.
> The ROW FORMAT specification of a table should be able to specify how a NULL value is represented. A clause such as ROW FORMAT NULL DEFINED AS '\N' or '\003' (or any other string) should then apply to all instances of table export and saving.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira