Posted to user@hive.apache.org by mahender bigdata <Ma...@outlook.com> on 2016/03/03 23:38:03 UTC

Field delimiter in hive

Hi,

I'm a bit confused about which character to use as a general-purpose
field delimiter for Hive tables. Can anyone suggest a Unicode character
that is unlikely to appear in the data itself?

Here are the options I'm considering for the field delimiter. Please
let me know which one is best, i.e. which character is least likely to
show up in day-to-day data.

\u0001 = START OF HEADING (SOH) ==> Ctrl+Shift+A in Windows ==> Hive
default delimiter

\u001F = INFORMATION SEPARATOR ONE = unit separator (US) ==>
Ctrl+Shift+- in Windows

\u001E = INFORMATION SEPARATOR TWO = record separator (RS) ==>
Ctrl+Shift+6 in Windows

Judging by the name, I feel \u001F is the best option. Can anyone
comment, or suggest a better Unicode character that doesn't occur in
regular data?
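
One way to answer this empirically, rather than by name alone, is to scan a
sample of the real data and count how often each candidate occurs. Below is a
minimal Python sketch of that check. The sample rows and the helper name
count_candidates are made up for illustration; in practice the lines would be
read from the actual source files, and the character that never appears would
go into the table's FIELDS TERMINATED BY clause.

```python
# Candidate delimiters from the question: SOH, record separator, unit separator.
CANDIDATES = {
    "SOH (U+0001)": "\u0001",
    "RS (U+001E)": "\u001e",
    "US (U+001F)": "\u001f",
}


def count_candidates(lines, candidates=CANDIDATES):
    """Return how many times each candidate delimiter occurs in the lines."""
    counts = {name: 0 for name in candidates}
    for line in lines:
        for name, char in candidates.items():
            counts[name] += line.count(char)
    return counts


# Hypothetical sample rows (not from the thread).
sample = [
    "alice,42,likes a|b and ~tildes~",
    "bob,7,contains a stray \u0001 control byte",
]

result = count_candidates(sample)
# Candidates that never occur in the sample are the safer choices.
safe = [name for name, n in result.items() if n == 0]

print(result)
print("Not seen in sample:", safe)
```

A zero count on a sample is only evidence, not proof, so it is worth running
the scan over as much of the data as is practical.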

Re: Field delimiter in hive

Posted by Mich Talebzadeh <mi...@gmail.com>.

Try "~|~" as the field delimiter. It normally works in most cases.

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com
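
A caveat on multi-character delimiters such as "~|~": as far as I know, Hive's
default delimited text format reads a single-character field delimiter, so a
longer string generally needs a SerDe that supports it (MultiDelimitSerDe is
the one usually mentioned). This is worth verifying against the Hive version
in use. The Python sketch below only illustrates the difference between
splitting on the whole string and on its first character; the row and the
function names are hypothetical.

```python
DELIM = "~|~"


def split_full(line, delim=DELIM):
    """Split on the complete multi-character delimiter."""
    return line.split(delim)


def split_first_char(line, delim=DELIM):
    """Mimic a reader that honours only the first character of the delimiter."""
    return line.split(delim[0])


# Hypothetical row whose last field contains a lone '~'.
row = "alice~|~42~|~a~b"

full = split_full(row)
first = split_first_char(row)

print(full)   # three fields, the last one still contains '~'
print(first)  # six fragments, the row is torn apart
```

If the reader only sees the first character, any lone "~" in the data becomes a
field boundary, which removes the benefit of picking a rare three-character
sequence.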



On 8 March 2016 at 11:56, Chandeep Singh <cs...@chandeep.com> wrote:

> I’ve been pretty successful with two pipes (||) or two carets (^^) based
> on my dataset even though they aren’t unicode.
>
> On Mar 7, 2016, at 8:32 PM, mahender bigdata <Ma...@outlook.com>
> wrote:
>
> Any help on this.

Re: Field delimiter in hive

Posted by Chandeep Singh <cs...@chandeep.com>.

I’ve had good results with two pipes (||) or two carets (^^), depending on the dataset, even though they are plain ASCII rather than special Unicode control characters.
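
Whether a printable delimiter like || is safe depends on the data, as noted
above. A quick way to test a candidate is a round-trip check: join the fields
with the delimiter, split again, and see whether the original fields come
back. This is a minimal Python sketch; round_trips and the example fields are
made up for illustration.

```python
def round_trips(fields, delim):
    """Return True if joining and re-splitting on delim recovers the fields."""
    return delim.join(fields).split(delim) == fields


# A field that ends in a single pipe makes '||' ambiguous:
# "a|" + "||" + "b" is "a|||b", which splits back as ["a", "|b"].
pipe_ok = round_trips(["a|", "b"], "||")

# The same fields survive with the unit separator (U+001F).
us_ok = round_trips(["a|", "b"], "\u001f")

# Empty fields alone are not a problem for '||'.
empty_ok = round_trips(["x", "", "y"], "||")

print(pipe_ok, us_ok, empty_ok)
```

Running the check over a sample of real rows gives a concrete answer for a
particular dataset.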

> On Mar 7, 2016, at 8:32 PM, mahender bigdata <Ma...@outlook.com> wrote:
> 
> Any help on this.

Re: Field delimiter in hive

Posted by mahender bigdata <Ma...@outlook.com>.

Any help on this?
