You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@trafodion.apache.org by Hans Zeller <ha...@esgyn.com> on 2016/05/26 01:11:54 UTC

Hive STRING and VARCHAR types

Hi,

Here is a question on Hive data types. Ming is about to add support for
Hive VARCHAR data types, in addition to the existing STRING type, but we
hit a question we wanted to pose to the user community. Here is a proposed
type mapping from Hive to Trafodion:

Hive type Trafodion type Max # of chars Size in bytes Existing/new Comments
STRING VARCHAR(n BYTES) n/4 to n n existing n is determined by the
HIVE_MAX_STRING_LENGTH CQD
VARCHAR(m) VARCHAR(m) m 4*m proposed

Is it ok if we treat STRING and VARCHAR differently?

Thanks,

Ming and Hans

RE: 答复: Hive STRING and VARCHAR types

Posted by Selva Govindarajan <se...@esgyn.com>.
I think the official manual for hive can be accessed @
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-string

Selva



*From:* Qifan Chen [mailto:qifan.chen@esgyn.com]
*Sent:* Thursday, May 26, 2016 8:55 AM
*To:* user@trafodion.incubator.apache.org
*Subject:* Re: 答复: Hive STRING and VARCHAR types



To drive the support of Hive semantics, sounds like we need to know better
the operations allowed on STRING and VARCHAR(n) in hive?



On Thu, May 26, 2016 at 10:53 AM, Liu, Ming (Ming) <mi...@esgyn.cn>
wrote:

Agree with QiFan.

I think we should map closest data type from Hive to Trafodion. Current
mapping of hive STRING to Trafodion VARCHAR may not be proper. Since Hive
can save up to 2G in a String column, but Trafodion VARCHAR has much
smaller max size. So if there is a 2G hive string data, how can we convert
it into VARCHAR? We can implicitly truncate like Impala, but that not seems
good.

But why I cannot find an official Hive manual that describes the max length
of STRING?



This seems a big semantic change, if map Hive STRING into Trafodion CLOB,
and there will be no confusing anymore, since then STRING and VARCHAR are
two very different types. VARCHAR(n) will be treated as n Characters.



Thanks,

Ming





*发件人**:* Dave Birdsall [mailto:dave.birdsall@esgyn.com]
*发送时间:* 2016年5月26日 23:15
*收件人:* user@trafodion.incubator.apache.org
*主题:* RE: Hive STRING and VARCHAR types



But CLOB would limit what predicates and functions one could use on the
column, right?



*From:* Qifan Chen [mailto:qifan.chen@esgyn.com]
*Sent:* Thursday, May 26, 2016 5:43 AM
*To:* user@trafodion.incubator.apache.org
*Subject:* Re: Hive STRING and VARCHAR types



I wonder if we should consider the same length limit as Hive for a STRING
type, which is 2GB (http://www.folkstalk.com/2011/11/data-types-in-hive.html).
If so, the Trafodion mapping should be CLOB?



--Qifan



On Wed, May 25, 2016 at 11:49 PM, Selva Govindarajan <
selva.govindarajan@esgyn.com> wrote:

From the Cloudera documentation

*Text table considerations:*

Text data files can contain values that are longer than allowed by the
VARCHAR(n) length limit. Any extra trailing characters are ignored when
Impala processes those values during a query

Will Trafodion behave the same way? Having the maximum limit for the
individual column provides the flexibility and optimal resource
utilization. However, having the limit in number of bytes for String and
number of characters for Varchar could be quite confusing for the user.

Selva





*From:* Hans Zeller [mailto:hans.zeller@esgyn.com]
*Sent:* Wednesday, May 25, 2016 6:12 PM
*To:* user@trafodion.incubator.apache.org
*Subject:* Hive STRING and VARCHAR types



Hi,



Here is a question on Hive data types. Ming is about to add support for
Hive VARCHAR data types, in addition to the existing STRING type, but we
hit a question we wanted to pose to the user community. Here is a proposed
type mapping from Hive to Trafodion:



*Hive type*

*Trafodion type*

*Max # of chars*

*Size in bytes*

*Existing/new*

*Comments*

STRING

VARCHAR(n BYTES)

n/4 to n

n

existing

n is determined by the HIVE_MAX_STRING_LENGTH CQD

VARCHAR(m)

VARCHAR(m)

m

4*m

proposed



Is it ok if we treat STRING and VARCHAR differently?



Thanks,


Ming and Hans





-- 

Regards, --Qifan







-- 

Regards, --Qifan

Re: 答复: Hive STRING and VARCHAR types

Posted by Qifan Chen <qi...@esgyn.com>.
To drive the support of Hive semantics, sounds like we need to know better
the operations allowed on STRING and VARCHAR(n) in hive?

On Thu, May 26, 2016 at 10:53 AM, Liu, Ming (Ming) <mi...@esgyn.cn>
wrote:

> Agree with QiFan.
>
> I think we should map closest data type from Hive to Trafodion. Current
> mapping of hive STRING to Trafodion VARCHAR may not be proper. Since Hive
> can save up to 2G in a String column, but Trafodion VARCHAR has much
> smaller max size. So if there is a 2G hive string data, how can we convert
> it into VARCHAR? We can implicitly truncate like Impala, but that not seems
> good.
>
> But why I cannot find an official Hive manual that describes the max
> length of STRING?
>
>
>
> This seems a big semantic change, if map Hive STRING into Trafodion CLOB,
> and there will be no confusing anymore, since then STRING and VARCHAR are
> two very different types. VARCHAR(n) will be treated as n Characters.
>
>
>
> Thanks,
>
> Ming
>
>
>
>
>
> *发件人:* Dave Birdsall [mailto:dave.birdsall@esgyn.com]
> *发送时间:* 2016年5月26日 23:15
> *收件人:* user@trafodion.incubator.apache.org
> *主题:* RE: Hive STRING and VARCHAR types
>
>
>
> But CLOB would limit what predicates and functions one could use on the
> column, right?
>
>
>
> *From:* Qifan Chen [mailto:qifan.chen@esgyn.com]
> *Sent:* Thursday, May 26, 2016 5:43 AM
> *To:* user@trafodion.incubator.apache.org
> *Subject:* Re: Hive STRING and VARCHAR types
>
>
>
> I wonder if we should consider the same length limit as Hive for a STRING
> type, which is 2GB (
> http://www.folkstalk.com/2011/11/data-types-in-hive.html). If so, the
> Trafodion mapping should be CLOB?
>
>
>
> --Qifan
>
>
>
> On Wed, May 25, 2016 at 11:49 PM, Selva Govindarajan <
> selva.govindarajan@esgyn.com> wrote:
>
> From the Cloudera documentation
>
> *Text table considerations:*
>
> Text data files can contain values that are longer than allowed by the
> VARCHAR(n) length limit. Any extra trailing characters are ignored when
> Impala processes those values during a query
>
> Will Trafodion behave the same way? Having the maximum limit for the
> individual column provides the flexibility and optimal resource
> utilization. However, having the limit in number of bytes for String and
> number of characters for Varchar could be quite confusing for the user.
>
> Selva
>
>
>
>
>
> *From:* Hans Zeller [mailto:hans.zeller@esgyn.com]
> *Sent:* Wednesday, May 25, 2016 6:12 PM
> *To:* user@trafodion.incubator.apache.org
> *Subject:* Hive STRING and VARCHAR types
>
>
>
> Hi,
>
>
>
> Here is a question on Hive data types. Ming is about to add support for
> Hive VARCHAR data types, in addition to the existing STRING type, but we
> hit a question we wanted to pose to the user community. Here is a proposed
> type mapping from Hive to Trafodion:
>
>
>
> *Hive type*
>
> *Trafodion type*
>
> *Max # of chars*
>
> *Size in bytes*
>
> *Existing/new*
>
> *Comments*
>
> STRING
>
> VARCHAR(n BYTES)
>
> n/4 to n
>
> n
>
> existing
>
> n is determined by the HIVE_MAX_STRING_LENGTH CQD
>
> VARCHAR(m)
>
> VARCHAR(m)
>
> m
>
> 4*m
>
> proposed
>
>
>
> Is it ok if we treat STRING and VARCHAR differently?
>
>
>
> Thanks,
>
>
> Ming and Hans
>
>
>
>
>
> --
>
> Regards, --Qifan
>
>
>



-- 
Regards, --Qifan

答复: Hive STRING and VARCHAR types

Posted by "Liu, Ming (Ming)" <mi...@esgyn.cn>.
Agree with QiFan.
I think we should map closest data type from Hive to Trafodion. Current mapping of hive STRING to Trafodion VARCHAR may not be proper. Since Hive can save up to 2G in a String column, but Trafodion VARCHAR has much smaller max size. So if there is a 2G hive string data, how can we convert it into VARCHAR? We can implicitly truncate like Impala, but that not seems good.
But why I cannot find an official Hive manual that describes the max length of STRING?

This seems a big semantic change, if map Hive STRING into Trafodion CLOB, and there will be no confusing anymore, since then STRING and VARCHAR are two very different types. VARCHAR(n) will be treated as n Characters.

Thanks,
Ming


发件人: Dave Birdsall [mailto:dave.birdsall@esgyn.com]
发送时间: 2016年5月26日 23:15
收件人: user@trafodion.incubator.apache.org
主题: RE: Hive STRING and VARCHAR types

But CLOB would limit what predicates and functions one could use on the column, right?

From: Qifan Chen [mailto:qifan.chen@esgyn.com<ma...@esgyn.com>]
Sent: Thursday, May 26, 2016 5:43 AM
To: user@trafodion.incubator.apache.org<ma...@trafodion.incubator.apache.org>
Subject: Re: Hive STRING and VARCHAR types

I wonder if we should consider the same length limit as Hive for a STRING type, which is 2GB (http://www.folkstalk.com/2011/11/data-types-in-hive.html). If so, the Trafodion mapping should be CLOB?

--Qifan

On Wed, May 25, 2016 at 11:49 PM, Selva Govindarajan <se...@esgyn.com>> wrote:
From the Cloudera documentation

Text table considerations:

Text data files can contain values that are longer than allowed by the VARCHAR(n) length limit. Any extra trailing characters are ignored when Impala processes those values during a query

Will Trafodion behave the same way? Having the maximum limit for the individual column provides the flexibility and optimal resource utilization. However, having the limit in number of bytes for String and number of characters for Varchar could be quite confusing for the user.

Selva


From: Hans Zeller [mailto:hans.zeller@esgyn.com<ma...@esgyn.com>]
Sent: Wednesday, May 25, 2016 6:12 PM
To: user@trafodion.incubator.apache.org<ma...@trafodion.incubator.apache.org>
Subject: Hive STRING and VARCHAR types

Hi,

Here is a question on Hive data types. Ming is about to add support for Hive VARCHAR data types, in addition to the existing STRING type, but we hit a question we wanted to pose to the user community. Here is a proposed type mapping from Hive to Trafodion:

Hive type

Trafodion type

Max # of chars

Size in bytes

Existing/new

Comments

STRING

VARCHAR(n BYTES)

n/4 to n

n

existing

n is determined by the HIVE_MAX_STRING_LENGTH CQD

VARCHAR(m)

VARCHAR(m)

m

4*m

proposed


Is it ok if we treat STRING and VARCHAR differently?

Thanks,

Ming and Hans



--
Regards, --Qifan


RE: Hive STRING and VARCHAR types

Posted by Dave Birdsall <da...@esgyn.com>.
But CLOB would limit what predicates and functions one could use on the
column, right?



*From:* Qifan Chen [mailto:qifan.chen@esgyn.com]
*Sent:* Thursday, May 26, 2016 5:43 AM
*To:* user@trafodion.incubator.apache.org
*Subject:* Re: Hive STRING and VARCHAR types



I wonder if we should consider the same length limit as Hive for a STRING
type, which is 2GB (http://www.folkstalk.com/2011/11/data-types-in-hive.html).
If so, the Trafodion mapping should be CLOB?



--Qifan



On Wed, May 25, 2016 at 11:49 PM, Selva Govindarajan <
selva.govindarajan@esgyn.com> wrote:

From the Cloudera documentation

*Text table considerations:*

Text data files can contain values that are longer than allowed by the
VARCHAR(n) length limit. Any extra trailing characters are ignored when
Impala processes those values during a query

Will Trafodion behave the same way? Having the maximum limit for the
individual column provides the flexibility and optimal resource
utilization. However, having the limit in number of bytes for String and
number of characters for Varchar could be quite confusing for the user.

Selva





*From:* Hans Zeller [mailto:hans.zeller@esgyn.com]
*Sent:* Wednesday, May 25, 2016 6:12 PM
*To:* user@trafodion.incubator.apache.org
*Subject:* Hive STRING and VARCHAR types



Hi,



Here is a question on Hive data types. Ming is about to add support for
Hive VARCHAR data types, in addition to the existing STRING type, but we
hit a question we wanted to pose to the user community. Here is a proposed
type mapping from Hive to Trafodion:



*Hive type*

*Trafodion type*

*Max # of chars*

*Size in bytes*

*Existing/new*

*Comments*

STRING

VARCHAR(n BYTES)

n/4 to n

n

existing

n is determined by the HIVE_MAX_STRING_LENGTH CQD

VARCHAR(m)

VARCHAR(m)

m

4*m

proposed



Is it ok if we treat STRING and VARCHAR differently?



Thanks,


Ming and Hans





-- 

Regards, --Qifan

Re: Hive STRING and VARCHAR types

Posted by Qifan Chen <qi...@esgyn.com>.
I wonder if we should consider the same length limit as Hive for a STRING
type, which is 2GB (http://www.folkstalk.com/2011/11/data-types-in-hive.html).
If so, the Trafodion mapping should be CLOB?

--Qifan

On Wed, May 25, 2016 at 11:49 PM, Selva Govindarajan <
selva.govindarajan@esgyn.com> wrote:

> From the Cloudera documentation
>
> *Text table considerations:*
>
> Text data files can contain values that are longer than allowed by the
> VARCHAR(n) length limit. Any extra trailing characters are ignored when
> Impala processes those values during a query
>
> Will Trafodion behave the same way? Having the maximum limit for the
> individual column provides the flexibility and optimal resource
> utilization. However, having the limit in number of bytes for String and
> number of characters for Varchar could be quite confusing for the user.
>
> Selva
>
>
>
>
>
> *From:* Hans Zeller [mailto:hans.zeller@esgyn.com]
> *Sent:* Wednesday, May 25, 2016 6:12 PM
> *To:* user@trafodion.incubator.apache.org
> *Subject:* Hive STRING and VARCHAR types
>
>
>
> Hi,
>
>
>
> Here is a question on Hive data types. Ming is about to add support for
> Hive VARCHAR data types, in addition to the existing STRING type, but we
> hit a question we wanted to pose to the user community. Here is a proposed
> type mapping from Hive to Trafodion:
>
>
>
> *Hive type*
>
> *Trafodion type*
>
> *Max # of chars*
>
> *Size in bytes*
>
> *Existing/new*
>
> *Comments*
>
> STRING
>
> VARCHAR(n BYTES)
>
> n/4 to n
>
> n
>
> existing
>
> n is determined by the HIVE_MAX_STRING_LENGTH CQD
>
> VARCHAR(m)
>
> VARCHAR(m)
>
> m
>
> 4*m
>
> proposed
>
>
>
> Is it ok if we treat STRING and VARCHAR differently?
>
>
>
> Thanks,
>
>
> Ming and Hans
>



-- 
Regards, --Qifan

RE: Hive STRING and VARCHAR types

Posted by Selva Govindarajan <se...@esgyn.com>.
From the Cloudera documentation

*Text table considerations:*

Text data files can contain values that are longer than allowed by the
VARCHAR(n) length limit. Any extra trailing characters are ignored when
Impala processes those values during a query

Will Trafodion behave the same way? Having the maximum limit for the
individual column provides the flexibility and optimal resource
utilization. However, having the limit in number of bytes for String and
number of characters for Varchar could be quite confusing for the user.

Selva





*From:* Hans Zeller [mailto:hans.zeller@esgyn.com]
*Sent:* Wednesday, May 25, 2016 6:12 PM
*To:* user@trafodion.incubator.apache.org
*Subject:* Hive STRING and VARCHAR types



Hi,



Here is a question on Hive data types. Ming is about to add support for
Hive VARCHAR data types, in addition to the existing STRING type, but we
hit a question we wanted to pose to the user community. Here is a proposed
type mapping from Hive to Trafodion:



*Hive type*

*Trafodion type*

*Max # of chars*

*Size in bytes*

*Existing/new*

*Comments*

STRING

VARCHAR(n BYTES)

n/4 to n

n

existing

n is determined by the HIVE_MAX_STRING_LENGTH CQD

VARCHAR(m)

VARCHAR(m)

m

4*m

proposed



Is it ok if we treat STRING and VARCHAR differently?



Thanks,


Ming and Hans