You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Igor Kravzov <ig...@gmail.com> on 2016/06/04 17:12:33 UTC

Convert date in string format to timestamp in table definition

Hi,

I have 2 dates in Json file defined like this
"addDate": "2016-05-17T02:10:44.527",
  "postDate": "2016-05-16T02:08:55",

Right now I define external table based on this file like this:
CREATE external TABLE threads_test
(url string,
 pagetype string,
 adddate string,
 postdate string,
 posttext string)
partitioned by (yyyymmdd int)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
location 'my location';

is it possible to define these 2 dates as timestamp?
Do I need to change date format in the file? is it possible to specify date
format in table definition?
Or I better off with string?

Thanks in advance.

Re: Convert date in string format to timestamp in table definition

Posted by Igor Kravzov <ig...@gmail.com>.
Thanks Mich. This date is informational for now. Will see once I need it.

On Sat, Jun 4, 2016 at 5:23 PM, Mich Talebzadeh <mi...@gmail.com>
wrote:

> or just create an internal table and do insert/select from external table
> to that table as Dudu mentioned
>
> hive> use test;
> OK
> hive> desc mytime;
> OK
> adddate                 timestamp
>
> hive> insert into
>     > test.mytime
>     > select cast(concat_ws(' ',substring
> ("2016-05-17T02:10:44.527",1,10),substring ("2016-05-17T02:10:44.527",12))
> as timestamp)  as adddate;
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 4 June 2016 at 21:31, Igor Kravzov <ig...@gmail.com> wrote:
>
>> Thanks Dudu.
>> So if I need actual date I will use view.
>> Regarding partition column:  I can create 2 external tables based on the
>> same data with integer or string column partition and see which one is more
>> convenient for our use.
>>
>> On Sat, Jun 4, 2016 at 2:20 PM, Markovitz, Dudu <dm...@paypal.com>
>> wrote:
>>
>>> I’m not aware of an option to do what you request in the external table
>>> definition but you might want to that using a view.
>>>
>>>
>>>
>>> P.s.
>>>
>>> I seems to me that defining the partition column as a string would be
>>> more user friendly than integer, e.g. –
>>>
>>>
>>>
>>> select * from threads_test where yyyymmdd like ‘2016%’ – year 2016;
>>>
>>> select * from threads_test where yyyymmdd like ‘201603%’ –- March 2016;
>>>
>>> select * from threads_test where yyyymmdd like ‘______01’ -- first of
>>> every month;
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> $ hdfs dfs -ls -R /tmp/threads_test
>>>
>>> drwxr-xr-x   - cloudera supergroup          0 2016-06-04 10:45
>>> /tmp/threads_test/20160604
>>>
>>> -rw-r--r--   1 cloudera supergroup        136 2016-06-04 10:45
>>> /tmp/threads_test/20160604/data.txt
>>>
>>>
>>>
>>> $ hdfs dfs -cat /tmp/threads_test/20160604/data.txt
>>>
>>> {"url":"www.blablabla.com
>>> ","pageType":"pg1","addDate":"2016-05-17T02:10:44.527","postDate":"2016-05-16T02:08:55","postText":"YadaYada"}
>>>
>>>
>>>
>>>
>>> ----------------------------------------------------------------------------------------------------
>>>
>>>
>>>
>>>
>>>
>>> hive> add jar
>>> /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;
>>>
>>>
>>>
>>> hive>
>>>
>>> create external table threads_test
>>>
>>> (
>>>
>>>     url         string
>>>
>>>    ,pagetype    string
>>>
>>>    ,adddate     string
>>>
>>>    ,postdate    string
>>>
>>>    ,posttext    string
>>>
>>> )
>>>
>>>     partitioned by (yyyymmdd string)
>>>
>>>     row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
>>>
>>>     location '/tmp/threads_test'
>>>
>>> ;
>>>
>>>
>>>
>>> hive> alter table threads_test add partition (yyyymmdd=20160604)
>>> location '/tmp/threads_test/20160604';
>>>
>>>
>>>
>>> hive> select * from threads_test;
>>>
>>>
>>>
>>> www.blablabla.com    pg1    2016-05-17T02:10:44.527
>>> 2016-05-16T02:08:55  YadaYada      20160604
>>>
>>>
>>>
>>> hive>
>>>
>>> create view threads_test_v
>>>
>>> as
>>>
>>> select      url
>>>
>>>            ,pagetype
>>>
>>>            ,cast (concat_ws(' ',substr (adddate ,1,10),substr (adddate
>>> ,12)) as timestamp)  as adddate
>>>
>>>            ,cast (concat_ws(' ',substr (postdate,1,10),substr
>>> (postdate,12)) as timestamp)  as postdate
>>>
>>>            ,posttext
>>>
>>>
>>>
>>> from        threads_test
>>>
>>> ;
>>>
>>>
>>>
>>> hive> select * from threads_test_v;
>>>
>>>
>>>
>>> www.blablabla.com    pg1    2016-05-17 02:10:44.527    2016-05-16
>>> 02:08:55  YadaYada
>>>
>>>
>>>
>>>
>>>
>>> *From:* Igor Kravzov [mailto:igork.inexso@gmail.com]
>>> *Sent:* Saturday, June 04, 2016 8:13 PM
>>> *To:* user@hive.apache.org
>>> *Subject:* Convert date in string format to timestamp in table
>>> definition
>>>
>>>
>>>
>>> Hi,
>>>
>>>
>>>
>>> I have 2 dates in Json file defined like this
>>>
>>> "addDate": "2016-05-17T02:10:44.527",
>>>
>>>   "postDate": "2016-05-16T02:08:55",
>>>
>>>
>>>
>>> Right now I define external table based on this file like this:
>>>
>>> CREATE external TABLE threads_test
>>>
>>> (url string,
>>>
>>>  pagetype string,
>>>
>>>  adddate string,
>>>
>>>  postdate string,
>>>
>>>  posttext string)
>>>
>>> partitioned by (yyyymmdd int)
>>>
>>> ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
>>>
>>> location 'my location';
>>>
>>>
>>>
>>> is it possible to define these 2 dates as timestamp?
>>>
>>> Do I need to change date format in the file? is it possible to specify
>>> date format in table definition?
>>>
>>> Or I better off with string?
>>>
>>>
>>>
>>> Thanks in advance.
>>>
>>
>>
>

Re: Convert date in string format to timestamp in table definition

Posted by Mich Talebzadeh <mi...@gmail.com>.
or just create an internal table and do insert/select from external table
to that table as Dudu mentioned

hive> use test;
OK
hive> desc mytime;
OK
adddate                 timestamp

hive> insert into
    > test.mytime
    > select cast(concat_ws(' ',substring
("2016-05-17T02:10:44.527",1,10),substring ("2016-05-17T02:10:44.527",12))
as timestamp)  as adddate;

HTH

Dr Mich Talebzadeh



LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com



On 4 June 2016 at 21:31, Igor Kravzov <ig...@gmail.com> wrote:

> Thanks Dudu.
> So if I need actual date I will use view.
> Regarding partition column:  I can create 2 external tables based on the
> same data with integer or string column partition and see which one is more
> convenient for our use.
>
> On Sat, Jun 4, 2016 at 2:20 PM, Markovitz, Dudu <dm...@paypal.com>
> wrote:
>
>> I’m not aware of an option to do what you request in the external table
>> definition but you might want to that using a view.
>>
>>
>>
>> P.s.
>>
>> I seems to me that defining the partition column as a string would be
>> more user friendly than integer, e.g. –
>>
>>
>>
>> select * from threads_test where yyyymmdd like ‘2016%’ – year 2016;
>>
>> select * from threads_test where yyyymmdd like ‘201603%’ –- March 2016;
>>
>> select * from threads_test where yyyymmdd like ‘______01’ -- first of
>> every month;
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> $ hdfs dfs -ls -R /tmp/threads_test
>>
>> drwxr-xr-x   - cloudera supergroup          0 2016-06-04 10:45
>> /tmp/threads_test/20160604
>>
>> -rw-r--r--   1 cloudera supergroup        136 2016-06-04 10:45
>> /tmp/threads_test/20160604/data.txt
>>
>>
>>
>> $ hdfs dfs -cat /tmp/threads_test/20160604/data.txt
>>
>> {"url":"www.blablabla.com
>> ","pageType":"pg1","addDate":"2016-05-17T02:10:44.527","postDate":"2016-05-16T02:08:55","postText":"YadaYada"}
>>
>>
>>
>>
>> ----------------------------------------------------------------------------------------------------
>>
>>
>>
>>
>>
>> hive> add jar
>> /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;
>>
>>
>>
>> hive>
>>
>> create external table threads_test
>>
>> (
>>
>>     url         string
>>
>>    ,pagetype    string
>>
>>    ,adddate     string
>>
>>    ,postdate    string
>>
>>    ,posttext    string
>>
>> )
>>
>>     partitioned by (yyyymmdd string)
>>
>>     row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
>>
>>     location '/tmp/threads_test'
>>
>> ;
>>
>>
>>
>> hive> alter table threads_test add partition (yyyymmdd=20160604) location
>> '/tmp/threads_test/20160604';
>>
>>
>>
>> hive> select * from threads_test;
>>
>>
>>
>> www.blablabla.com    pg1    2016-05-17T02:10:44.527
>> 2016-05-16T02:08:55  YadaYada      20160604
>>
>>
>>
>> hive>
>>
>> create view threads_test_v
>>
>> as
>>
>> select      url
>>
>>            ,pagetype
>>
>>            ,cast (concat_ws(' ',substr (adddate ,1,10),substr (adddate
>> ,12)) as timestamp)  as adddate
>>
>>            ,cast (concat_ws(' ',substr (postdate,1,10),substr
>> (postdate,12)) as timestamp)  as postdate
>>
>>            ,posttext
>>
>>
>>
>> from        threads_test
>>
>> ;
>>
>>
>>
>> hive> select * from threads_test_v;
>>
>>
>>
>> www.blablabla.com    pg1    2016-05-17 02:10:44.527    2016-05-16
>> 02:08:55  YadaYada
>>
>>
>>
>>
>>
>> *From:* Igor Kravzov [mailto:igork.inexso@gmail.com]
>> *Sent:* Saturday, June 04, 2016 8:13 PM
>> *To:* user@hive.apache.org
>> *Subject:* Convert date in string format to timestamp in table definition
>>
>>
>>
>> Hi,
>>
>>
>>
>> I have 2 dates in Json file defined like this
>>
>> "addDate": "2016-05-17T02:10:44.527",
>>
>>   "postDate": "2016-05-16T02:08:55",
>>
>>
>>
>> Right now I define external table based on this file like this:
>>
>> CREATE external TABLE threads_test
>>
>> (url string,
>>
>>  pagetype string,
>>
>>  adddate string,
>>
>>  postdate string,
>>
>>  posttext string)
>>
>> partitioned by (yyyymmdd int)
>>
>> ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
>>
>> location 'my location';
>>
>>
>>
>> is it possible to define these 2 dates as timestamp?
>>
>> Do I need to change date format in the file? is it possible to specify
>> date format in table definition?
>>
>> Or I better off with string?
>>
>>
>>
>> Thanks in advance.
>>
>
>

RE: Convert date in string format to timestamp in table definition

Posted by "Markovitz, Dudu" <dm...@paypal.com>.
‘Never’ is a strong word.


1.       We’re talking about the metadata so –



a.       The data format is irrelevant



b.      The records number is small (scale of thousands)

I would have sacrificed 1 second of metadata processing for a better user experience



2.       Partitions values are being held in the metastore (at least with MySQL)  as strings

Dudu

From: Jörn Franke [mailto:jornfranke@gmail.com]
Sent: Sunday, June 05, 2016 11:38 AM
To: user@hive.apache.org
Subject: Re: Convert date in string format to timestamp in table definition

Never use string when you can use int - the performance will be much better - especially for tables in Orc / parquet format

On 04 Jun 2016, at 22:31, Igor Kravzov <ig...@gmail.com>> wrote:
Thanks Dudu.
So if I need actual date I will use view.
Regarding partition column:  I can create 2 external tables based on the same data with integer or string column partition and see which one is more convenient for our use.

On Sat, Jun 4, 2016 at 2:20 PM, Markovitz, Dudu <dm...@paypal.com>> wrote:
I’m not aware of an option to do what you request in the external table definition but you might want to that using a view.

P.s.
I seems to me that defining the partition column as a string would be more user friendly than integer, e.g. –

select * from threads_test where yyyymmdd like ‘2016%’ – year 2016;
select * from threads_test where yyyymmdd like ‘201603%’ –- March 2016;
select * from threads_test where yyyymmdd like ‘______01’ -- first of every month;





$ hdfs dfs -ls -R /tmp/threads_test
drwxr-xr-x   - cloudera supergroup          0 2016-06-04 10:45 /tmp/threads_test/20160604
-rw-r--r--   1 cloudera supergroup        136 2016-06-04 10:45 /tmp/threads_test/20160604/data.txt

$ hdfs dfs -cat /tmp/threads_test/20160604/data.txt
{"url":"www.blablabla.com<http://www.blablabla.com>","pageType":"pg1","addDate":"2016-05-17T02:10:44.527","postDate":"2016-05-16T02:08:55","postText":"YadaYada"}

----------------------------------------------------------------------------------------------------


hive> add jar /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;

hive>
create external table threads_test
(
    url         string
   ,pagetype    string
   ,adddate     string
   ,postdate    string
   ,posttext    string
)
    partitioned by (yyyymmdd string)
    row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
    location '/tmp/threads_test'
;

hive> alter table threads_test add partition (yyyymmdd=20160604) location '/tmp/threads_test/20160604';

hive> select * from threads_test;

www.blablabla.com<http://www.blablabla.com>    pg1    2016-05-17T02:10:44.527    2016-05-16T02:08:55  YadaYada      20160604

hive>
create view threads_test_v
as
select      url
           ,pagetype
           ,cast (concat_ws(' ',substr (adddate ,1,10),substr (adddate ,12)) as timestamp)  as adddate
           ,cast (concat_ws(' ',substr (postdate,1,10),substr (postdate,12)) as timestamp)  as postdate
           ,posttext

from        threads_test
;

hive> select * from threads_test_v;

www.blablabla.com<http://www.blablabla.com>    pg1    2016-05-17 02:10:44.527    2016-05-16 02:08:55  YadaYada


From: Igor Kravzov [mailto:igork.inexso@gmail.com<ma...@gmail.com>]
Sent: Saturday, June 04, 2016 8:13 PM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: Convert date in string format to timestamp in table definition

Hi,

I have 2 dates in Json file defined like this
"addDate": "2016-05-17T02:10:44.527",
  "postDate": "2016-05-16T02:08:55",

Right now I define external table based on this file like this:
CREATE external TABLE threads_test
(url string,
 pagetype string,
 adddate string,
 postdate string,
 posttext string)
partitioned by (yyyymmdd int)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
location 'my location';

is it possible to define these 2 dates as timestamp?
Do I need to change date format in the file? is it possible to specify date format in table definition?
Or I better off with string?

Thanks in advance.


Re: Convert date in string format to timestamp in table definition

Posted by Jörn Franke <jo...@gmail.com>.
Never use string when you can use int - the performance will be much better - especially for tables in Orc / parquet format 

> On 04 Jun 2016, at 22:31, Igor Kravzov <ig...@gmail.com> wrote:
> 
> Thanks Dudu. 
> So if I need actual date I will use view.
> Regarding partition column:  I can create 2 external tables based on the same data with integer or string column partition and see which one is more convenient for our use.
> 
>> On Sat, Jun 4, 2016 at 2:20 PM, Markovitz, Dudu <dm...@paypal.com> wrote:
>> I’m not aware of an option to do what you request in the external table definition but you might want to that using a view.
>> 
>>  
>> 
>> P.s.
>> 
>> I seems to me that defining the partition column as a string would be more user friendly than integer, e.g. –
>> 
>>  
>> 
>> select * from threads_test where yyyymmdd like ‘2016%’ – year 2016;
>> 
>> select * from threads_test where yyyymmdd like ‘201603%’ –- March 2016;
>> 
>> select * from threads_test where yyyymmdd like ‘______01’ -- first of every month;
>> 
>>  
>> 
>>  
>> 
>>  
>> 
>>  
>> 
>>  
>> 
>> $ hdfs dfs -ls -R /tmp/threads_test
>> 
>> drwxr-xr-x   - cloudera supergroup          0 2016-06-04 10:45 /tmp/threads_test/20160604
>> 
>> -rw-r--r--   1 cloudera supergroup        136 2016-06-04 10:45 /tmp/threads_test/20160604/data.txt
>> 
>>  
>> 
>> $ hdfs dfs -cat /tmp/threads_test/20160604/data.txt
>> 
>> {"url":"www.blablabla.com","pageType":"pg1","addDate":"2016-05-17T02:10:44.527","postDate":"2016-05-16T02:08:55","postText":"YadaYada"}
>> 
>>  
>> 
>> ----------------------------------------------------------------------------------------------------
>> 
>>  
>> 
>>  
>> 
>> hive> add jar /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;
>> 
>>  
>> 
>> hive>
>> 
>> create external table threads_test
>> 
>> (
>> 
>>     url         string
>> 
>>    ,pagetype    string
>> 
>>    ,adddate     string
>> 
>>    ,postdate    string
>> 
>>    ,posttext    string
>> 
>> )
>> 
>>     partitioned by (yyyymmdd string)
>> 
>>     row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
>> 
>>     location '/tmp/threads_test'
>> 
>> ;
>> 
>>  
>> 
>> hive> alter table threads_test add partition (yyyymmdd=20160604) location '/tmp/threads_test/20160604';
>> 
>>  
>> 
>> hive> select * from threads_test;
>> 
>>  
>> 
>> www.blablabla.com    pg1    2016-05-17T02:10:44.527    2016-05-16T02:08:55  YadaYada      20160604
>> 
>>  
>> 
>> hive>
>> 
>> create view threads_test_v
>> 
>> as
>> 
>> select      url    
>> 
>>            ,pagetype
>> 
>>            ,cast (concat_ws(' ',substr (adddate ,1,10),substr (adddate ,12)) as timestamp)  as adddate 
>> 
>>            ,cast (concat_ws(' ',substr (postdate,1,10),substr (postdate,12)) as timestamp)  as postdate 
>> 
>>            ,posttext
>> 
>>            
>> 
>> from        threads_test
>> 
>> ;
>> 
>>  
>> 
>> hive> select * from threads_test_v;
>> 
>>  
>> 
>> www.blablabla.com    pg1    2016-05-17 02:10:44.527    2016-05-16 02:08:55  YadaYada
>> 
>>  
>> 
>>  
>> 
>> From: Igor Kravzov [mailto:igork.inexso@gmail.com] 
>> Sent: Saturday, June 04, 2016 8:13 PM
>> To: user@hive.apache.org
>> Subject: Convert date in string format to timestamp in table definition
>> 
>>  
>> 
>> Hi,
>> 
>>  
>> 
>> I have 2 dates in Json file defined like this
>> 
>> "addDate": "2016-05-17T02:10:44.527",
>> 
>>   "postDate": "2016-05-16T02:08:55",
>> 
>>  
>> 
>> Right now I define external table based on this file like this:
>> 
>> CREATE external TABLE threads_test 
>> 
>> (url string,
>> 
>>  pagetype string,
>> 
>>  adddate string,
>> 
>>  postdate string,
>> 
>>  posttext string)
>> 
>> partitioned by (yyyymmdd int)
>> 
>> ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
>> 
>> location 'my location';
>> 
>>  
>> 
>> is it possible to define these 2 dates as timestamp? 
>> 
>> Do I need to change date format in the file? is it possible to specify date format in table definition?
>> 
>> Or I better off with string?
>> 
>>  
>> 
>> Thanks in advance.
>> 
> 

Re: Convert date in string format to timestamp in table definition

Posted by Igor Kravzov <ig...@gmail.com>.
Thanks Dudu.
So if I need actual date I will use view.
Regarding partition column:  I can create 2 external tables based on the
same data with integer or string column partition and see which one is more
convenient for our use.

On Sat, Jun 4, 2016 at 2:20 PM, Markovitz, Dudu <dm...@paypal.com>
wrote:

> I’m not aware of an option to do what you request in the external table
> definition but you might want to that using a view.
>
>
>
> P.s.
>
> I seems to me that defining the partition column as a string would be more
> user friendly than integer, e.g. –
>
>
>
> select * from threads_test where yyyymmdd like ‘2016%’ – year 2016;
>
> select * from threads_test where yyyymmdd like ‘201603%’ –- March 2016;
>
> select * from threads_test where yyyymmdd like ‘______01’ -- first of
> every month;
>
>
>
>
>
>
>
>
>
>
>
> $ hdfs dfs -ls -R /tmp/threads_test
>
> drwxr-xr-x   - cloudera supergroup          0 2016-06-04 10:45
> /tmp/threads_test/20160604
>
> -rw-r--r--   1 cloudera supergroup        136 2016-06-04 10:45
> /tmp/threads_test/20160604/data.txt
>
>
>
> $ hdfs dfs -cat /tmp/threads_test/20160604/data.txt
>
> {"url":"www.blablabla.com
> ","pageType":"pg1","addDate":"2016-05-17T02:10:44.527","postDate":"2016-05-16T02:08:55","postText":"YadaYada"}
>
>
>
>
> ----------------------------------------------------------------------------------------------------
>
>
>
>
>
> hive> add jar /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;
>
>
>
> hive>
>
> create external table threads_test
>
> (
>
>     url         string
>
>    ,pagetype    string
>
>    ,adddate     string
>
>    ,postdate    string
>
>    ,posttext    string
>
> )
>
>     partitioned by (yyyymmdd string)
>
>     row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
>
>     location '/tmp/threads_test'
>
> ;
>
>
>
> hive> alter table threads_test add partition (yyyymmdd=20160604) location
> '/tmp/threads_test/20160604';
>
>
>
> hive> select * from threads_test;
>
>
>
> www.blablabla.com    pg1    2016-05-17T02:10:44.527
> 2016-05-16T02:08:55  YadaYada      20160604
>
>
>
> hive>
>
> create view threads_test_v
>
> as
>
> select      url
>
>            ,pagetype
>
>            ,cast (concat_ws(' ',substr (adddate ,1,10),substr (adddate
> ,12)) as timestamp)  as adddate
>
>            ,cast (concat_ws(' ',substr (postdate,1,10),substr
> (postdate,12)) as timestamp)  as postdate
>
>            ,posttext
>
>
>
> from        threads_test
>
> ;
>
>
>
> hive> select * from threads_test_v;
>
>
>
> www.blablabla.com    pg1    2016-05-17 02:10:44.527    2016-05-16
> 02:08:55  YadaYada
>
>
>
>
>
> *From:* Igor Kravzov [mailto:igork.inexso@gmail.com]
> *Sent:* Saturday, June 04, 2016 8:13 PM
> *To:* user@hive.apache.org
> *Subject:* Convert date in string format to timestamp in table definition
>
>
>
> Hi,
>
>
>
> I have 2 dates in Json file defined like this
>
> "addDate": "2016-05-17T02:10:44.527",
>
>   "postDate": "2016-05-16T02:08:55",
>
>
>
> Right now I define external table based on this file like this:
>
> CREATE external TABLE threads_test
>
> (url string,
>
>  pagetype string,
>
>  adddate string,
>
>  postdate string,
>
>  posttext string)
>
> partitioned by (yyyymmdd int)
>
> ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
>
> location 'my location';
>
>
>
> is it possible to define these 2 dates as timestamp?
>
> Do I need to change date format in the file? is it possible to specify
> date format in table definition?
>
> Or I better off with string?
>
>
>
> Thanks in advance.
>

RE: Convert date in string format to timestamp in table definition

Posted by "Markovitz, Dudu" <dm...@paypal.com>.
I’m not aware of an option to do what you request in the external table definition but you might want to that using a view.

P.s.
I seems to me that defining the partition column as a string would be more user friendly than integer, e.g. –

select * from threads_test where yyyymmdd like ‘2016%’ – year 2016;
select * from threads_test where yyyymmdd like ‘201603%’ –- March 2016;
select * from threads_test where yyyymmdd like ‘______01’ -- first of every month;





$ hdfs dfs -ls -R /tmp/threads_test
drwxr-xr-x   - cloudera supergroup          0 2016-06-04 10:45 /tmp/threads_test/20160604
-rw-r--r--   1 cloudera supergroup        136 2016-06-04 10:45 /tmp/threads_test/20160604/data.txt

$ hdfs dfs -cat /tmp/threads_test/20160604/data.txt
{"url":"www.blablabla.com","pageType":"pg1","addDate":"2016-05-17T02:10:44.527","postDate":"2016-05-16T02:08:55","postText":"YadaYada"}

----------------------------------------------------------------------------------------------------


hive> add jar /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;

hive>
create external table threads_test
(
    url         string
   ,pagetype    string
   ,adddate     string
   ,postdate    string
   ,posttext    string
)
    partitioned by (yyyymmdd string)
    row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
    location '/tmp/threads_test'
;

hive> alter table threads_test add partition (yyyymmdd=20160604) location '/tmp/threads_test/20160604';

hive> select * from threads_test;

www.blablabla.com    pg1    2016-05-17T02:10:44.527    2016-05-16T02:08:55  YadaYada      20160604

hive>
create view threads_test_v
as
select      url
           ,pagetype
           ,cast (concat_ws(' ',substr (adddate ,1,10),substr (adddate ,12)) as timestamp)  as adddate
           ,cast (concat_ws(' ',substr (postdate,1,10),substr (postdate,12)) as timestamp)  as postdate
           ,posttext

from        threads_test
;

hive> select * from threads_test_v;

www.blablabla.com    pg1    2016-05-17 02:10:44.527    2016-05-16 02:08:55  YadaYada


From: Igor Kravzov [mailto:igork.inexso@gmail.com]
Sent: Saturday, June 04, 2016 8:13 PM
To: user@hive.apache.org
Subject: Convert date in string format to timestamp in table definition

Hi,

I have 2 dates in Json file defined like this
"addDate": "2016-05-17T02:10:44.527",
  "postDate": "2016-05-16T02:08:55",

Right now I define external table based on this file like this:
CREATE external TABLE threads_test
(url string,
 pagetype string,
 adddate string,
 postdate string,
 posttext string)
partitioned by (yyyymmdd int)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
location 'my location';

is it possible to define these 2 dates as timestamp?
Do I need to change date format in the file? is it possible to specify date format in table definition?
Or I better off with string?

Thanks in advance.