You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Igor Kravzov <ig...@gmail.com> on 2016/06/04 17:12:33 UTC
Convert date in string format to timestamp in table definition
Hi,
I have 2 dates in Json file defined like this
"addDate": "2016-05-17T02:10:44.527",
"postDate": "2016-05-16T02:08:55",
Right now I define external table based on this file like this:
CREATE external TABLE threads_test
(url string,
pagetype string,
adddate string,
postdate string,
posttext string)
partitioned by (yyyymmdd int)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
location 'my location';
is it possible to define these 2 dates as timestamp?
Do I need to change date format in the file? is it possible to specify date
format in table definition?
Or I better off with string?
Thanks in advance.
Re: Convert date in string format to timestamp in table definition
Posted by Igor Kravzov <ig...@gmail.com>.
Thanks Mich. This date is informational for now. Will see once I need it.
On Sat, Jun 4, 2016 at 5:23 PM, Mich Talebzadeh <mi...@gmail.com>
wrote:
> or just create an internal table and do insert/select from external table
> to that table as Dudu mentioned
>
> hive> use test;
> OK
> hive> desc mytime;
> OK
> adddate timestamp
>
> hive> insert into
> > test.mytime
> > select cast(concat_ws(' ',substring
> ("2016-05-17T02:10:44.527",1,10),substring ("2016-05-17T02:10:44.527",12))
> as timestamp) as adddate;
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 4 June 2016 at 21:31, Igor Kravzov <ig...@gmail.com> wrote:
>
>> Thanks Dudu.
>> So if I need actual date I will use view.
>> Regarding partition column: I can create 2 external tables based on the
>> same data with integer or string column partition and see which one is more
>> convenient for our use.
>>
>> On Sat, Jun 4, 2016 at 2:20 PM, Markovitz, Dudu <dm...@paypal.com>
>> wrote:
>>
>>> I’m not aware of an option to do what you request in the external table
>>> definition but you might want to that using a view.
>>>
>>>
>>>
>>> P.s.
>>>
>>> I seems to me that defining the partition column as a string would be
>>> more user friendly than integer, e.g. –
>>>
>>>
>>>
>>> select * from threads_test where yyyymmdd like ‘2016%’ – year 2016;
>>>
>>> select * from threads_test where yyyymmdd like ‘201603%’ –- March 2016;
>>>
>>> select * from threads_test where yyyymmdd like ‘______01’ -- first of
>>> every month;
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> $ hdfs dfs -ls -R /tmp/threads_test
>>>
>>> drwxr-xr-x - cloudera supergroup 0 2016-06-04 10:45
>>> /tmp/threads_test/20160604
>>>
>>> -rw-r--r-- 1 cloudera supergroup 136 2016-06-04 10:45
>>> /tmp/threads_test/20160604/data.txt
>>>
>>>
>>>
>>> $ hdfs dfs -cat /tmp/threads_test/20160604/data.txt
>>>
>>> {"url":"www.blablabla.com
>>> ","pageType":"pg1","addDate":"2016-05-17T02:10:44.527","postDate":"2016-05-16T02:08:55","postText":"YadaYada"}
>>>
>>>
>>>
>>>
>>> ----------------------------------------------------------------------------------------------------
>>>
>>>
>>>
>>>
>>>
>>> hive> add jar
>>> /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;
>>>
>>>
>>>
>>> hive>
>>>
>>> create external table threads_test
>>>
>>> (
>>>
>>> url string
>>>
>>> ,pagetype string
>>>
>>> ,adddate string
>>>
>>> ,postdate string
>>>
>>> ,posttext string
>>>
>>> )
>>>
>>> partitioned by (yyyymmdd string)
>>>
>>> row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
>>>
>>> location '/tmp/threads_test'
>>>
>>> ;
>>>
>>>
>>>
>>> hive> alter table threads_test add partition (yyyymmdd=20160604)
>>> location '/tmp/threads_test/20160604';
>>>
>>>
>>>
>>> hive> select * from threads_test;
>>>
>>>
>>>
>>> www.blablabla.com pg1 2016-05-17T02:10:44.527
>>> 2016-05-16T02:08:55 YadaYada 20160604
>>>
>>>
>>>
>>> hive>
>>>
>>> create view threads_test_v
>>>
>>> as
>>>
>>> select url
>>>
>>> ,pagetype
>>>
>>> ,cast (concat_ws(' ',substr (adddate ,1,10),substr (adddate
>>> ,12)) as timestamp) as adddate
>>>
>>> ,cast (concat_ws(' ',substr (postdate,1,10),substr
>>> (postdate,12)) as timestamp) as postdate
>>>
>>> ,posttext
>>>
>>>
>>>
>>> from threads_test
>>>
>>> ;
>>>
>>>
>>>
>>> hive> select * from threads_test_v;
>>>
>>>
>>>
>>> www.blablabla.com pg1 2016-05-17 02:10:44.527 2016-05-16
>>> 02:08:55 YadaYada
>>>
>>>
>>>
>>>
>>>
>>> *From:* Igor Kravzov [mailto:igork.inexso@gmail.com]
>>> *Sent:* Saturday, June 04, 2016 8:13 PM
>>> *To:* user@hive.apache.org
>>> *Subject:* Convert date in string format to timestamp in table
>>> definition
>>>
>>>
>>>
>>> Hi,
>>>
>>>
>>>
>>> I have 2 dates in Json file defined like this
>>>
>>> "addDate": "2016-05-17T02:10:44.527",
>>>
>>> "postDate": "2016-05-16T02:08:55",
>>>
>>>
>>>
>>> Right now I define external table based on this file like this:
>>>
>>> CREATE external TABLE threads_test
>>>
>>> (url string,
>>>
>>> pagetype string,
>>>
>>> adddate string,
>>>
>>> postdate string,
>>>
>>> posttext string)
>>>
>>> partitioned by (yyyymmdd int)
>>>
>>> ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
>>>
>>> location 'my location';
>>>
>>>
>>>
>>> is it possible to define these 2 dates as timestamp?
>>>
>>> Do I need to change date format in the file? is it possible to specify
>>> date format in table definition?
>>>
>>> Or I better off with string?
>>>
>>>
>>>
>>> Thanks in advance.
>>>
>>
>>
>
Re: Convert date in string format to timestamp in table definition
Posted by Mich Talebzadeh <mi...@gmail.com>.
or just create an internal table and do insert/select from external table
to that table as Dudu mentioned
hive> use test;
OK
hive> desc mytime;
OK
adddate timestamp
hive> insert into
> test.mytime
> select cast(concat_ws(' ',substring
("2016-05-17T02:10:44.527",1,10),substring ("2016-05-17T02:10:44.527",12))
as timestamp) as adddate;
HTH
Dr Mich Talebzadeh
LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
http://talebzadehmich.wordpress.com
On 4 June 2016 at 21:31, Igor Kravzov <ig...@gmail.com> wrote:
> Thanks Dudu.
> So if I need actual date I will use view.
> Regarding partition column: I can create 2 external tables based on the
> same data with integer or string column partition and see which one is more
> convenient for our use.
>
> On Sat, Jun 4, 2016 at 2:20 PM, Markovitz, Dudu <dm...@paypal.com>
> wrote:
>
>> I’m not aware of an option to do what you request in the external table
>> definition but you might want to that using a view.
>>
>>
>>
>> P.s.
>>
>> I seems to me that defining the partition column as a string would be
>> more user friendly than integer, e.g. –
>>
>>
>>
>> select * from threads_test where yyyymmdd like ‘2016%’ – year 2016;
>>
>> select * from threads_test where yyyymmdd like ‘201603%’ –- March 2016;
>>
>> select * from threads_test where yyyymmdd like ‘______01’ -- first of
>> every month;
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> $ hdfs dfs -ls -R /tmp/threads_test
>>
>> drwxr-xr-x - cloudera supergroup 0 2016-06-04 10:45
>> /tmp/threads_test/20160604
>>
>> -rw-r--r-- 1 cloudera supergroup 136 2016-06-04 10:45
>> /tmp/threads_test/20160604/data.txt
>>
>>
>>
>> $ hdfs dfs -cat /tmp/threads_test/20160604/data.txt
>>
>> {"url":"www.blablabla.com
>> ","pageType":"pg1","addDate":"2016-05-17T02:10:44.527","postDate":"2016-05-16T02:08:55","postText":"YadaYada"}
>>
>>
>>
>>
>> ----------------------------------------------------------------------------------------------------
>>
>>
>>
>>
>>
>> hive> add jar
>> /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;
>>
>>
>>
>> hive>
>>
>> create external table threads_test
>>
>> (
>>
>> url string
>>
>> ,pagetype string
>>
>> ,adddate string
>>
>> ,postdate string
>>
>> ,posttext string
>>
>> )
>>
>> partitioned by (yyyymmdd string)
>>
>> row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
>>
>> location '/tmp/threads_test'
>>
>> ;
>>
>>
>>
>> hive> alter table threads_test add partition (yyyymmdd=20160604) location
>> '/tmp/threads_test/20160604';
>>
>>
>>
>> hive> select * from threads_test;
>>
>>
>>
>> www.blablabla.com pg1 2016-05-17T02:10:44.527
>> 2016-05-16T02:08:55 YadaYada 20160604
>>
>>
>>
>> hive>
>>
>> create view threads_test_v
>>
>> as
>>
>> select url
>>
>> ,pagetype
>>
>> ,cast (concat_ws(' ',substr (adddate ,1,10),substr (adddate
>> ,12)) as timestamp) as adddate
>>
>> ,cast (concat_ws(' ',substr (postdate,1,10),substr
>> (postdate,12)) as timestamp) as postdate
>>
>> ,posttext
>>
>>
>>
>> from threads_test
>>
>> ;
>>
>>
>>
>> hive> select * from threads_test_v;
>>
>>
>>
>> www.blablabla.com pg1 2016-05-17 02:10:44.527 2016-05-16
>> 02:08:55 YadaYada
>>
>>
>>
>>
>>
>> *From:* Igor Kravzov [mailto:igork.inexso@gmail.com]
>> *Sent:* Saturday, June 04, 2016 8:13 PM
>> *To:* user@hive.apache.org
>> *Subject:* Convert date in string format to timestamp in table definition
>>
>>
>>
>> Hi,
>>
>>
>>
>> I have 2 dates in Json file defined like this
>>
>> "addDate": "2016-05-17T02:10:44.527",
>>
>> "postDate": "2016-05-16T02:08:55",
>>
>>
>>
>> Right now I define external table based on this file like this:
>>
>> CREATE external TABLE threads_test
>>
>> (url string,
>>
>> pagetype string,
>>
>> adddate string,
>>
>> postdate string,
>>
>> posttext string)
>>
>> partitioned by (yyyymmdd int)
>>
>> ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
>>
>> location 'my location';
>>
>>
>>
>> is it possible to define these 2 dates as timestamp?
>>
>> Do I need to change date format in the file? is it possible to specify
>> date format in table definition?
>>
>> Or I better off with string?
>>
>>
>>
>> Thanks in advance.
>>
>
>
RE: Convert date in string format to timestamp in table definition
Posted by "Markovitz, Dudu" <dm...@paypal.com>.
‘Never’ is a strong word.
1. We’re talking about the metadata so –
a. The data format is irrelevant
b. The records number is small (scale of thousands)
I would have sacrificed 1 second of metadata processing for a better user experience
2. Partitions values are being held in the metastore (at least with MySQL) as strings
Dudu
From: Jörn Franke [mailto:jornfranke@gmail.com]
Sent: Sunday, June 05, 2016 11:38 AM
To: user@hive.apache.org
Subject: Re: Convert date in string format to timestamp in table definition
Never use string when you can use int - the performance will be much better - especially for tables in Orc / parquet format
On 04 Jun 2016, at 22:31, Igor Kravzov <ig...@gmail.com>> wrote:
Thanks Dudu.
So if I need actual date I will use view.
Regarding partition column: I can create 2 external tables based on the same data with integer or string column partition and see which one is more convenient for our use.
On Sat, Jun 4, 2016 at 2:20 PM, Markovitz, Dudu <dm...@paypal.com>> wrote:
I’m not aware of an option to do what you request in the external table definition but you might want to that using a view.
P.s.
I seems to me that defining the partition column as a string would be more user friendly than integer, e.g. –
select * from threads_test where yyyymmdd like ‘2016%’ – year 2016;
select * from threads_test where yyyymmdd like ‘201603%’ –- March 2016;
select * from threads_test where yyyymmdd like ‘______01’ -- first of every month;
$ hdfs dfs -ls -R /tmp/threads_test
drwxr-xr-x - cloudera supergroup 0 2016-06-04 10:45 /tmp/threads_test/20160604
-rw-r--r-- 1 cloudera supergroup 136 2016-06-04 10:45 /tmp/threads_test/20160604/data.txt
$ hdfs dfs -cat /tmp/threads_test/20160604/data.txt
{"url":"www.blablabla.com<http://www.blablabla.com>","pageType":"pg1","addDate":"2016-05-17T02:10:44.527","postDate":"2016-05-16T02:08:55","postText":"YadaYada"}
----------------------------------------------------------------------------------------------------
hive> add jar /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;
hive>
create external table threads_test
(
url string
,pagetype string
,adddate string
,postdate string
,posttext string
)
partitioned by (yyyymmdd string)
row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
location '/tmp/threads_test'
;
hive> alter table threads_test add partition (yyyymmdd=20160604) location '/tmp/threads_test/20160604';
hive> select * from threads_test;
www.blablabla.com<http://www.blablabla.com> pg1 2016-05-17T02:10:44.527 2016-05-16T02:08:55 YadaYada 20160604
hive>
create view threads_test_v
as
select url
,pagetype
,cast (concat_ws(' ',substr (adddate ,1,10),substr (adddate ,12)) as timestamp) as adddate
,cast (concat_ws(' ',substr (postdate,1,10),substr (postdate,12)) as timestamp) as postdate
,posttext
from threads_test
;
hive> select * from threads_test_v;
www.blablabla.com<http://www.blablabla.com> pg1 2016-05-17 02:10:44.527 2016-05-16 02:08:55 YadaYada
From: Igor Kravzov [mailto:igork.inexso@gmail.com<ma...@gmail.com>]
Sent: Saturday, June 04, 2016 8:13 PM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: Convert date in string format to timestamp in table definition
Hi,
I have 2 dates in Json file defined like this
"addDate": "2016-05-17T02:10:44.527",
"postDate": "2016-05-16T02:08:55",
Right now I define external table based on this file like this:
CREATE external TABLE threads_test
(url string,
pagetype string,
adddate string,
postdate string,
posttext string)
partitioned by (yyyymmdd int)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
location 'my location';
is it possible to define these 2 dates as timestamp?
Do I need to change date format in the file? is it possible to specify date format in table definition?
Or I better off with string?
Thanks in advance.
Re: Convert date in string format to timestamp in table definition
Posted by Jörn Franke <jo...@gmail.com>.
Never use string when you can use int - the performance will be much better - especially for tables in Orc / parquet format
> On 04 Jun 2016, at 22:31, Igor Kravzov <ig...@gmail.com> wrote:
>
> Thanks Dudu.
> So if I need actual date I will use view.
> Regarding partition column: I can create 2 external tables based on the same data with integer or string column partition and see which one is more convenient for our use.
>
>> On Sat, Jun 4, 2016 at 2:20 PM, Markovitz, Dudu <dm...@paypal.com> wrote:
>> I’m not aware of an option to do what you request in the external table definition but you might want to that using a view.
>>
>>
>>
>> P.s.
>>
>> I seems to me that defining the partition column as a string would be more user friendly than integer, e.g. –
>>
>>
>>
>> select * from threads_test where yyyymmdd like ‘2016%’ – year 2016;
>>
>> select * from threads_test where yyyymmdd like ‘201603%’ –- March 2016;
>>
>> select * from threads_test where yyyymmdd like ‘______01’ -- first of every month;
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> $ hdfs dfs -ls -R /tmp/threads_test
>>
>> drwxr-xr-x - cloudera supergroup 0 2016-06-04 10:45 /tmp/threads_test/20160604
>>
>> -rw-r--r-- 1 cloudera supergroup 136 2016-06-04 10:45 /tmp/threads_test/20160604/data.txt
>>
>>
>>
>> $ hdfs dfs -cat /tmp/threads_test/20160604/data.txt
>>
>> {"url":"www.blablabla.com","pageType":"pg1","addDate":"2016-05-17T02:10:44.527","postDate":"2016-05-16T02:08:55","postText":"YadaYada"}
>>
>>
>>
>> ----------------------------------------------------------------------------------------------------
>>
>>
>>
>>
>>
>> hive> add jar /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;
>>
>>
>>
>> hive>
>>
>> create external table threads_test
>>
>> (
>>
>> url string
>>
>> ,pagetype string
>>
>> ,adddate string
>>
>> ,postdate string
>>
>> ,posttext string
>>
>> )
>>
>> partitioned by (yyyymmdd string)
>>
>> row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
>>
>> location '/tmp/threads_test'
>>
>> ;
>>
>>
>>
>> hive> alter table threads_test add partition (yyyymmdd=20160604) location '/tmp/threads_test/20160604';
>>
>>
>>
>> hive> select * from threads_test;
>>
>>
>>
>> www.blablabla.com pg1 2016-05-17T02:10:44.527 2016-05-16T02:08:55 YadaYada 20160604
>>
>>
>>
>> hive>
>>
>> create view threads_test_v
>>
>> as
>>
>> select url
>>
>> ,pagetype
>>
>> ,cast (concat_ws(' ',substr (adddate ,1,10),substr (adddate ,12)) as timestamp) as adddate
>>
>> ,cast (concat_ws(' ',substr (postdate,1,10),substr (postdate,12)) as timestamp) as postdate
>>
>> ,posttext
>>
>>
>>
>> from threads_test
>>
>> ;
>>
>>
>>
>> hive> select * from threads_test_v;
>>
>>
>>
>> www.blablabla.com pg1 2016-05-17 02:10:44.527 2016-05-16 02:08:55 YadaYada
>>
>>
>>
>>
>>
>> From: Igor Kravzov [mailto:igork.inexso@gmail.com]
>> Sent: Saturday, June 04, 2016 8:13 PM
>> To: user@hive.apache.org
>> Subject: Convert date in string format to timestamp in table definition
>>
>>
>>
>> Hi,
>>
>>
>>
>> I have 2 dates in Json file defined like this
>>
>> "addDate": "2016-05-17T02:10:44.527",
>>
>> "postDate": "2016-05-16T02:08:55",
>>
>>
>>
>> Right now I define external table based on this file like this:
>>
>> CREATE external TABLE threads_test
>>
>> (url string,
>>
>> pagetype string,
>>
>> adddate string,
>>
>> postdate string,
>>
>> posttext string)
>>
>> partitioned by (yyyymmdd int)
>>
>> ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
>>
>> location 'my location';
>>
>>
>>
>> is it possible to define these 2 dates as timestamp?
>>
>> Do I need to change date format in the file? is it possible to specify date format in table definition?
>>
>> Or I better off with string?
>>
>>
>>
>> Thanks in advance.
>>
>
Re: Convert date in string format to timestamp in table definition
Posted by Igor Kravzov <ig...@gmail.com>.
Thanks Dudu.
So if I need actual date I will use view.
Regarding partition column: I can create 2 external tables based on the
same data with integer or string column partition and see which one is more
convenient for our use.
On Sat, Jun 4, 2016 at 2:20 PM, Markovitz, Dudu <dm...@paypal.com>
wrote:
> I’m not aware of an option to do what you request in the external table
> definition but you might want to that using a view.
>
>
>
> P.s.
>
> I seems to me that defining the partition column as a string would be more
> user friendly than integer, e.g. –
>
>
>
> select * from threads_test where yyyymmdd like ‘2016%’ – year 2016;
>
> select * from threads_test where yyyymmdd like ‘201603%’ –- March 2016;
>
> select * from threads_test where yyyymmdd like ‘______01’ -- first of
> every month;
>
>
>
>
>
>
>
>
>
>
>
> $ hdfs dfs -ls -R /tmp/threads_test
>
> drwxr-xr-x - cloudera supergroup 0 2016-06-04 10:45
> /tmp/threads_test/20160604
>
> -rw-r--r-- 1 cloudera supergroup 136 2016-06-04 10:45
> /tmp/threads_test/20160604/data.txt
>
>
>
> $ hdfs dfs -cat /tmp/threads_test/20160604/data.txt
>
> {"url":"www.blablabla.com
> ","pageType":"pg1","addDate":"2016-05-17T02:10:44.527","postDate":"2016-05-16T02:08:55","postText":"YadaYada"}
>
>
>
>
> ----------------------------------------------------------------------------------------------------
>
>
>
>
>
> hive> add jar /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;
>
>
>
> hive>
>
> create external table threads_test
>
> (
>
> url string
>
> ,pagetype string
>
> ,adddate string
>
> ,postdate string
>
> ,posttext string
>
> )
>
> partitioned by (yyyymmdd string)
>
> row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
>
> location '/tmp/threads_test'
>
> ;
>
>
>
> hive> alter table threads_test add partition (yyyymmdd=20160604) location
> '/tmp/threads_test/20160604';
>
>
>
> hive> select * from threads_test;
>
>
>
> www.blablabla.com pg1 2016-05-17T02:10:44.527
> 2016-05-16T02:08:55 YadaYada 20160604
>
>
>
> hive>
>
> create view threads_test_v
>
> as
>
> select url
>
> ,pagetype
>
> ,cast (concat_ws(' ',substr (adddate ,1,10),substr (adddate
> ,12)) as timestamp) as adddate
>
> ,cast (concat_ws(' ',substr (postdate,1,10),substr
> (postdate,12)) as timestamp) as postdate
>
> ,posttext
>
>
>
> from threads_test
>
> ;
>
>
>
> hive> select * from threads_test_v;
>
>
>
> www.blablabla.com pg1 2016-05-17 02:10:44.527 2016-05-16
> 02:08:55 YadaYada
>
>
>
>
>
> *From:* Igor Kravzov [mailto:igork.inexso@gmail.com]
> *Sent:* Saturday, June 04, 2016 8:13 PM
> *To:* user@hive.apache.org
> *Subject:* Convert date in string format to timestamp in table definition
>
>
>
> Hi,
>
>
>
> I have 2 dates in Json file defined like this
>
> "addDate": "2016-05-17T02:10:44.527",
>
> "postDate": "2016-05-16T02:08:55",
>
>
>
> Right now I define external table based on this file like this:
>
> CREATE external TABLE threads_test
>
> (url string,
>
> pagetype string,
>
> adddate string,
>
> postdate string,
>
> posttext string)
>
> partitioned by (yyyymmdd int)
>
> ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
>
> location 'my location';
>
>
>
> is it possible to define these 2 dates as timestamp?
>
> Do I need to change date format in the file? is it possible to specify
> date format in table definition?
>
> Or I better off with string?
>
>
>
> Thanks in advance.
>
RE: Convert date in string format to timestamp in table definition
Posted by "Markovitz, Dudu" <dm...@paypal.com>.
I’m not aware of an option to do what you request in the external table definition but you might want to that using a view.
P.s.
I seems to me that defining the partition column as a string would be more user friendly than integer, e.g. –
select * from threads_test where yyyymmdd like ‘2016%’ – year 2016;
select * from threads_test where yyyymmdd like ‘201603%’ –- March 2016;
select * from threads_test where yyyymmdd like ‘______01’ -- first of every month;
$ hdfs dfs -ls -R /tmp/threads_test
drwxr-xr-x - cloudera supergroup 0 2016-06-04 10:45 /tmp/threads_test/20160604
-rw-r--r-- 1 cloudera supergroup 136 2016-06-04 10:45 /tmp/threads_test/20160604/data.txt
$ hdfs dfs -cat /tmp/threads_test/20160604/data.txt
{"url":"www.blablabla.com","pageType":"pg1","addDate":"2016-05-17T02:10:44.527","postDate":"2016-05-16T02:08:55","postText":"YadaYada"}
----------------------------------------------------------------------------------------------------
hive> add jar /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;
hive>
create external table threads_test
(
url string
,pagetype string
,adddate string
,postdate string
,posttext string
)
partitioned by (yyyymmdd string)
row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
location '/tmp/threads_test'
;
hive> alter table threads_test add partition (yyyymmdd=20160604) location '/tmp/threads_test/20160604';
hive> select * from threads_test;
www.blablabla.com pg1 2016-05-17T02:10:44.527 2016-05-16T02:08:55 YadaYada 20160604
hive>
create view threads_test_v
as
select url
,pagetype
,cast (concat_ws(' ',substr (adddate ,1,10),substr (adddate ,12)) as timestamp) as adddate
,cast (concat_ws(' ',substr (postdate,1,10),substr (postdate,12)) as timestamp) as postdate
,posttext
from threads_test
;
hive> select * from threads_test_v;
www.blablabla.com pg1 2016-05-17 02:10:44.527 2016-05-16 02:08:55 YadaYada
From: Igor Kravzov [mailto:igork.inexso@gmail.com]
Sent: Saturday, June 04, 2016 8:13 PM
To: user@hive.apache.org
Subject: Convert date in string format to timestamp in table definition
Hi,
I have 2 dates in Json file defined like this
"addDate": "2016-05-17T02:10:44.527",
"postDate": "2016-05-16T02:08:55",
Right now I define external table based on this file like this:
CREATE external TABLE threads_test
(url string,
pagetype string,
adddate string,
postdate string,
posttext string)
partitioned by (yyyymmdd int)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
location 'my location';
is it possible to define these 2 dates as timestamp?
Do I need to change date format in the file? is it possible to specify date format in table definition?
Or I better off with string?
Thanks in advance.