Posted to user@impala.apache.org by Anup Tiwari <an...@gmail.com> on 2020/01/13 07:53:48 UTC

Date Datatype Issue

Hi Team,

I have a table written in Parquet format by Hive that contains a DATE
column.
When I read it in Impala, I get an error because of the data type.
Is there a workaround for this, such as cast("datecolumn" as string)?

Regards,
Anup Tiwari

Re: Date Datatype Issue

Posted by Csaba Ringhofer <cs...@cloudera.com>.
Sorry for joining this thread so late; I just bumped into it, but maybe the
tip below can still help.

If you want to convert the numeric representation of dates into a string
representation, you can also use the existing timestamp functions that deal
with Unix time:
select from_unixtime(date_col_as_bigint*24*60*60, "yyyy-MM-dd") from
table_name;
Here we calculate the number of seconds since 1970-01-01 by multiplying the
days since 1970-01-01 by the number of seconds in a day.
Note that this only works for the 1400-01-01 to 9999-12-31 range, while
DATE supports 0001-01-01 to 9999-12-31. Dates in the unsupported range will
become NULLs.
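
As a rough illustration of the arithmetic above, here is a small Python sketch (not Impala SQL). The helper name days_to_date_string and the explicit lower-bound check are assumptions made for this example; the check only models the NULL behaviour described above and does not call Impala.

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER_DAY = 24 * 60 * 60
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Lower bound of the range quoted above for the from_unixtime approach.
MIN_SUPPORTED = datetime(1400, 1, 1, tzinfo=timezone.utc)


def days_to_date_string(days_since_epoch):
    """Hypothetical helper mirroring from_unixtime(days*24*60*60, "yyyy-MM-dd")."""
    if days_since_epoch is None:
        return None
    # Days since 1970-01-01 times seconds per day gives Unix time in seconds.
    seconds = days_since_epoch * SECONDS_PER_DAY
    try:
        ts = EPOCH + timedelta(seconds=seconds)
    except OverflowError:
        # Past 9999-12-31 (or before year 1): outside what datetime can hold.
        return None
    if ts < MIN_SUPPORTED:
        # Models the NULL result described for dates before 1400-01-01.
        return None
    return ts.date().isoformat()


if __name__ == "__main__":
    print(days_to_date_string(18274))    # 2020-01-13
    print(days_to_date_string(-300000))  # None (before 1400-01-01)
```

The point is only that the day count has to be scaled to seconds before a Unix-time function can format it, and that values before 1400-01-01 do not survive the trip.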

regards,
Csaba

On Mon, Jan 20, 2020 at 2:32 PM Zoltán Borók-Nagy <bo...@cloudera.com>
wrote:

> Hi Anup,
>
> The Impala documentation contains a good description of UDFs:
> https://impala.apache.org/docs/build/html/topics/impala_udf.html
> You might also want to take a look at this repo, which contains a couple of
> examples: https://github.com/cloudera/impala-udf-samples
>
> For best performance, I recommend writing your UDF in C++.
>
> Cheers,
>      Zoltan
>
>
> On Fri, Jan 17, 2020 at 8:24 PM Anup Tiwari <an...@gmail.com>
> wrote:
>
>> Thanks, Zoltán, for the update.
>> Can you point me to some links on UDF development for this use case?
>>
>>
>> Regards,
>> Anup Tiwari
>>
>>
>> On Mon, Jan 13, 2020 at 9:59 PM Zoltán Borók-Nagy <
>> boroknagyz@cloudera.com> wrote:
>>
>>> Hi Anup,
>>>
>>> Impala added support for the DATE type in version 3.3.
>>>
>>> Parquet represents dates as 32-bit integers, storing the number of days
>>> from the Unix epoch, 1 January 1970.
>>> For pre-3.3 versions of Impala, this means that even if you could read the
>>> numbers from the Parquet file, you'd still need to write a UDF that converts
>>> them to strings.
>>>
>>> Cheers,
>>>     Zoltan
>>>
>>>
>>> On Mon, Jan 13, 2020 at 8:54 AM Anup Tiwari <an...@gmail.com>
>>> wrote:
>>>
>>>> Hi Team,
>>>>
>>>> I have a table written in Parquet format by Hive that contains a
>>>> DATE column.
>>>> When I read it in Impala, I get an error because of the data
>>>> type.
>>>> Is there a workaround for this, such as cast("datecolumn" as string)?
>>>>
>>>> Regards,
>>>> Anup Tiwari
>>>>
>>>

Re: Date Datatype Issue

Posted by Zoltán Borók-Nagy <bo...@cloudera.com>.
Hi Anup,

The Impala documentation contains a good description of UDFs:
https://impala.apache.org/docs/build/html/topics/impala_udf.html
You might also want to take a look at this repo, which contains a couple of
examples: https://github.com/cloudera/impala-udf-samples

For best performance, I recommend writing your UDF in C++.

Cheers,
     Zoltan



Re: Date Datatype Issue

Posted by Anup Tiwari <an...@gmail.com>.
Thanks, Zoltán, for the update.
Can you point me to some links on UDF development for this use case?


Regards,
Anup Tiwari



Re: Date Datatype Issue

Posted by Zoltán Borók-Nagy <bo...@cloudera.com>.
Hi Anup,

Impala added support for the DATE type in version 3.3.

Parquet represents dates as 32-bit integers, storing the number of days
from the Unix epoch, 1 January 1970.
For pre-3.3 versions of Impala, this means that even if you could read the
numbers from the Parquet file, you'd still need to write a UDF that converts
them to strings.
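
To make the day-count representation concrete, here is a minimal Python sketch of the conversion in both directions. It is not an Impala UDF, and the function names days_to_date and date_to_days are hypothetical; it only shows the calendar arithmetic on the stored integer.

```python
from datetime import date

# Ordinal of the Unix epoch day; stored values count days from this date.
EPOCH_ORDINAL = date(1970, 1, 1).toordinal()


def days_to_date(days_since_epoch):
    """Hypothetical helper: day count since 1970-01-01 -> calendar date."""
    return date.fromordinal(EPOCH_ORDINAL + days_since_epoch)


def date_to_days(d):
    """Hypothetical inverse: calendar date -> day count since 1970-01-01."""
    return d.toordinal() - EPOCH_ORDINAL


if __name__ == "__main__":
    print(days_to_date(18274).isoformat())  # 2020-01-13
    print(date_to_days(date(1970, 1, 2)))   # 1
```

Dates before the epoch are simply negative day counts, so the same arithmetic covers them as well.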

Cheers,
    Zoltan

