Posted to user@spark.apache.org by Koert Kuipers <ko...@tresata.com> on 2016/10/26 17:15:25 UTC

csv date/timestamp type inference in spark 2.0.1

we had the inference of dates/timestamps when reading csv files disabled in
spark 2.0.0 by always setting dateFormat to something impossible (e.g.
dateFormat "~|.G~z~a|wW")

i noticed in spark 2.0.1 that setting this impossible dateFormat does not
stop spark from inferring it is a date or timestamp type anyhow. is this
intentional? how do i disable inference of datetype/timestamp type now?

thanks! koert

Re: csv date/timestamp type inference in spark 2.0.1

Posted by Hyukjin Kwon <gu...@gmail.com>.
Hi Koert,


I am curious about your case. I guess the purpose of timestampFormat and
dateFormat is to tell Spark how to read timestamps/dates when
parsing/inferring, not to exclude them from type inference/parsing.
Actually, 2.0.0 tries to infer/parse them as well (but fails), so I guess
there wouldn't be a big performance difference.


I guess this is type inference, so it is the right behaviour that it does
its best to infer the most appropriate type, dates and timestamps included.

Why don't you just cast the timestamps to strings?
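
A minimal sketch of the cast suggested above, written as PySpark-style
helpers. The function names are hypothetical, and the sketch assumes the
inferred types show up in DataFrame.dtypes as "date" and "timestamp". Note
that the cast renders the parsed value in Spark's default format, so the
resulting string may not match the original text in the CSV file.

```python
def temporal_columns(dtypes):
    """Return names of columns whose inferred type is date or timestamp.

    `dtypes` is a list of (name, type_name) pairs, the shape returned by
    PySpark's DataFrame.dtypes.
    """
    return [name for name, type_name in dtypes
            if type_name in ("date", "timestamp")]


def cast_temporal_to_string(df):
    """Cast every inferred date/timestamp column of `df` back to string."""
    for name in temporal_columns(df.dtypes):
        # withColumn with an existing name replaces that column in place.
        df = df.withColumn(name, df[name].cast("string"))
    return df


# Usage, assuming an existing SparkSession `spark` and a hypothetical path:
# df = (spark.read.option("header", "true")
#       .option("inferSchema", "true").csv("data.csv"))
# df = cast_temporal_to_string(df)
```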


Thanks.


2016-10-27 9:47 GMT+09:00 Koert Kuipers <ko...@tresata.com>:

> i tried setting both dateFormat and timestampFormat to impossible values
> (e.g. "~|.G~z~a|wW") and it still detected my data to be TimestampType

Re: csv date/timestamp type inference in spark 2.0.1

Posted by Koert Kuipers <ko...@tresata.com>.
i tried setting both dateFormat and timestampFormat to impossible values
(e.g. "~|.G~z~a|wW") and it still detected my data to be TimestampType
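
One way to be sure no column comes back as TimestampType is to skip schema
inference altogether, sketched below. This is not something proposed in the
thread: the helper name is hypothetical, a header row is assumed, and the
approach turns inference off for every column, so numeric columns would
need explicit casts afterwards.

```python
# Reader options that turn CSV schema inference off.
# Assumption: the file has a header row.
NO_INFERENCE_OPTIONS = {"header": "true", "inferSchema": "false"}


def read_csv_as_strings(spark, path):
    """Read a CSV file with every column typed as string.

    `spark` is an existing SparkSession; with inferSchema disabled the CSV
    reader leaves all columns as strings.
    """
    return spark.read.options(**NO_INFERENCE_OPTIONS).csv(path)


# Usage, with a hypothetical path:
# df = read_csv_as_strings(spark, "data.csv")
```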
