Posted to issues@hive.apache.org by "Eric Wohlstadter (JIRA)" <ji...@apache.org> on 2018/06/01 21:21:00 UTC
[jira] [Comment Edited] (HIVE-19723) Arrow serde: "Unsupported data type: Timestamp(NANOSECOND, null)"
[ https://issues.apache.org/jira/browse/HIVE-19723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498593#comment-16498593 ]
Eric Wohlstadter edited comment on HIVE-19723 at 6/1/18 9:20 PM:
-----------------------------------------------------------------
[~teddy.choi]
Hive's Arrow serializer appears to truncate down to MILLISECONDS, but the Jira description calls for MICROSECONDS.
This is motivated by {{org.apache.spark.sql.execution.arrow.ArrowUtils.scala}}
{code:java}
case ts: ArrowType.Timestamp if ts.getUnit == TimeUnit.MICROSECOND => TimestampType{code}
My understanding is that since the primary use case for {{ArrowUtils}} is Python integration, some of its conversions are currently specific to Python. Perhaps Python/Pandas only supports MICROSECOND timestamps.
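To make the difference between the two units concrete, here is a minimal sketch using only {{java.sql.Timestamp}}. The class and method names ({{TimestampUnits}}, {{toEpochMicros}}, {{toEpochMillis}}) are hypothetical and are not taken from Hive's serializer; the sketch only illustrates which fractional digits survive at each granularity.

```java
import java.sql.Timestamp;

// Hypothetical helper for illustration only; not Hive's Arrow serializer code.
final class TimestampUnits {

    // Epoch microseconds: whole seconds scaled up, plus the fractional part
    // truncated to microseconds (the sub-microsecond nanos are dropped).
    static long toEpochMicros(Timestamp ts) {
        long seconds = Math.floorDiv(ts.getTime(), 1000L);
        return seconds * 1_000_000L + ts.getNanos() / 1_000L;
    }

    // Epoch milliseconds: Timestamp.getTime() already truncates to this unit.
    static long toEpochMillis(Timestamp ts) {
        return ts.getTime();
    }

    public static void main(String[] args) {
        Timestamp ts = new Timestamp(0L);  // the epoch instant
        ts.setNanos(123_456_789);          // fractional second .123456789

        System.out.println(toEpochMillis(ts)); // 123    (digits 456789 lost)
        System.out.println(toEpochMicros(ts)); // 123456 (digits 789 lost)
    }
}
```

If the serializer emits the millisecond value, the three extra digits that a MICROSECOND consumer expects are already gone by the time the data reaches it.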
FYI: [~hyukjin.kwon] [~bryanc]
was (Author: ewohlstadter):
[~teddy.choi]
The Arrow serializer appears to truncate down to MILLISECONDS, but the Jira description calls for MICROSECONDS.
This is motivated by {{org.apache.spark.sql.execution.arrow.ArrowUtils.scala}}
{code:java}
case ts: ArrowType.Timestamp if ts.getUnit == TimeUnit.MICROSECOND => TimestampType{code}
My understanding is that since the primary use-case for {{ArrowUtils}} is Python integration, some of the conversions are currently somewhat particular for Python. Perhaps Python/Pandas only supports MICROSECOND timestamps.
FYI: [~hyukjin.kwon] [~bryanc]
> Arrow serde: "Unsupported data type: Timestamp(NANOSECOND, null)"
> -----------------------------------------------------------------
>
> Key: HIVE-19723
> URL: https://issues.apache.org/jira/browse/HIVE-19723
> Project: Hive
> Issue Type: Bug
> Reporter: Teddy Choi
> Assignee: Teddy Choi
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19723.1.patch, HIVE-19732.2.patch
>
>
> Spark's Arrow support only provides Timestamp at MICROSECOND granularity. Spark 2.3.0 won't accept NANOSECOND. Switch it back to MICROSECOND.
> The unit test org.apache.hive.jdbc.TestJdbcWithMiniLlapArrow only needs its assertion changed to check microsecond precision. We'll also need to add this to the documentation on supported data types.