You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Lsw_aka_laplace (Jira)" <ji...@apache.org> on 2020/10/14 03:57:00 UTC
[jira] [Commented] (FLINK-19630) Sink data in [ORC] format into Hive By using Legacy Table API caused unexpected Exception

    [ https://issues.apache.org/jira/browse/FLINK-19630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213579#comment-17213579 ] 

Lsw_aka_laplace commented on FLINK-19630:
-----------------------------------------

[~jark]

[~lzljs3620320]

 

Would u guys mind taking a glimpse~

> Sink data in [ORC] format into Hive By using Legacy Table API  caused unexpected Exception
> ------------------------------------------------------------------------------------------
>
>                 Key: FLINK-19630
>                 URL: https://issues.apache.org/jira/browse/FLINK-19630
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Hive, Table SQL / Ecosystem
>    Affects Versions: 1.11.2
>            Reporter: Lsw_aka_laplace
>            Priority: Critical
>         Attachments: image-2020-10-14-11-36-48-086.png, image-2020-10-14-11-41-53-379.png, image-2020-10-14-11-42-57-353.png, image-2020-10-14-11-48-51-310.png
>
>
> *ENV:*
> *Flink version 1.11.2*
> *Hive exec version: 2.0.1*
> *Hive file storing type ：ORC*
> *SQL or Datastream: SQL API*
> *Kafka Connector :  custom Kafka connector which is based on Legacy API (TableSource/`org.apache.flink.types.Row`)*
> *Hive Connector : totally follows the Flink-Hive-connector (we only made some encapsulation upon it)*
> *Using StreamingFileCommitter：YES*
>  
>  
> *Description:*
>    try to execute the following SQL:
>     """
>       insert into hive_table (select * from kafka_table)
>     """
>    HIVE Table SQL seems like:
>     """
> CREATE TABLE `hive_table`(
>  // some fields
> PARTITIONED BY (
>  `dt` string,
>  `hour` string)
> STORED AS orc
> TBLPROPERTIES (
>  'orc.compress'='SNAPPY',
> 'type'='HIVE',
> 'sink.partition-commit.trigger'='process-time',
> 'sink.partition-commit.delay' = '1 h',
> 'sink.partition-commit.policy.kind' = 'metastore,success-file',
> )   
>    """
> When this job starts to process snapshot, here comes the weird exception:
> !image-2020-10-14-11-36-48-086.png|width=882,height=395!
> As we can see from the message：Owner thread shall be the [Legacy Source Thread], but actually the streamTaskThread which represents the whole first stage is found. 
> So I checked the Thread dump at once.
> !image-2020-10-14-11-41-53-379.png|width=801,height=244!
>                                                                      The legacy Source Thread
>  
> !image-2020-10-14-11-42-57-353.png|width=846,height=226!
>                                                                The StreamTask Thread
>  
>    According to the thread dump info and the Exception Message, I searched and read certain source code and then *{color:#ffab00}DID A TEST{color}*
>  
> {color:#172b4d}   Since the Kafka connector is customed, I tried to make the KafkaSource a serpated stage by changing the TaskChainStrategy to Never. The task topology as follows:{color}
> {color:#172b4d}!image-2020-10-14-11-48-51-310.png|width=753,height=208!{color}
>  
> {color:#505f79}*Fortunately, it did work! No Exception is throwed and Checkpoint could be snapshot successfully!*{color}
>  
>  
> So, from my perspective, there shall be something wrong when HiveWritingTask and  LegacySourceTask chained together. the Legacy source task is a seperated thread, which may be the cause of the exception mentioned above.
>  
>                                                                 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)