Posted to issues@spark.apache.org by "Yuming Wang (Jira)" <ji...@apache.org> on 2023/04/17 12:37:00 UTC
[jira] [Commented] (SPARK-43163) An exception occurred while hive table join tidb table

    [ https://issues.apache.org/jira/browse/SPARK-43163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713069#comment-17713069 ] 

Yuming Wang commented on SPARK-43163:
-------------------------------------

This looks like a TiSpark issue.

> An exception occurred while hive table join tidb table
> ------------------------------------------------------
>
>                 Key: SPARK-43163
>                 URL: https://issues.apache.org/jira/browse/SPARK-43163
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.3
>            Reporter: kroraina
>            Priority: Major
>
> When executing a query in which a Hive partitioned table (the large one) is inner joined with a TiDB table (the small one), the Hive partitioned table is automatically broadcast, which leads to an error.
> The query is something like
>  {{select hive_table.col1,tidb_table.col2 from hive_table inner join tidb_table on hive_table.col2=tidb_table.col3 where ...}}
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- Project [... 109 more fields]
> +- Generate HiveGenericUDTF#udf.json.JsonExtractValueUDTF(xxx), [, ... 101 more fields], false, [...]
> +- Project [, ... 102 more fields]
> +- BroadcastHashJoin [xxx#94], [xxxx#475], Inner, BuildRight, false
> :- TiKV CoprocessorRDD{[table: xxx] TableReader, Columns: xxxx(): { TableRangeScan: { RangeFilter: [], Range: [([t\200\000\000\000\000\000\004\253_r\200\000\000\000\000\000\000\000], [t\200\000\000\000\000\000\004\253_s\000\000\000\000\000\000\000\000])([t\200\000\000\000\000\000\004\253_r\000\000\000\000\000\000\000\000], [t\200\000\000\000\000\000\004\253_r\200\000\000\000\000\000\000\000])] } }, startTs: 440854942292115639} EstimatedCount:20837
> +- BroadcastExchange HashedRelationBroadcastMode(List(input[107, string, false]),false), [plan_id=32]
> +- Filter isnotnull(xxx#475)
> +- Scan hive xx.xxxxxxxx [, ... 100 more fields], HiveTableRelation [`xx`.`xxx`, org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, Data Cols: [..., Partition Cols: [#520, #521, #522, #523], Pruned Partitions: [(, , , )]], [isnotnull(), (), (xx = xx)]
> Here is some log information that may be helpful.
> The {{plan.stats.sizeInBytes}} of the LogicalPlan of the Hive table is too small, and the {{plan.stats.sizeInBytes}} of the LogicalPlan of the TiDB table is too big.
> The stats of the two LogicalPlans seem to be reversed.
> *Spark and TiSpark version info*
> Spark 3.2.3
> TiSpark 3.1.2(with a profile of spark-3.2)
> *Additional context*
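
The reporter's observation about {{plan.stats.sizeInBytes}} can be illustrated with a simplified model of how a size-based planner chooses the broadcast side of an inner join. This is only a sketch and not Spark's actual code: the function names are hypothetical and the sizes are invented for illustration. It assumes the commonly documented behaviour that a side is eligible for broadcast when its estimated size is at most spark.sql.autoBroadcastJoinThreshold (10 MB by default), and that the smaller side is chosen when both are eligible. It ignores hints and adaptive execution.

```python
from typing import Optional

# Default value of spark.sql.autoBroadcastJoinThreshold (10 MB).
DEFAULT_THRESHOLD = 10 * 1024 * 1024


def can_broadcast(size_in_bytes: int, threshold: int = DEFAULT_THRESHOLD) -> bool:
    """A side is eligible when its *estimated* size fits under the threshold.

    A threshold of -1 makes this always False, which disables auto broadcast.
    """
    return 0 <= size_in_bytes <= threshold


def pick_build_side(left_size: int, right_size: int,
                    threshold: int = DEFAULT_THRESHOLD) -> Optional[str]:
    """Return "left", "right", or None (no broadcast) for an inner join."""
    left_ok = can_broadcast(left_size, threshold)
    right_ok = can_broadcast(right_size, threshold)
    if left_ok and right_ok:
        # Both fit: prefer the side with the smaller estimate.
        return "right" if right_size <= left_size else "left"
    if right_ok:
        return "right"
    if left_ok:
        return "left"
    return None


# Invented estimates that mimic the report: the TiDB side (left, actually
# small) is estimated as huge, the Hive side (right, actually large) as tiny.
tidb_estimate = 50 * 1024 ** 3   # hypothetical over-estimate
hive_estimate = 1024             # hypothetical under-estimate

side = pick_build_side(tidb_estimate, hive_estimate)
side_disabled = pick_build_side(tidb_estimate, hive_estimate, threshold=-1)
print(side, side_disabled)
```

With estimates reversed like this, the model picks the right-hand (Hive) side as the build side, which matches the BuildRight in the plan above. Because the decision depends only on the estimates, setting the threshold to -1 removes size-based broadcast from the decision entirely; whether that avoids the reported error has not been verified here.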



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org