Posted to issues@spark.apache.org by "Zhang Qi (Jira)" <ji...@apache.org> on 2023/04/17 11:37:00 UTC

[jira] [Created] (SPARK-43163) An exception occurred while hive table join tidb table

Zhang Qi created SPARK-43163:
--------------------------------

             Summary: An exception occurred while hive table join tidb table
                 Key: SPARK-43163
                 URL: https://issues.apache.org/jira/browse/SPARK-43163
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.2.3
            Reporter: Zhang Qi


When executing a query that inner-joins a Hive partitioned table (the big one) with a TiDB table (the small one), the Hive partitioned table is automatically broadcast, which leads to an error.
The query looks something like:
 {{select hive_table.col1,tidb_table.col2 from hive_table inner join tidb_table on hive_table.col2=tidb_table.col3 where ...}}
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- Project [... 109 more fields]
   +- Generate HiveGenericUDTF#udf.json.JsonExtractValueUDTF(xxx), [, ... 101 more fields], false, [...]
      +- Project [, ... 102 more fields]
         +- BroadcastHashJoin [xxx#94], [xxxx#475], Inner, BuildRight, false
            :- TiKV CoprocessorRDD{[table: xxx] TableReader, Columns: xxxx(): { TableRangeScan: { RangeFilter: [], Range: [([t\200\000\000\000\000\000\004\253_r\200\000\000\000\000\000\000\000], [t\200\000\000\000\000\000\004\253_s\000\000\000\000\000\000\000\000])([t\200\000\000\000\000\000\004\253_r\000\000\000\000\000\000\000\000], [t\200\000\000\000\000\000\004\253_r\200\000\000\000\000\000\000\000])] } }, startTs: 440854942292115639} EstimatedCount:20837
            +- BroadcastExchange HashedRelationBroadcastMode(List(input[107, string, false]),false), [plan_id=32]
               +- Filter isnotnull(xxx#475)
                  +- Scan hive xx.xxxxxxxx [, ... 100 more fields], HiveTableRelation [`xx`.`xxx`, org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, Data Cols: [..., Partition Cols: [#520, #521, #522, #523], Pruned Partitions: [(, , , )]], [isnotnull(), (), (xx = xx)]

Here is some log info that may be helpful.
The {{plan.stats.sizeInBytes}} of the Hive table's LogicalPlan is too small, and the {{plan.stats.sizeInBytes}} of the TiDB table's LogicalPlan is too big.
The stats of the two LogicalPlans seem to be reversed.
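
To illustrate why reversed size estimates would put the broadcast on the wrong side, here is a minimal Python sketch of a size-based build-side choice for an inner equi-join. It is a simplified model written for this report, not Spark's actual planner code: the function names are made up, the byte counts in the usage lines are hypothetical (the real values from the logs are not included above), and join hints and AQE-specific settings are ignored. The only Spark fact it relies on is that {{spark.sql.autoBroadcastJoinThreshold}} defaults to 10 MB and that a negative value disables size-based broadcast.

```python
# Simplified model (NOT Spark's implementation) of picking a broadcast side
# from the estimated sizes of the two join inputs.

DEFAULT_THRESHOLD = 10 * 1024 * 1024  # default spark.sql.autoBroadcastJoinThreshold


def can_broadcast(size_in_bytes, threshold=DEFAULT_THRESHOLD):
    """A side qualifies if its estimated size is non-negative and within the threshold."""
    return 0 <= size_in_bytes <= threshold


def choose_build_side(left_size, right_size, threshold=DEFAULT_THRESHOLD):
    """Return 'left', 'right', or None (no broadcast), using only the size estimates."""
    left_ok = can_broadcast(left_size, threshold)
    right_ok = can_broadcast(right_size, threshold)
    if left_ok and right_ok:
        # Both sides fit: broadcast the smaller one.
        return "right" if right_size <= left_size else "left"
    if right_ok:
        return "right"
    if left_ok:
        return "left"
    return None


MB = 1024 ** 2
GB = 1024 ** 3

# Hypothetical "reversed" estimates: TiDB (left) looks huge, Hive (right) looks tiny.
reversed_choice = choose_build_side(left_size=50 * GB, right_size=1 * MB)

# Hypothetical estimates matching the real data: TiDB is small, Hive is big.
expected_choice = choose_build_side(left_size=2 * MB, right_size=200 * GB)

# A negative threshold turns size-based broadcast off entirely.
disabled_choice = choose_build_side(2 * MB, 1 * MB, threshold=-1)

print(reversed_choice, expected_choice, disabled_choice)
```

With the reversed estimates the model picks the right-hand (Hive) side, which matches the {{BuildRight}} in the plan above; with estimates that reflect the real table sizes it would pick the TiDB side instead.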

*Spark and TiSpark version info*
Spark 3.2.3
TiSpark 3.1.2 (with the spark-3.2 profile)

