Posted to issues@spark.apache.org by "Ramakrishna (Jira)" <ji...@apache.org> on 2022/11/09 05:43:00 UTC
[jira] [Created] (SPARK-41070) Performance issue when Spark SQL connects with TeraData
Ramakrishna created SPARK-41070:
-----------------------------------
Summary: Performance issue when Spark SQL connects with TeraData
Key: SPARK-41070
URL: https://issues.apache.org/jira/browse/SPARK-41070
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.4.4
Reporter: Ramakrishna
We are connecting to Teradata from Spark SQL with the API below:
Dataset<Row> jdbcDF = spark.read().jdbc(connectionUrl, tableQuery, connectionProperties);
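With this API, Spark reads the table by sending a SELECT for each partition against `tableQuery` (a single partition when, as here, no partitioning options are given). In Spark 2.x the JDBC source (`JDBCRDD`) builds that SELECT from the columns the query actually needs and falls back to the literal `1` when no column is needed, which is the case for a plain `count()`. Spark 2.4 does not push `COUNT(*)` down to JDBC sources, so it counts the returned rows itself. The Java sketch below is a simplified, hypothetical re-creation of that string building: the class and method names are made up, and Spark's real code also quotes identifiers through the JDBC dialect. It only illustrates the behaviour and is not Spark's implementation.

```java
import java.util.List;

// Simplified, hypothetical re-creation of how Spark's JDBC source (JDBCRDD in
// Spark 2.x) assembles the statement it sends for one partition.
// Illustration only: this is not Spark's actual code.
public class JdbcScanSketch {

    static String buildScanQuery(List<String> requiredColumns, String table, String whereClause) {
        // When no columns are required (e.g. for df.count()), select the constant 1
        // so that every source row still produces exactly one result row.
        String columnList = requiredColumns.isEmpty() ? "1" : String.join(",", requiredColumns);
        // Pushed-down filters, if any, become a WHERE clause.
        String where = (whereClause == null || whereClause.isEmpty()) ? "" : " WHERE " + whereClause;
        return "SELECT " + columnList + " FROM " + table + where;
    }

    public static void main(String[] args) {
        // A count() needs no columns, so the statement returns a 1 for every row.
        System.out.println(buildScanQuery(List.of(), "ONE_MILLION_ROWS_TABLE", null));
        // Selecting specific columns with a filter lists them explicitly.
        System.out.println(buildScanQuery(List.of("ID", "NAME"), "ONE_MILLION_ROWS_TABLE", "ID > 10"));
    }
}
```

Under these assumptions, the first call yields the same shape of statement the DBA observed, `SELECT 1 FROM ONE_MILLION_ROWS_TABLE`.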
When we run this logic against a large table with a million rows, we see the extra query below being executed every time, and it is causing a performance hit on the database.
We got the information below from our DBA; we do not have any logs on the Spark SQL side.
SELECT 1 FROM ONE_MILLION_ROWS_TABLE;
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|
Can you please clarify why this query is being executed? Is there any chance that our own code issues it when we check the row count of the DataFrame?
Please share your input on this.
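If only the row count is needed, one alternative is to let the database compute it. The `table` argument of `DataFrameReader.jdbc(String url, String table, Properties properties)` accepts anything valid in a SQL FROM clause, including a parenthesized subquery with an alias. A minimal sketch, assuming a plain table name and that Teradata accepts an aliased derived table: the hypothetical helper `countSubquery` below builds such a subquery string, so the database returns a single row instead of one row per table row. The alias names `cnt` and `row_count_t` are arbitrary.

```java
// Hypothetical helper: wraps a table name in a COUNT(*) derived table so the
// database computes the count and returns one row, instead of Spark fetching
// a "1" for every table row and counting them itself.
public class CountPushdownSketch {

    static String countSubquery(String table) {
        // Spark places this string where a table name would go, so it must be a
        // valid FROM-clause item; the derived table therefore gets an alias.
        return "(SELECT COUNT(*) AS cnt FROM " + table + ") row_count_t";
    }

    public static void main(String[] args) {
        // Prints the string that would be passed as the "table" argument.
        System.out.println(countSubquery("ONE_MILLION_ROWS_TABLE"));
    }
}
```

With the reader from the report, this would be used as `spark.read().jdbc(connectionUrl, countSubquery("ONE_MILLION_ROWS_TABLE"), connectionProperties)`, and the count would then be read from the `cnt` column of the single returned row; the exact Java type of that value depends on the type Teradata reports for `COUNT(*)`.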