Posted to issues@spark.apache.org by "Ramakrishna (Jira)" <ji...@apache.org> on 2022/11/28 12:39:00 UTC
[jira] [Created] (SPARK-41298) Getting Count on data frame is giving the performance issue
Ramakrishna created SPARK-41298:
-----------------------------------
Summary: Getting Count on data frame is giving the performance issue
Key: SPARK-41298
URL: https://issues.apache.org/jira/browse/SPARK-41298
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.4.4
Reporter: Ramakrishna
We are running the following code against Teradata:
1) Dataset<Row> df = spark.read().format("jdbc"). . . load();
2) long count = df.count();
When df.count() runs, Spark internally issues the query below on Teradata. It returns one row for every row in the table, which wastes a lot of CPU on Teradata, and our DBAs are raising concerns about this query.
Query : SELECT 1 FROM (<ONE_MILLION_ROWS_TABLE>)SPARK_SUB_TAB
Response:
1
1
1
1
1
..
1
Is this expected behavior from Spark?
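A possible workaround, sketched here under assumptions rather than confirmed as the recommended fix: as far as we understand, the JDBC source in Spark 2.4 does not push aggregates such as COUNT down to the database, so the counting happens on the Spark side. Passing an aggregate subquery through the "dbtable" option should let Teradata compute the count and return a single row. The class name CountPushdown, the helper names, the alias count_sub, the table name my_table, and the URL placeholder are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CountPushdown {

    // Wraps a table name (or a parenthesized subquery) in an aggregate
    // subquery so that the database, not Spark, computes the row count.
    // The alias "count_sub" is an arbitrary, hypothetical name.
    public static String countSubquery(String source) {
        return "(SELECT COUNT(*) AS cnt FROM " + source + ") count_sub";
    }

    // Builds the option map for Spark's JDBC reader. Only "url" and
    // "dbtable" are set here; credentials and driver are left out.
    public static Map<String, String> jdbcOptions(String url, String source) {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("url", url);
        opts.put("dbtable", countSubquery(source));
        return opts;
    }

    public static void main(String[] args) {
        // Placeholder URL and table name; replace with real values.
        Map<String, String> opts =
            jdbcOptions("jdbc:teradata://<host>/DATABASE=<db>", "my_table");
        System.out.println(opts.get("dbtable"));

        // With Spark on the classpath, the options could be used like this:
        //   Dataset<Row> df = spark.read().format("jdbc").options(opts).load();
        //   long count = ((Number) df.first().get(0)).longValue();
    }
}
```

With this approach the database should return one row containing the count instead of one row per table row. The cast through Number is there because the JDBC type that COUNT(*) maps to can differ between databases.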