You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Chetan Khatri <ch...@gmail.com> on 2021/05/06 02:15:18 UTC
Performance Improvement: Collect in spark taking huge time
Hi All, Collect in spark is taking huge time. I want to get list of values
of one column to Scala collection. How can I do this?
val newDynamicFieldTablesDF = cachedPhoenixAppMetaDataForCreateTableDF
.select(col("reporting_table")).except(clientSchemaDF)
logger.info(s"####### except with client-schema done " +
LocalDateTime.now())
// newDynamicFieldTablesDF.cache()
val detailsForCreateTableDF =
cachedPhoenixAppMetaDataForCreateTableDF
.join(broadcast(newDynamicFieldTablesDF),
Seq("reporting_table"), "inner")
logger.info(s"####### join with newDF done " +
LocalDateTime.now())
// detailsForCreateTableDF.cache()
val newDynamicFieldTablesList = newDynamicFieldTablesDF.map(r
=> r.getString(0)).collect().toSet
Later, I am iterating this list for one the use case to create a custom
definition table:
newDynamicFieldTablesList.foreach(table => { // running here Create table
DDL/SQL query })
Thank you so much
Re: Performance Improvement: Collect in spark taking huge time
Posted by Chetan Khatri <ch...@gmail.com>.
Hi All,
Do you think, replacing the collect() (for having scala collection for
loop) with below codeblock will have any benefit?
cachedColumnsAddTableDF.select("reporting_table").distinct().foreach(r => {
r.getAs("reporting_table").asInstanceOf[String]
})
On Wed, May 5, 2021 at 10:15 PM Chetan Khatri <ch...@gmail.com>
wrote:
> Hi All, Collect in spark is taking huge time. I want to get list of values
> of one column to Scala collection. How can I do this?
> val newDynamicFieldTablesDF = cachedPhoenixAppMetaDataForCreateTableDF
> .select(col("reporting_table")).except(clientSchemaDF)
> logger.info(s"####### except with client-schema done " +
> LocalDateTime.now())
> // newDynamicFieldTablesDF.cache()
>
>
> val detailsForCreateTableDF =
> cachedPhoenixAppMetaDataForCreateTableDF
> .join(broadcast(newDynamicFieldTablesDF),
> Seq("reporting_table"), "inner")
> logger.info(s"####### join with newDF done " +
> LocalDateTime.now())
>
> // detailsForCreateTableDF.cache()
>
> val newDynamicFieldTablesList = newDynamicFieldTablesDF.map(r
> => r.getString(0)).collect().toSet
>
>
> Later, I am iterating this list for one the use case to create a custom
> definition table:
>
> newDynamicFieldTablesList.foreach(table => { // running here Create table
> DDL/SQL query })
>
> Thank you so much
>