Posted to issues@spark.apache.org by "gaokui (Jira)" <ji...@apache.org> on 2020/07/23 02:40:00 UTC
[jira] [Updated] (SPARK-32341) add multiple filter in rdd function
[ https://issues.apache.org/jira/browse/SPARK-32341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
gaokui updated SPARK-32341:
---------------------------
Affects Version/s: 3.0.0
Description:
When I use Spark RDDs, I often read Kafka data, and that Kafka data contains many different data sets.
I filter these RDDs by Kafka key and then fill an Array[RDD] with one RDD per topic.
But each call to rdd.filter generates a separate stage, so the same data is scanned and processed by many tasks, which wastes time and is unnecessary.
I hope to add a multi-way filter function (instead of repeated rdd.filter calls) that returns an Array[RDD] in one stage, dividing a mixed-data RDD into single-data-set RDDs.
The function would look like: Array[RDD] = rdd.multiplefilter(setcondition).
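The semantics being requested can be sketched outside Spark with plain Python collections. This is only an illustration of the proposed behavior, not the Spark API; `multiple_filter` and its signature are hypothetical. The point is that the data is scanned once, with each record routed to the bucket of the first predicate it satisfies, instead of being rescanned once per rdd.filter call.

```python
# Hypothetical sketch of the requested multi-way filter semantics.
# Not a Spark API: `multiple_filter` is an illustrative name only.
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")

def multiple_filter(data: Iterable[T],
                    predicates: List[Callable[[T], bool]]) -> List[List[T]]:
    """Partition `data` into one bucket per predicate in a single pass."""
    buckets: List[List[T]] = [[] for _ in predicates]
    for record in data:
        for i, pred in enumerate(predicates):
            if pred(record):
                buckets[i].append(record)
                break  # each record lands in at most one bucket
    return buckets

# Example: Kafka-like (topic, payload) records split by topic in one scan.
records = [("topicA", 1), ("topicB", 2), ("topicA", 3)]
by_topic = multiple_filter(records,
                           [lambda r: r[0] == "topicA",
                            lambda r: r[0] == "topicB"])
# by_topic[0] holds topicA records, by_topic[1] holds topicB records.
```

A comparable single-pass effect in current Spark typically requires caching the RDD and running N filter jobs, or keying the records and grouping, which is what motivates this request.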
was:
When I use Spark RDDs, I often read Kafka data, but the Kafka data contains many different data sets.
When I use rdd.filter, it generates multiple stages.
I hope to add a multi-way filter function (instead of rdd.filter) that returns all the single data sets in one stage,
like: Array[RDD] = rdd.multiplefilter(setcondition).
Summary: add multiple filter in rdd function (was: wish to add multiple filter in rdd function)
> add multiple filter in rdd function
> -----------------------------------
>
> Key: SPARK-32341
> URL: https://issues.apache.org/jira/browse/SPARK-32341
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Affects Versions: 2.4.6, 3.0.0
> Reporter: gaokui
> Priority: Major
>
> When I use Spark RDDs, I often read Kafka data, and that Kafka data contains many different data sets.
> I filter these RDDs by Kafka key and then fill an Array[RDD] with one RDD per topic.
> But each call to rdd.filter generates a separate stage, so the same data is scanned and processed by many tasks, which wastes time and is unnecessary.
> I hope to add a multi-way filter function (instead of repeated rdd.filter calls) that returns an Array[RDD] in one stage, dividing a mixed-data RDD into single-data-set RDDs.
> The function would look like: Array[RDD] = rdd.multiplefilter(setcondition).
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org