Posted to dev@spark.apache.org by nirmit jain <ni...@gmail.com> on 2020/03/20 07:20:44 UTC

Data Getting Inconsistent While Running a Persist Job Multiple Times

Hi Team,

I am hitting an issue where I run a join on my persisted data and store
the result to multiple different locations.

One set of output data contains duplicate records, while the other set
contains none.

I have also verified my source and found no duplicate join keys that
might lead to this situation.

Can the following scenario occur: a persisted dataset gets malformed
due to cluster resource deallocation and scaling, and ends up with
multiple copies of the same rows?

Can someone assist me with this?

Regards
Nirmit Jain

Partition by Custom Wrapping

Posted by nirmit jain <ni...@gmail.com>.
Hi Developers,

Can someone help me write a custom partitioner where, instead of writing
data in a hierarchical format like this:

Root-
        Data=A-
        Data=B-

it writes something like this:

Data-
        A-
           ROOT-
        B-
           ROOT-

Could you tell me which classes take care of this kind of requirement?
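(As far as I know, `DataFrameWriter.partitionBy` is hard-wired to the Hive-style `key=value` directory layout, so a common workaround is to group or filter by the key yourself and construct each output path by hand. A plain-Python sketch of producing the `Data/A/ROOT` layout above, with the rows and file names purely illustrative:)

```python
import csv
import os
import tempfile
from collections import defaultdict

# Hypothetical rows keyed by the "Data" column.
rows = [("A", 1), ("A", 2), ("B", 3)]

# Group rows by partition key, mimicking one write per distinct key.
by_key = defaultdict(list)
for key, value in rows:
    by_key[key].append(value)

# Write each key's subset under a hand-built path: Data/<key>/ROOT/...
base = tempfile.mkdtemp()
for key, values in by_key.items():
    out_dir = os.path.join(base, "Data", key, "ROOT")
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, "part-0.csv"), "w", newline="") as f:
        csv.writer(f).writerows([[v] for v in values])

# Collect the resulting relative paths to show the layout.
layout = sorted(
    os.path.relpath(os.path.join(d, name), base)
    for d, _, files in os.walk(base)
    for name in files
)
# layout == ['Data/A/ROOT/part-0.csv', 'Data/B/ROOT/part-0.csv']
```

The Spark equivalent of this sketch is to loop over the distinct key values and call the writer once per key with a custom output path, accepting one job per key rather than a single `partitionBy` write.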

Thanks and Regards
Nirmit Jain