Posted to user@spark.apache.org by srinivasarao daruna <sr...@gmail.com> on 2021/01/21 02:56:43 UTC

Spark job stuck after read and not starting next stage

Hi,
I am running a Spark job on a huge dataset. I have allocated 10 R5.16xlarge
machines (each has 64 cores and 512 GB of memory).

The source data is JSON and I need to do some JSON transformations, so I
read the files as text and then convert them to a DataFrame.

ds = spark.read.textFile()
updated_dataset = ds.withColumn(applying my transformations).as[String]
df = spark.read.json(updated_dataset)

df.write.save()

Some notes:
The source data is heavy and deeply nested; printSchema shows a lot of
nested structs.

In the Spark UI, the json stage runs first. After it completes, no further
jobs appear in the UI and the application just hangs.

All executors were dead and only the driver was active.

Thank You,
Regards,
Srini

Re: Spark job stuck after read and not starting next stage

Posted by German Schiavon <gs...@gmail.com>.
Hi,
Not sure if this is your case, but since the source data is heavy and deeply
nested, I'd recommend explicitly providing the schema when reading the JSON.

df = spark.read.schema(schema).json(updated_dataset)
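For illustration, a minimal sketch of what an explicit schema could look
like. The field names below are hypothetical placeholders; substitute the
structure that printSchema reports for your actual data:

```scala
import org.apache.spark.sql.types._

// Hypothetical nested schema; replace these fields with the ones
// printSchema reported for your data.
val schema = StructType(Seq(
  StructField("id", StringType),
  StructField("payload", StructType(Seq(
    StructField("name", StringType),
    StructField("items", ArrayType(StructType(Seq(
      StructField("sku", StringType),
      StructField("qty", LongType)
    ))))
  )))
))

// With an explicit schema, Spark skips the extra full pass over the
// data that JSON schema inference would otherwise perform.
val df = spark.read.schema(schema).json(updated_dataset)
```

The schema can also be given to DataFrameReader.schema as a DDL string
(e.g. "id STRING, payload STRUCT<name: STRING>"), which can be easier to
maintain for deeply nested layouts. If writing out the full schema is
impractical, the JSON reader's samplingRatio option should at least reduce
the cost of inference by sampling only a fraction of the input.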


On Thu, 21 Jan 2021 at 04:15, srinivasarao daruna <sr...@gmail.com>
wrote:

> Hi,
> I am running a Spark job on a huge dataset. I have allocated 10
> R5.16xlarge machines (each has 64 cores and 512 GB of memory).
>
> The source data is JSON and I need to do some JSON transformations, so I
> read the files as text and then convert them to a DataFrame.
>
> ds = spark.read.textFile()
> updated_dataset = ds.withColumn(applying my transformations).as[String]
> df = spark.read.json(updated_dataset)
>
> df.write.save()
>
> Some notes:
> The source data is heavy and deeply nested; printSchema shows a lot
> of nested structs.
>
> In the Spark UI, the json stage runs first. After it completes, no
> further jobs appear in the UI and the application just hangs.
>
> All executors were dead and only the driver was active.
>
> Thank You,
> Regards,
> Srini
>