Posted to dev@spark.apache.org by mattbuttow <ma...@yandex.com> on 2018/07/31 14:44:27 UTC

Writing file

According to Stack Overflow (https://stackoverflow.com/q/40786093) it should
be possible to write a file to a local path, with the result available
on the driver node.

However, when I try this:

     df.write.parquet("file:///some/path")

the data seems to be written on each node, not on the driver.

I checked an answer (https://stackoverflow.com/a/31240494) by Holden Karau,
but it seems ambiguous, and other users
(https://stackoverflow.com/questions/31239161/save-a-spark-rdd-to-the-local-file-system-using-java#comment50482201_31240494)
seem to have a similar problem to mine.



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: Writing file

Posted by mattbuttow <ma...@yandex.com>.
Thank you cloud0fan. That's really helpful.





Re: Writing file

Posted by Wenchen Fan <cl...@gmail.com>.
It depends on how you deploy Spark. The writer simply writes data to the
path you specify (HDFS or a local path), but it runs on the executors. If
you deploy Spark in local mode, i.e. the executors and the driver run
together, then you will see the output file on the driver node.

If you deploy Spark on a cluster, then I'd suggest writing the data to
HDFS (or another distributed file system) and using another tool to copy
the files from HDFS to whichever node you want.

On Tue, Jul 31, 2018 at 10:44 PM mattbuttow <ma...@yandex.com> wrote:

> According to Stack Overflow (https://stackoverflow.com/q/40786093) it should
> be possible to write a file to a local path, with the result available
> on the driver node.
>
> However, when I try this:
>
>      df.write.parquet("file:///some/path")
>
> the data seems to be written on each node, not on the driver.
>
> I checked an answer (https://stackoverflow.com/a/31240494) by Holden Karau,
> but it seems ambiguous, and other users
> (https://stackoverflow.com/questions/31239161/save-a-spark-rdd-to-the-local-file-system-using-java#comment50482201_31240494)
> seem to have a similar problem to mine.