Posted to user@spark.apache.org by Siva <sb...@gmail.com> on 2016/01/30 00:38:29 UTC

saveAsTextFile is not writing to local fs

Hi Everyone,

We are using Spark 1.4.1 and have a requirement to write data to the local fs
instead of HDFS.

When trying to save an RDD to the local fs with saveAsTextFile, it writes
only a _SUCCESS file to the output folder, with no part- files and no error
or warning messages on the console.

Is there anywhere I should look to fix this problem?

Thanks,
Sivakumar Bhavanari.

RE: saveAsTextFile is not writing to local fs

Posted by Mohammed Guller <mo...@glassbeam.com>.
If the data is not too big, one option is to call the collect method and then save the result to a local file using the standard Java/Scala API. Keep in mind, however, that this transfers data from all the worker nodes to the driver program. It looks like that is what you want to do anyway, but you need to be aware of how big that data is and the related implications.
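
For illustration, a minimal sketch of that approach in Scala (assuming an RDD[String] named rdd and a hypothetical output path /tmp/output.txt; this is only safe when the collected data fits in the driver's memory):

import java.io.PrintWriter

// collect() pulls every partition to the driver; this is the step that
// moves all data from the worker nodes to the driver program.
val lines: Array[String] = rdd.collect()

// Write the collected records to a single file on the driver's local fs.
val writer = new PrintWriter("/tmp/output.txt")
try {
  lines.foreach(writer.println)
} finally {
  writer.close()
}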

Mohammed
Author: Big Data Analytics with Spark<http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>


Re: saveAsTextFile is not writing to local fs

Posted by Siva <sb...@gmail.com>.
Hi Mohammed,

Thanks for your response. The data is available on the worker nodes, but I
was looking for a way to write it directly to the local fs. It seems that is
not an option.

Thanks,
Sivakumar Bhavanari.


RE: saveAsTextFile is not writing to local fs

Posted by Mohammed Guller <mo...@glassbeam.com>.
You should not be saving an RDD to the local FS if Spark is running on a real cluster: each Spark worker saves the partitions that it processes to its own local filesystem.

Check the directories on the worker nodes and you should find pieces of your file on each node.
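
As a sketch of why the folder on the submitting machine ends up nearly empty (assuming an RDD[String] named rdd and a hypothetical output path /tmp/spark-out):

// With a file:// URI on a cluster, each executor writes its own
// part-NNNNN files under /tmp/spark-out on the machine where its tasks
// run. The driver usually creates the directory and the _SUCCESS marker
// on its own machine when the job commits, so the submitting node sees
// only _SUCCESS while the part- files sit on the workers.
rdd.saveAsTextFile("file:///tmp/spark-out")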

Mohammed
Author: Big Data Analytics with Spark<http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>


Re: saveAsTextFile is not writing to local fs

Posted by Siva <sb...@gmail.com>.
Hi Mohammed,

Thanks for your quick response. I'm submitting the Spark job to YARN in
"yarn-client" mode on a 6-node cluster. I ran the job with DEBUG logging
turned on. I see the exception below, but it occurred after the
saveAsTextFile call finished.

16/01/29 20:26:57 DEBUG HttpParser:
java.net.SocketException: Socket closed
        at java.net.SocketInputStream.read(SocketInputStream.java:190)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at org.spark-project.jetty.io.ByteArrayBuffer.readFrom(ByteArrayBuffer.java:391)
        at org.spark-project.jetty.io.bio.StreamEndPoint.fill(StreamEndPoint.java:141)
        at org.spark-project.jetty.server.bio.SocketConnector$ConnectorEndPoint.fill(SocketConnector.java:227)
        at org.spark-project.jetty.http.HttpParser.fill(HttpParser.java:1044)
        at org.spark-project.jetty.http.HttpParser.parseNext(HttpParser.java:280)
        at org.spark-project.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
        at org.spark-project.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
        at org.spark-project.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
        at org.spark-project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
        at org.spark-project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Thread.java:745)
16/01/29 20:26:57 DEBUG HttpParser:
java.net.SocketException: Socket closed
        at java.net.SocketInputStream.read(SocketInputStream.java:190)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at org.spark-project.jetty.io.ByteArrayBuffer.readFrom(ByteArrayBuffer.java:391)
        at org.spark-project.jetty.io.bio.StreamEndPoint.fill(StreamEndPoint.java:141)
        at org.spark-project.jetty.server.bio.SocketConnector$ConnectorEndPoint.fill(SocketConnector.java:227)
        at org.spark-project.jetty.http.HttpParser.fill(HttpParser.java:1044)
        at org.spark-project.jetty.http.HttpParser.parseNext(HttpParser.java:280)
        at org.spark-project.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
        at org.spark-project.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
        at org.spark-project.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
        at org.spark-project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
        at org.spark-project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Thread.java:745)
16/01/29 20:26:57 DEBUG HttpParser: HttpParser{s=-14,l=0,c=-3}
org.spark-project.jetty.io.EofException

Do you think this is causing the problem?

Thanks,
Sivakumar Bhavanari.


RE: saveAsTextFile is not writing to local fs

Posted by Mohammed Guller <mo...@glassbeam.com>.
Is it a multi-node cluster, or are you running Spark on a single machine?

You can change Spark’s logging level to INFO or DEBUG to see what is going on.
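
For example (a sketch, assuming Spark 1.4.x and a standard log4j setup):

// Programmatically, on an existing SparkContext:
sc.setLogLevel("DEBUG")  // also accepts ALL, INFO, WARN, ERROR, etc.

// Or edit conf/log4j.properties before submitting the job:
//   log4j.rootCategory=DEBUG, console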

Mohammed
Author: Big Data Analytics with Spark<http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>
