Posted to user@spark.apache.org by Joanne Contact <jo...@gmail.com> on 2014/11/21 23:48:24 UTC

Persist kafka streams to text file, tachyon error?

Resending to the correct mailing list.
---------- Forwarded message ----------
From: Joanne Contact <jo...@gmail.com>
Date: Fri, Nov 21, 2014 at 2:32 PM
Subject: Persist kafka streams to text file
To: user@spark.incubator.apache.org


Hello, I am trying to write a Kafka stream to a text file by running Spark from
my IDE (IntelliJ IDEA). The code is similar to the code in a previous thread on
persisting a stream to a text file.

I am new to both Spark and Scala. I believe Spark is running in local mode,
since the console shows
14/11/21 14:17:11 INFO spark.SparkContext: Spark configuration:
spark.app.name=local-mode

I got the following errors. They are related to Tachyon, but I don't know
whether I have Tachyon installed.

14/11/21 14:17:54 WARN storage.TachyonBlockManager: Attempt 1 to create tachyon dir null failed
java.io.IOException: Failed to connect to master localhost/127.0.0.1:19998 after 5 attempts
    at tachyon.client.TachyonFS.connect(TachyonFS.java:293)
    at tachyon.client.TachyonFS.getFileId(TachyonFS.java:1011)
    at tachyon.client.TachyonFS.exist(TachyonFS.java:633)
    at org.apache.spark.storage.TachyonBlockManager$$anonfun$createTachyonDirs$2.apply(TachyonBlockManager.scala:117)
    at org.apache.spark.storage.TachyonBlockManager$$anonfun$createTachyonDirs$2.apply(TachyonBlockManager.scala:106)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
    at org.apache.spark.storage.TachyonBlockManager.createTachyonDirs(TachyonBlockManager.scala:106)
    at org.apache.spark.storage.TachyonBlockManager.<init>(TachyonBlockManager.scala:57)
    at org.apache.spark.storage.BlockManager.tachyonStore$lzycompute(BlockManager.scala:88)
    at org.apache.spark.storage.BlockManager.tachyonStore(BlockManager.scala:82)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:729)
    at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:594)
    at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:145)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
    at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
    at org.apache.spark.scheduler.Task.run(Task.scala:54)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: tachyon.org.apache.thrift.TException: Failed to connect to master localhost/127.0.0.1:19998 after 5 attempts
    at tachyon.master.MasterClient.connect(MasterClient.java:178)
    at tachyon.client.TachyonFS.connect(TachyonFS.java:290)
    ... 28 more
Caused by: tachyon.org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
    at tachyon.org.apache.thrift.transport.TSocket.open(TSocket.java:185)
    at tachyon.org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
    at tachyon.master.MasterClient.connect(MasterClient.java:156)
    ... 29 more
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at tachyon.org.apache.thrift.transport.TSocket.open(TSocket.java:180)
    ... 31 more
14/11/21 14:17:54 ERROR storage.TachyonBlockManager: Failed 10 attempts to create tachyon dir in /tmp_spark_tachyon/spark-3dbec68b-f5b8-45e1-bb68-370439839d4a/<driver>

I looked at the code and found the following call. Could it be the cause?

.persist(StorageLevel.OFF_HEAP)

Any advice?

Thank you!

J

Re: Persist kafka streams to text file, tachyon error?

Posted by Haoyuan Li <ha...@gmail.com>.
StorageLevel.OFF_HEAP requires a running Tachyon instance:
http://spark.apache.org/docs/latest/programming-guide.html

If you don't know whether you have Tachyon, you probably don't :)
http://tachyon-project.org/
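
One quick way to confirm this is to test whether anything accepts TCP
connections on the address from the log (localhost, port 19998). The sketch
below is only an illustration: the object name TachyonMasterCheck and the
helper isListening are hypothetical, and an open port alone does not prove
that the listener is actually a Tachyon master.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

// Hypothetical helper, not part of Spark or Tachyon.
object TachyonMasterCheck {

  // Returns true if a TCP connection to host:port succeeds within timeoutMs.
  // A refused or timed-out connection yields false.
  def isListening(host: String, port: Int, timeoutMs: Int = 1000): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      true
    } catch {
      case _: IOException => false
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Host and port are taken from the error message in the question.
    val reachable = isListening("localhost", 19998)
    println(s"Something is listening on localhost:19998: $reachable")
  }
}
```

If this reports false, the "Connection refused" in the stack trace is
expected: no master is running for Spark to connect to.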

For local testing, you can pass another storage level to persit(), such as
MEMORY_ONLY or MEMORY_AND_DISK. Neither requires Tachyon.
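
As a rough sketch of that suggestion, the snippet below persists an RDD with
StorageLevel.MEMORY_ONLY in a local SparkContext. It assumes the Spark core
library is on the classpath. The object name PersistWithoutTachyon, the
helper cachedCount, the app name, and the sample data are made up for
illustration; the original Kafka streaming code is not shown in this thread,
so this only demonstrates swapping the storage level.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical example object, not the code from the question.
object PersistWithoutTachyon {

  // Persists a small RDD in executor memory and counts its elements.
  // MEMORY_ONLY keeps blocks on the JVM heap, so no Tachyon master is contacted.
  def cachedCount(sc: SparkContext): Long = {
    val lines = sc.parallelize(Seq("first", "second", "third"))
    lines.persist(StorageLevel.MEMORY_ONLY)
    lines.count()
  }

  def main(args: Array[String]): Unit = {
    // "local[2]" runs Spark inside this JVM with two worker threads.
    val conf = new SparkConf().setAppName("persist-example").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(s"Cached element count: ${cachedCount(sc)}")
    } finally {
      sc.stop()
    }
  }
}
```

In the code from the question, the equivalent change would presumably be
replacing .persist(StorageLevel.OFF_HEAP) with
.persist(StorageLevel.MEMORY_ONLY) or another level that fits the workload.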

Best,

Haoyuan

-- 
Haoyuan Li
AMPLab, EECS, UC Berkeley
http://www.cs.berkeley.edu/~haoyuan/