Posted to user@accumulo.apache.org by Matthew Molek <mm...@clearedgeit.com> on 2014/01/13 18:37:44 UTC

Accumulo and Spark

I just tried using AccumuloInputFormat as a data source for Spark running
in standalone mode on a single-node 'cluster'. Everything seems to work
fine out of the box, as advertised. (Spark is designed to work with any
Hadoop InputFormat.)

Just configure the AccumuloInputFormat as usual and pass it to
JavaSparkContext.newAPIHadoopRDD(...) to load the data into an RDD.
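
For anyone who wants to try it, here is a minimal sketch of that setup in
Java. The instance name, ZooKeeper address, credentials, and table name are
placeholders; this assumes the Accumulo 1.5 client and Spark 0.8.x APIs:

import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class AccumuloSparkSketch {
    public static void main(String[] args) throws Exception {
        JavaSparkContext sc = new JavaSparkContext("local", "accumulo-spark");

        // The Job is only a holder for the Hadoop Configuration that the
        // static AccumuloInputFormat setters write into.
        Job job = new Job();
        AccumuloInputFormat.setZooKeeperInstance(job, "myInstance", "localhost:2181");
        AccumuloInputFormat.setConnectorInfo(job, "user", new PasswordToken("secret"));
        AccumuloInputFormat.setInputTableName(job, "mytable");
        AccumuloInputFormat.setScanAuthorizations(job, new Authorizations());

        // Each Accumulo entry becomes one (Key, Value) pair in the RDD.
        JavaPairRDD<Key, Value> rdd = sc.newAPIHadoopRDD(job.getConfiguration(),
                AccumuloInputFormat.class, Key.class, Value.class);

        System.out.println("entries: " + rdd.count());
        sc.stop();
    }
}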

The versions I tested with were Accumulo 1.5, Hadoop 1.2.1, and Spark 0.8.1.

Is anyone else using Spark with Accumulo?

Re: Accumulo and Spark

Posted by Jared Winick <ja...@gmail.com>.
I tried it myself a few weeks ago and saw that it "just works" too, at least
for the very simple test I ran. I did see some error messages when running
from sbt after the job had completed successfully and the SparkContext was
closing. I assume this has to do with resources held within the
AccumuloInputFormat? This was with Accumulo 1.5.0. I haven't had time to
look into it, but I was going to contribute an Accumulo example to
https://github.com/apache/incubator-spark/tree/master/examples/src/main/scala/org/apache/spark/examples
if I could get these messages cleared up.

...
14/01/15 10:29:13 INFO spark.SparkContext: Successfully stopped SparkContext
java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)
    at org.apache.accumulo.core.client.impl.ThriftTransportPool$Closer.run(ThriftTransportPool.java:129)
    at java.lang.Thread.run(Thread.java:680)
java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)
    at org.apache.accumulo.core.client.impl.ThriftTransportPool$Closer.run(ThriftTransportPool.java:129)
    at java.lang.Thread.run(Thread.java:680)
14/01/15 10:29:13 ERROR zookeeper.ClientCnxn: Event thread exiting due to interruption
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1961)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1996)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:491)
14/01/15 10:29:13 INFO zookeeper.ClientCnxn: EventThread shut down
[success] Total time: 3 s, completed Jan 15, 2014 10:29:13 AM
14/01/15 10:29:23 WARN zookeeper.ClientCnxn: Session 0x14396efc3be0022 for server vm/192.168.221.2:2181, unexpected error, closing socket connection and attempting reconnect
java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:343)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:117)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
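
In case it helps: if those interrupts are coming from Accumulo's static
client threads (the ThriftTransportPool closer in the first trace), one
workaround sketch is to ask the client to stop them before the JVM shuts
down. This assumes your 1.5.x client jar includes
org.apache.accumulo.core.util.CleanUp:

import org.apache.accumulo.core.util.CleanUp;

// ... after all Spark actions have run ...

// Assumption: CleanUp.shutdownNow() stops Accumulo's static background
// threads (e.g. the ThriftTransportPool closer seen above) so nothing is
// left to be interrupted during JVM shutdown.
CleanUp.shutdownNow();
sc.stop();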


