You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@kudu.apache.org by Saúl Nogueras <su...@gmail.com> on 2018/03/02 13:02:46 UTC
Problems connecting form Spark
I cannot properly connect to Kudu from Spark, error says “Kudu master has
no leader”
- CDH 5.14
- Kudu 1.6
- Spark 1.6.0 standalone and 2.2.0
When I use Impala in HUE to create and query kudu tables, it works
flawlessly.
However, connecting from Spark throws some errors I cannot decipher.
I have tried using both pyspark and spark-shell. With spark shell I had to
use spark 1.6 instead of 2.2 because some maven dependencies problems, that
I have localized but not been able to fix. More info here.
------------------------------
Case 1: using pyspark2 (Spark 2.2.0)
$ pyspark2 --master yarn --jars
/opt/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/lib/kudu/kudu-spark2_2.11.jar
> df = sqlContext.read.format('org.apache.kudu.spark.kudu').options(**{"kudu.master":"172.17.0.43:7077", "kudu.table":"impala::default.test"}).load()
18/03/02 10:23:27 WARN client.ConnectToCluster: Error receiving
response from 172.17.0.43:7077
org.apache.kudu.client.RecoverableException: [peer
master-172.17.0.43:7077] encountered a read timeout; closing the
channel
at org.apache.kudu.client.Connection.exceptionCaught(Connection.java:412)
at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
at org.apache.kudu.client.Connection.handleUpstream(Connection.java:239)
at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.exceptionCaught(SimpleChannelUpstreamHandler.java:153)
at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:536)
at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.readTimedOut(ReadTimeoutHandler.java:236)
at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler$ReadTimeoutTask$1.run(ReadTimeoutHandler.java:276)
at org.apache.kudu.shaded.org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:40)
at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.apache.kudu.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutException
at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.<clinit>(ReadTimeoutHandler.java:84)
at org.apache.kudu.client.Connection$ConnectionPipeline.init(Connection.java:782)
at org.apache.kudu.client.Connection.<init>(Connection.java:199)
at org.apache.kudu.client.ConnectionCache.getConnection(ConnectionCache.java:133)
at org.apache.kudu.client.AsyncKuduClient.newRpcProxy(AsyncKuduClient.java:248)
at org.apache.kudu.client.AsyncKuduClient.newMasterRpcProxy(AsyncKuduClient.java:272)
at org.apache.kudu.client.ConnectToCluster.run(ConnectToCluster.java:157)
at org.apache.kudu.client.AsyncKuduClient.getMasterTableLocationsPB(AsyncKuduClient.java:1350)
at org.apache.kudu.client.AsyncKuduClient.exportAuthenticationCredentials(AsyncKuduClient.java:651)
at org.apache.kudu.client.KuduClient.exportAuthenticationCredentials(KuduClient.java:293)
at org.apache.kudu.spark.kudu.KuduContext$$anon$1.run(KuduContext.scala:97)
at org.apache.kudu.spark.kudu.KuduContext$$anon$1.run(KuduContext.scala:96)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:360)
at org.apache.kudu.spark.kudu.KuduContext.<init>(KuduContext.scala:96)
at org.apache.kudu.spark.kudu.KuduRelation.<init>(DefaultSource.scala:162)
at org.apache.kudu.spark.kudu.DefaultSource.createRelation(DefaultSource.scala:75)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
... 1 more
18/03/02 10:23:27 WARN client.ConnectToCluster: Unable to find the
leader master 172.17.0.43:7077; will retry
Py4JJavaError Traceback (most recent call last)
<ipython-input-1-e1dfaec7a544> in <module>()
----> 1 df = sqlContext.read.format('org.apache.kudu.spark.kudu').options(**{"kudu.master":"172.17.0.43:7077",
"kudu.table":"impala::default.logika_dataset_kudu"}).load()
/opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/lib/spark2/python/pyspark/sql/readwriter.py
in load(self, path, format, schema, **options)
163 return
self._df(self._jreader.load(self._spark._sc._jvm.PythonUtils.toSeq(path)))
164 else:
--> 165 return self._df(self._jreader.load())
166
167 @since(1.4)
/opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py
in __call__(self, *args)
1131 answer = self.gateway_client.send_command(command)
1132 return_value = get_return_value(
-> 1133 answer, self.gateway_client, self.target_id, self.name)
1134
1135 for temp_arg in temp_args:
/opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/lib/spark2/python/pyspark/sql/utils.py
in deco(*a, **kw)
61 def deco(*a, **kw):
62 try:
---> 63 return f(*a, **kw)
64 except py4j.protocol.Py4JJavaError as e:
65 s = e.java_exception.toString()
/opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py
in get_return_value(answer, gateway_client, target_id, name)
317 raise Py4JJavaError(
318 "An error occurred while calling {0}{1}{2}.\n".
--> 319 format(target_id, ".", name), value)
320 else:
321 raise Py4JError(
Py4JJavaError: An error occurred while calling o59.load.
: java.security.PrivilegedActionException:
org.apache.kudu.client.NoLeaderFoundException: Master config
(172.17.0.43:7077) has no leader. Exceptions received:
org.apache.kudu.client.RecoverableException: [peer
master-172.17.0.43:7077] encountered a read timeout; closing the
channel
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:360)
at org.apache.kudu.spark.kudu.KuduContext.<init>(KuduContext.scala:96)
at org.apache.kudu.spark.kudu.KuduRelation.<init>(DefaultSource.scala:162)
at org.apache.kudu.spark.kudu.DefaultSource.createRelation(DefaultSource.scala:75)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kudu.client.NoLeaderFoundException: Master
config (172.17.0.43:7077) has no leader. Exceptions received:
org.apache.kudu.client.RecoverableException: [peer
master-172.17.0.43:7077] encountered a read timeout; closing the
channel
at org.apache.kudu.client.ConnectToCluster.incrementCountAndCheckExhausted(ConnectToCluster.java:272)
at org.apache.kudu.client.ConnectToCluster.access$100(ConnectToCluster.java:49)
at org.apache.kudu.client.ConnectToCluster$ConnectToMasterErrCB.call(ConnectToCluster.java:349)
at org.apache.kudu.client.ConnectToCluster$ConnectToMasterErrCB.call(ConnectToCluster.java:338)
at com.stumbleupon.async.Deferred.doCall(Deferred.java:1280)
at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1259)
at com.stumbleupon.async.Deferred.handleContinuation(Deferred.java:1315)
at com.stumbleupon.async.Deferred.doCall(Deferred.java:1286)
at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1259)
at com.stumbleupon.async.Deferred.callback(Deferred.java:1002)
at org.apache.kudu.client.KuduRpc.handleCallback(KuduRpc.java:238)
at org.apache.kudu.client.KuduRpc.errback(KuduRpc.java:292)
at org.apache.kudu.client.RpcProxy.failOrRetryRpc(RpcProxy.java:388)
at org.apache.kudu.client.RpcProxy.responseReceived(RpcProxy.java:217)
at org.apache.kudu.client.RpcProxy.access$000(RpcProxy.java:60)
at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:132)
at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:128)
at org.apache.kudu.client.Connection.cleanup(Connection.java:694)
at org.apache.kudu.client.Connection.exceptionCaught(Connection.java:439)
at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
at org.apache.kudu.client.Connection.handleUpstream(Connection.java:239)
at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.exceptionCaught(SimpleChannelUpstreamHandler.java:153)
at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:536)
at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.readTimedOut(ReadTimeoutHandler.java:236)
at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler$ReadTimeoutTask$1.run(ReadTimeoutHandler.java:276)
at org.apache.kudu.shaded.org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:40)
at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.apache.kudu.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
Caused by: org.apache.kudu.client.RecoverableException: [peer
master-172.17.0.43:7077] encountered a read timeout; closing the
channel
at org.apache.kudu.client.Connection.exceptionCaught(Connection.java:412)
... 21 more
Caused by: org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutException
at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.<clinit>(ReadTimeoutHandler.java:84)
at org.apache.kudu.client.Connection$ConnectionPipeline.init(Connection.java:782)
at org.apache.kudu.client.Connection.<init>(Connection.java:199)
at org.apache.kudu.client.ConnectionCache.getConnection(ConnectionCache.java:133)
at org.apache.kudu.client.AsyncKuduClient.newRpcProxy(AsyncKuduClient.java:248)
at org.apache.kudu.client.AsyncKuduClient.newMasterRpcProxy(AsyncKuduClient.java:272)
at org.apache.kudu.client.ConnectToCluster.run(ConnectToCluster.java:157)
at org.apache.kudu.client.AsyncKuduClient.getMasterTableLocationsPB(AsyncKuduClient.java:1350)
at org.apache.kudu.client.AsyncKuduClient.exportAuthenticationCredentials(AsyncKuduClient.java:651)
at org.apache.kudu.client.KuduClient.exportAuthenticationCredentials(KuduClient.java:293)
at org.apache.kudu.spark.kudu.KuduContext$$anon$1.run(KuduContext.scala:97)
at org.apache.kudu.spark.kudu.KuduContext$$anon$1.run(KuduContext.scala:96)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:360)
at org.apache.kudu.spark.kudu.KuduContext.<init>(KuduContext.scala:96)
at org.apache.kudu.spark.kudu.KuduRelation.<init>(DefaultSource.scala:162)
at org.apache.kudu.spark.kudu.DefaultSource.createRelation(DefaultSource.scala:75)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
... 1 more
Case 2: using spark-shell (Spark 1.6.0 standalone):
$ spark-shell --master spark://localhost:7077 --packages
org.apache.kudu:kudu-spark_2.10:1.1.0
> import org.apache.kudu.spark.kudu._
> import org.apache.kudu.client._
> import collection.JavaConverters._
> val df = sqlContext.read.options(Map("kudu.master" -> "localhost:7051","kudu.table" -> "impala::default.test")).kudu
df: org.apache.spark.sql.DataFrame = [dataset: string, id: string,
itemnumber: string, srcid: string, timestamp: string, year: string,
month: string, day: string, week: string, quarter: string, season:
string, city: string, region1: string, region2: string, region3:
string, region4: string, locality: string, itemname: string, itembqu:
string, product_category: string, amount: string, mapped_zipcode:
string, latitude: string, longitude: string, depositor_code: string,
depositor_name: string, customer_code: string, is_island: string]
It seems to be connecting, as it is able to show the column names, but if I
// register a temporary table and use SQL
df.registerTempTable("test")
val filteredDF = sqlContext.sql("select count(*) from test").show()
bang!
[Stage 0:> (0 + 6) / 6]
Lost task 1.0 in stage 0.0 (TID 1, tt-slave-2.novalocal, executor 1):
org.apache.kudu.client.NonRecoverableException: RPC can not complete
before timeout: KuduRpc(method=GetTableSchema, tablet=null,
attempt=30, DeadlineTracker(timeout=30000, elapsed=27307), Traces:
[0ms] querying master,
[48ms] Sub rpc: GetMasterRegistration sending RPC to server Kudu
Master - localhost:7051,
[71ms] Sub rpc: GetMasterRegistration received from server Kudu Master
- localhost:7051 response
Network error:
[Peer Kudu Master - localhost:7051] Connection reset,
[75ms] delaying RPC due to Service unavailable: Master config
(localhost:7051) has no leader.
Exceptions received: org.apache.kudu.client.RecoverableException:
[Peer Kudu Master - localhost:7051] Connection reset,
...
(SAME MESSAGE REPEATS 25 TIMES)
...
[24262ms] querying master,
[24262ms] Sub rpc: GetMasterRegistration sending RPC to server Kudu
Master - localhost:7051,
[24263ms] Sub rpc: GetMasterRegistration received from server Kudu
Master - localhost:7051 response
Network error:
[Peer Kudu Master - localhost:7051] Connection reset,
[24263ms] delaying RPC due to Service unavailable: Master config
(localhost:7051) has no leader.
Exceptions received: org.apache.kudu.client.RecoverableException:
[Peer Kudu Master - localhost:7051] Connection reset,
[24661ms] trace too long, truncated)
at org.apache.kudu.client.AsyncKuduClient.tooManyAttemptsOrTimeout(AsyncKuduClient.java:961)
at org.apache.kudu.client.AsyncKuduClient.delayedSendRpcToTablet(AsyncKuduClient.java:1203)
at org.apache.kudu.client.AsyncKuduClient.access$800(AsyncKuduClient.java:110)
at org.apache.kudu.client.AsyncKuduClient$RetryRpcErrback.call(AsyncKuduClient.java:764)
at org.apache.kudu.client.AsyncKuduClient$RetryRpcErrback.call(AsyncKuduClient.java:754)
at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278)
at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257)
at com.stumbleupon.async.Deferred.callback(Deferred.java:1005)
at org.apache.kudu.client.GetMasterRegistrationReceived.incrementCountAndCheckExhausted(GetMasterRegistrationReceived.java:156)
at org.apache.kudu.client.GetMasterRegistrationReceived.access$300(GetMasterRegistrationReceived.java:45)
at org.apache.kudu.client.GetMasterRegistrationReceived$GetMasterRegistrationErrCB.call(GetMasterRegistrationReceived.java:236)
at org.apache.kudu.client.GetMasterRegistrationReceived$GetMasterRegistrationErrCB.call(GetMasterRegistrationReceived.java:225)
at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278)
at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257)
at com.stumbleupon.async.Deferred.callback(Deferred.java:1005)
at org.apache.kudu.client.KuduRpc.handleCallback(KuduRpc.java:220)
at org.apache.kudu.client.KuduRpc.errback(KuduRpc.java:274)
at org.apache.kudu.client.TabletClient.failOrRetryRpc(TabletClient.java:770)
at org.apache.kudu.client.TabletClient.failOrRetryRpcs(TabletClient.java:747)
at org.apache.kudu.client.TabletClient.cleanup(TabletClient.java:736)
at org.apache.kudu.client.TabletClient.channelClosed(TabletClient.java:698)
at org.apache.kudu.client.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
at org.apache.kudu.client.TabletClient.handleUpstream(TabletClient.java:679)
at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.apache.kudu.client.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.channelClosed(ReadTimeoutHandler.java:176)
at org.apache.kudu.client.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.apache.kudu.client.shaded.org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:468)
at org.apache.kudu.client.shaded.org.jboss.netty.channel.Channels$6.run(Channels.java:457)
at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:40)
at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.apache.kudu.client.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.apache.kudu.client.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kudu.client.NoLeaderFoundException: Master
config (localhost:7051) has no leader. Exceptions received:
org.apache.kudu.client.RecoverableException: [Peer Kudu Master -
localhost:7051] Connection reset
at org.apache.kudu.client.GetMasterRegistrationReceived.incrementCountAndCheckExhausted(GetMasterRegistrationReceived.java:154)
... 32 more
Caused by: org.apache.kudu.client.RecoverableException: [Peer Kudu
Master - localhost:7051] Connection reset
at org.apache.kudu.client.TabletClient.cleanup(TabletClient.java:734)
... 21 more
As I said, Kudu service is up an running, and I am able to query kudu
tables from Hue using Impala.
What am I missing here? Is this the right approach to interfacing Spark
with Kudu?
Thanks
Re: Problems connecting form Spark
Posted by William Berkeley <wd...@cloudera.com>.
In each case the problem is that some part of your application can't find
the leader master of the Kudu cluster:
org.apache.kudu.client.NoLeaderFoundException: Master config (*172.17.0.43:7077
<http://172.17.0.43:7077>*) has no leader.
org.apache.kudu.client.NoLeaderFoundException: Master config (
*localhost:7051*) has no leader.
I think you're seeing these errors for two reasons:
1. Are you using multi-master? The first exception shows you specified one
remote master. If your cluster has multiple masters, you should specify all
of them. If you specify only one, and it's not the leader master, then
connecting to it will fail. You can check which master is the leader by
going to the /masters page on the web ui of any master.
2. In the "standalone" case, the Spark tasks are being distributed to
executors and fail there:
Lost task 1.0 in stage 0.0 (TID 1, tt-slave-2.novalocal, executor 1)
You've specified the master address as localhost. That address is passed
as-is to executors. Any task on an executor that doesn't have the leader
master locally at port 7051 will fail to connect to the leader master.
Getting the column names doesn't fail as that doesn't generate tasks sent
to remote executors.
I make this mistake all the time while playing with kudu-spark :)
-Will
On Mon, Mar 5, 2018 at 4:14 PM, Mac Noland <mc...@gmail.com> wrote:
> Any chance you can try spark2-shell with Kudu 1.6 and then re-try your
> tests?
>
> spark-shell --packages org.apache.kudu:kudu-spark2_2.11:1.6.0
>
> On Fri, Mar 2, 2018 at 5:02 AM, Saúl Nogueras <su...@gmail.com> wrote:
>
>> I cannot properly connect to Kudu from Spark, error says “Kudu master has
>> no leader”
>>
>> - CDH 5.14
>> - Kudu 1.6
>> - Spark 1.6.0 standalone and 2.2.0
>>
>> When I use Impala in HUE to create and query kudu tables, it works
>> flawlessly.
>>
>> However, connecting from Spark throws some errors I cannot decipher.
>>
>> I have tried using both pyspark and spark-shell. With spark shell I had
>> to use spark 1.6 instead of 2.2 because some maven dependencies problems,
>> that I have localized but not been able to fix. More info here.
>> ------------------------------
>> Case 1: using pyspark2 (Spark 2.2.0)
>>
>> $ pyspark2 --master yarn --jars /opt/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/lib/kudu/kudu-spark2_2.11.jar
>>
>> > df = sqlContext.read.format('org.apache.kudu.spark.kudu').options(**{"kudu.master":"172.17.0.43:7077", "kudu.table":"impala::default.test"}).load()
>>
>> 18/03/02 10:23:27 WARN client.ConnectToCluster: Error receiving response from 172.17.0.43:7077
>> org.apache.kudu.client.RecoverableException: [peer master-172.17.0.43:7077] encountered a read timeout; closing the channel
>> at org.apache.kudu.client.Connection.exceptionCaught(Connection.java:412)
>> at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
>> at org.apache.kudu.client.Connection.handleUpstream(Connection.java:239)
>> at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>> at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>> at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.exceptionCaught(SimpleChannelUpstreamHandler.java:153)
>> at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
>> at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>> at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>> at org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:536)
>> at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.readTimedOut(ReadTimeoutHandler.java:236)
>> at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler$ReadTimeoutTask$1.run(ReadTimeoutHandler.java:276)
>> at org.apache.kudu.shaded.org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:40)
>> at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
>> at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
>> at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>> at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>> at org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>> at org.apache.kudu.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> at java.lang.Thread.run(Thread.java:748)
>> Caused by: org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutException
>> at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.<clinit>(ReadTimeoutHandler.java:84)
>> at org.apache.kudu.client.Connection$ConnectionPipeline.init(Connection.java:782)
>> at org.apache.kudu.client.Connection.<init>(Connection.java:199)
>> at org.apache.kudu.client.ConnectionCache.getConnection(ConnectionCache.java:133)
>> at org.apache.kudu.client.AsyncKuduClient.newRpcProxy(AsyncKuduClient.java:248)
>> at org.apache.kudu.client.AsyncKuduClient.newMasterRpcProxy(AsyncKuduClient.java:272)
>> at org.apache.kudu.client.ConnectToCluster.run(ConnectToCluster.java:157)
>> at org.apache.kudu.client.AsyncKuduClient.getMasterTableLocationsPB(AsyncKuduClient.java:1350)
>> at org.apache.kudu.client.AsyncKuduClient.exportAuthenticationCredentials(AsyncKuduClient.java:651)
>> at org.apache.kudu.client.KuduClient.exportAuthenticationCredentials(KuduClient.java:293)
>> at org.apache.kudu.spark.kudu.KuduContext$$anon$1.run(KuduContext.scala:97)
>> at org.apache.kudu.spark.kudu.KuduContext$$anon$1.run(KuduContext.scala:96)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:360)
>> at org.apache.kudu.spark.kudu.KuduContext.<init>(KuduContext.scala:96)
>> at org.apache.kudu.spark.kudu.KuduRelation.<init>(DefaultSource.scala:162)
>> at org.apache.kudu.spark.kudu.DefaultSource.createRelation(DefaultSource.scala:75)
>> at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:498)
>> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>> at py4j.Gateway.invoke(Gateway.java:280)
>> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>> at py4j.commands.CallCommand.execute(CallCommand.java:79)
>> at py4j.GatewayConnection.run(GatewayConnection.java:214)
>> ... 1 more
>> 18/03/02 10:23:27 WARN client.ConnectToCluster: Unable to find the leader master 172.17.0.43:7077; will retry
>>
>> Py4JJavaError Traceback (most recent call last)
>> <ipython-input-1-e1dfaec7a544> in <module>()
>> ----> 1 df = sqlContext.read.format('org.apache.kudu.spark.kudu').options(**{"kudu.master":"172.17.0.43:7077", "kudu.table":"impala::default.logika_dataset_kudu"}).load()
>>
>> /opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/lib/spark2/python/pyspark/sql/readwriter.py in load(self, path, format, schema, **options)
>> 163 return self._df(self._jreader.load(self._spark._sc._jvm.PythonUtils.toSeq(path)))
>> 164 else:
>> --> 165 return self._df(self._jreader.load())
>> 166
>> 167 @since(1.4)
>>
>> /opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py in __call__(self, *args)
>> 1131 answer = self.gateway_client.send_command(command)
>> 1132 return_value = get_return_value(
>> -> 1133 answer, self.gateway_client, self.target_id, self.name)
>> 1134
>> 1135 for temp_arg in temp_args:
>>
>> /opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/lib/spark2/python/pyspark/sql/utils.py in deco(*a, **kw)
>> 61 def deco(*a, **kw):
>> 62 try:
>> ---> 63 return f(*a, **kw)
>> 64 except py4j.protocol.Py4JJavaError as e:
>> 65 s = e.java_exception.toString()
>>
>> /opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
>> 317 raise Py4JJavaError(
>> 318 "An error occurred while calling {0}{1}{2}.\n".
>> --> 319 format(target_id, ".", name), value)
>> 320 else:
>> 321 raise Py4JError(
>>
>> Py4JJavaError: An error occurred while calling o59.load.
>> : java.security.PrivilegedActionException: org.apache.kudu.client.NoLeaderFoundException: Master config (172.17.0.43:7077) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-172.17.0.43:7077] encountered a read timeout; closing the channel
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:360)
>> at org.apache.kudu.spark.kudu.KuduContext.<init>(KuduContext.scala:96)
>> at org.apache.kudu.spark.kudu.KuduRelation.<init>(DefaultSource.scala:162)
>> at org.apache.kudu.spark.kudu.DefaultSource.createRelation(DefaultSource.scala:75)
>> at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:498)
>> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>> at py4j.Gateway.invoke(Gateway.java:280)
>> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>> at py4j.commands.CallCommand.execute(CallCommand.java:79)
>> at py4j.GatewayConnection.run(GatewayConnection.java:214)
>> at java.lang.Thread.run(Thread.java:748)
>> Caused by: org.apache.kudu.client.NoLeaderFoundException: Master config (172.17.0.43:7077) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-172.17.0.43:7077] encountered a read timeout; closing the channel
>> at org.apache.kudu.client.ConnectToCluster.incrementCountAndCheckExhausted(ConnectToCluster.java:272)
>> at org.apache.kudu.client.ConnectToCluster.access$100(ConnectToCluster.java:49)
>> at org.apache.kudu.client.ConnectToCluster$ConnectToMasterErrCB.call(ConnectToCluster.java:349)
>> at org.apache.kudu.client.ConnectToCluster$ConnectToMasterErrCB.call(ConnectToCluster.java:338)
>> at com.stumbleupon.async.Deferred.doCall(Deferred.java:1280)
>> at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1259)
>> at com.stumbleupon.async.Deferred.handleContinuation(Deferred.java:1315)
>> at com.stumbleupon.async.Deferred.doCall(Deferred.java:1286)
>> at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1259)
>> at com.stumbleupon.async.Deferred.callback(Deferred.java:1002)
>> at org.apache.kudu.client.KuduRpc.handleCallback(KuduRpc.java:238)
>> at org.apache.kudu.client.KuduRpc.errback(KuduRpc.java:292)
>> at org.apache.kudu.client.RpcProxy.failOrRetryRpc(RpcProxy.java:388)
>> at org.apache.kudu.client.RpcProxy.responseReceived(RpcProxy.java:217)
>> at org.apache.kudu.client.RpcProxy.access$000(RpcProxy.java:60)
>> at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:132)
>> at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:128)
>> at org.apache.kudu.client.Connection.cleanup(Connection.java:694)
>> at org.apache.kudu.client.Connection.exceptionCaught(Connection.java:439)
>> at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
>> at org.apache.kudu.client.Connection.handleUpstream(Connection.java:239)
>> at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>> at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>> at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.exceptionCaught(SimpleChannelUpstreamHandler.java:153)
>> at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
>> at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>> at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>> at org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:536)
>> at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.readTimedOut(ReadTimeoutHandler.java:236)
>> at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler$ReadTimeoutTask$1.run(ReadTimeoutHandler.java:276)
>> at org.apache.kudu.shaded.org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:40)
>> at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
>> at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
>> at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>> at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>> at org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>> at org.apache.kudu.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> ... 1 more
>> Caused by: org.apache.kudu.client.RecoverableException: [peer master-172.17.0.43:7077] encountered a read timeout; closing the channel
>> at org.apache.kudu.client.Connection.exceptionCaught(Connection.java:412)
>> ... 21 more
>> Caused by: org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutException
>> at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.<clinit>(ReadTimeoutHandler.java:84)
>> at org.apache.kudu.client.Connection$ConnectionPipeline.init(Connection.java:782)
>> at org.apache.kudu.client.Connection.<init>(Connection.java:199)
>> at org.apache.kudu.client.ConnectionCache.getConnection(ConnectionCache.java:133)
>> at org.apache.kudu.client.AsyncKuduClient.newRpcProxy(AsyncKuduClient.java:248)
>> at org.apache.kudu.client.AsyncKuduClient.newMasterRpcProxy(AsyncKuduClient.java:272)
>> at org.apache.kudu.client.ConnectToCluster.run(ConnectToCluster.java:157)
>> at org.apache.kudu.client.AsyncKuduClient.getMasterTableLocationsPB(AsyncKuduClient.java:1350)
>> at org.apache.kudu.client.AsyncKuduClient.exportAuthenticationCredentials(AsyncKuduClient.java:651)
>> at org.apache.kudu.client.KuduClient.exportAuthenticationCredentials(KuduClient.java:293)
>> at org.apache.kudu.spark.kudu.KuduContext$$anon$1.run(KuduContext.scala:97)
>> at org.apache.kudu.spark.kudu.KuduContext$$anon$1.run(KuduContext.scala:96)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:360)
>> at org.apache.kudu.spark.kudu.KuduContext.<init>(KuduContext.scala:96)
>> at org.apache.kudu.spark.kudu.KuduRelation.<init>(DefaultSource.scala:162)
>> at org.apache.kudu.spark.kudu.DefaultSource.createRelation(DefaultSource.scala:75)
>> at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:498)
>> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>> at py4j.Gateway.invoke(Gateway.java:280)
>> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>> at py4j.commands.CallCommand.execute(CallCommand.java:79)
>> at py4j.GatewayConnection.run(GatewayConnection.java:214)
>> ... 1 more
>>
>> Case 2: using spark-shell (Spark 1.6.0 standalone):
>>
>> $ spark-shell --master spark://localhost:7077 --packages org.apache.kudu:kudu-spark_2.10:1.1.0
>> > import org.apache.kudu.spark.kudu._
>> > import org.apache.kudu.client._
>> > import collection.JavaConverters._
>> > val df = sqlContext.read.options(Map("kudu.master" -> "localhost:7051","kudu.table" -> "impala::default.test")).kudu
>> df: org.apache.spark.sql.DataFrame = [dataset: string, id: string, itemnumber: string, srcid: string, timestamp: string, year: string, month: string, day: string, week: string, quarter: string, season: string, city: string, region1: string, region2: string, region3: string, region4: string, locality: string, itemname: string, itembqu: string, product_category: string, amount: string, mapped_zipcode: string, latitude: string, longitude: string, depositor_code: string, depositor_name: string, customer_code: string, is_island: string]
>>
>> It seems to be connecting, as it is able to show the column names, but if
>> I
>>
>> // register a temporary table and use SQL
>> df.registerTempTable("test")
>> val filteredDF = sqlContext.sql("select count(*) from test").show()
>>
>> bang!
>>
>> [Stage 0:> (0 + 6) / 6]
>> Lost task 1.0 in stage 0.0 (TID 1, tt-slave-2.novalocal, executor 1): org.apache.kudu.client.NonRecoverableException: RPC can not complete before timeout: KuduRpc(method=GetTableSchema, tablet=null, attempt=30, DeadlineTracker(timeout=30000, elapsed=27307), Traces:
>>
>> [0ms] querying master,
>> [48ms] Sub rpc: GetMasterRegistration sending RPC to server Kudu Master - localhost:7051,
>> [71ms] Sub rpc: GetMasterRegistration received from server Kudu Master - localhost:7051 response
>> Network error:
>> [Peer Kudu Master - localhost:7051] Connection reset,
>> [75ms] delaying RPC due to Service unavailable: Master config (localhost:7051) has no leader.
>> Exceptions received: org.apache.kudu.client.RecoverableException:
>> [Peer Kudu Master - localhost:7051] Connection reset,
>>
>> ...
>> (SAME MESSAGE REPEATS 25 TIMES)
>> ...
>>
>> [24262ms] querying master,
>> [24262ms] Sub rpc: GetMasterRegistration sending RPC to server Kudu Master - localhost:7051,
>> [24263ms] Sub rpc: GetMasterRegistration received from server Kudu Master - localhost:7051 response
>> Network error:
>> [Peer Kudu Master - localhost:7051] Connection reset,
>> [24263ms] delaying RPC due to Service unavailable: Master config (localhost:7051) has no leader.
>> Exceptions received: org.apache.kudu.client.RecoverableException:
>> [Peer Kudu Master - localhost:7051] Connection reset,
>> [24661ms] trace too long, truncated)
>>
>> at org.apache.kudu.client.AsyncKuduClient.tooManyAttemptsOrTimeout(AsyncKuduClient.java:961)
>> at org.apache.kudu.client.AsyncKuduClient.delayedSendRpcToTablet(AsyncKuduClient.java:1203)
>> at org.apache.kudu.client.AsyncKuduClient.access$800(AsyncKuduClient.java:110)
>> at org.apache.kudu.client.AsyncKuduClient$RetryRpcErrback.call(AsyncKuduClient.java:764)
>> at org.apache.kudu.client.AsyncKuduClient$RetryRpcErrback.call(AsyncKuduClient.java:754)
>> at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278)
>> at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257)
>> at com.stumbleupon.async.Deferred.callback(Deferred.java:1005)
>> at org.apache.kudu.client.GetMasterRegistrationReceived.incrementCountAndCheckExhausted(GetMasterRegistrationReceived.java:156)
>> at org.apache.kudu.client.GetMasterRegistrationReceived.access$300(GetMasterRegistrationReceived.java:45)
>> at org.apache.kudu.client.GetMasterRegistrationReceived$GetMasterRegistrationErrCB.call(GetMasterRegistrationReceived.java:236)
>> at org.apache.kudu.client.GetMasterRegistrationReceived$GetMasterRegistrationErrCB.call(GetMasterRegistrationReceived.java:225)
>> at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278)
>> at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257)
>> at com.stumbleupon.async.Deferred.callback(Deferred.java:1005)
>> at org.apache.kudu.client.KuduRpc.handleCallback(KuduRpc.java:220)
>> at org.apache.kudu.client.KuduRpc.errback(KuduRpc.java:274)
>> at org.apache.kudu.client.TabletClient.failOrRetryRpc(TabletClient.java:770)
>> at org.apache.kudu.client.TabletClient.failOrRetryRpcs(TabletClient.java:747)
>> at org.apache.kudu.client.TabletClient.cleanup(TabletClient.java:736)
>> at org.apache.kudu.client.TabletClient.channelClosed(TabletClient.java:698)
>> at org.apache.kudu.client.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
>> at org.apache.kudu.client.TabletClient.handleUpstream(TabletClient.java:679)
>> at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>> at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>> at org.apache.kudu.client.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.channelClosed(ReadTimeoutHandler.java:176)
>> at org.apache.kudu.client.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
>> at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>> at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
>> at org.apache.kudu.client.shaded.org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:468)
>> at org.apache.kudu.client.shaded.org.jboss.netty.channel.Channels$6.run(Channels.java:457)
>> at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:40)
>> at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
>> at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
>> at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>> at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>> at org.apache.kudu.client.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>> at org.apache.kudu.client.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> at java.lang.Thread.run(Thread.java:748)
>> Caused by: org.apache.kudu.client.NoLeaderFoundException: Master config (localhost:7051) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: [Peer Kudu Master - localhost:7051] Connection reset
>> at org.apache.kudu.client.GetMasterRegistrationReceived.incrementCountAndCheckExhausted(GetMasterRegistrationReceived.java:154)
>> ... 32 more
>> Caused by: org.apache.kudu.client.RecoverableException: [Peer Kudu Master - localhost:7051] Connection reset
>> at org.apache.kudu.client.TabletClient.cleanup(TabletClient.java:734)
>> ... 21 more
>>
>> As I said, Kudu service is up an running, and I am able to query kudu
>> tables from Hue using Impala.
>>
>> What am I missing here? Is this the right approach to interfacing Spark
>> with Kudu?
>>
>> Thanks
>>
>>
>
>
Re: Problems connecting form Spark
Posted by Mac Noland <mc...@gmail.com>.
Any chance you can try spark2-shell with Kudu 1.6 and then re-try your
tests?
spark-shell --packages org.apache.kudu:kudu-spark2_2.11:1.6.0
On Fri, Mar 2, 2018 at 5:02 AM, Saúl Nogueras <su...@gmail.com> wrote:
> I cannot properly connect to Kudu from Spark, error says “Kudu master has
> no leader”
>
> - CDH 5.14
> - Kudu 1.6
> - Spark 1.6.0 standalone and 2.2.0
>
> When I use Impala in HUE to create and query kudu tables, it works
> flawlessly.
>
> However, connecting from Spark throws some errors I cannot decipher.
>
> I have tried using both pyspark and spark-shell. With spark shell I had to
> use spark 1.6 instead of 2.2 because some maven dependencies problems, that
> I have localized but not been able to fix. More info here.
> ------------------------------
> Case 1: using pyspark2 (Spark 2.2.0)
>
> $ pyspark2 --master yarn --jars /opt/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/lib/kudu/kudu-spark2_2.11.jar
>
> > df = sqlContext.read.format('org.apache.kudu.spark.kudu').options(**{"kudu.master":"172.17.0.43:7077", "kudu.table":"impala::default.test"}).load()
>
> 18/03/02 10:23:27 WARN client.ConnectToCluster: Error receiving response from 172.17.0.43:7077
> org.apache.kudu.client.RecoverableException: [peer master-172.17.0.43:7077] encountered a read timeout; closing the channel
> at org.apache.kudu.client.Connection.exceptionCaught(Connection.java:412)
> at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
> at org.apache.kudu.client.Connection.handleUpstream(Connection.java:239)
> at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
> at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
> at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.exceptionCaught(SimpleChannelUpstreamHandler.java:153)
> at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
> at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
> at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
> at org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:536)
> at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.readTimedOut(ReadTimeoutHandler.java:236)
> at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler$ReadTimeoutTask$1.run(ReadTimeoutHandler.java:276)
> at org.apache.kudu.shaded.org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:40)
> at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
> at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
> at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
> at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
> at org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
> at org.apache.kudu.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutException
> at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.<clinit>(ReadTimeoutHandler.java:84)
> at org.apache.kudu.client.Connection$ConnectionPipeline.init(Connection.java:782)
> at org.apache.kudu.client.Connection.<init>(Connection.java:199)
> at org.apache.kudu.client.ConnectionCache.getConnection(ConnectionCache.java:133)
> at org.apache.kudu.client.AsyncKuduClient.newRpcProxy(AsyncKuduClient.java:248)
> at org.apache.kudu.client.AsyncKuduClient.newMasterRpcProxy(AsyncKuduClient.java:272)
> at org.apache.kudu.client.ConnectToCluster.run(ConnectToCluster.java:157)
> at org.apache.kudu.client.AsyncKuduClient.getMasterTableLocationsPB(AsyncKuduClient.java:1350)
> at org.apache.kudu.client.AsyncKuduClient.exportAuthenticationCredentials(AsyncKuduClient.java:651)
> at org.apache.kudu.client.KuduClient.exportAuthenticationCredentials(KuduClient.java:293)
> at org.apache.kudu.spark.kudu.KuduContext$$anon$1.run(KuduContext.scala:97)
> at org.apache.kudu.spark.kudu.KuduContext$$anon$1.run(KuduContext.scala:96)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:360)
> at org.apache.kudu.spark.kudu.KuduContext.<init>(KuduContext.scala:96)
> at org.apache.kudu.spark.kudu.KuduRelation.<init>(DefaultSource.scala:162)
> at org.apache.kudu.spark.kudu.DefaultSource.createRelation(DefaultSource.scala:75)
> at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> at py4j.Gateway.invoke(Gateway.java:280)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:214)
> ... 1 more
> 18/03/02 10:23:27 WARN client.ConnectToCluster: Unable to find the leader master 172.17.0.43:7077; will retry
>
> Py4JJavaError Traceback (most recent call last)
> <ipython-input-1-e1dfaec7a544> in <module>()
> ----> 1 df = sqlContext.read.format('org.apache.kudu.spark.kudu').options(**{"kudu.master":"172.17.0.43:7077", "kudu.table":"impala::default.logika_dataset_kudu"}).load()
>
> /opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/lib/spark2/python/pyspark/sql/readwriter.py in load(self, path, format, schema, **options)
> 163 return self._df(self._jreader.load(self._spark._sc._jvm.PythonUtils.toSeq(path)))
> 164 else:
> --> 165 return self._df(self._jreader.load())
> 166
> 167 @since(1.4)
>
> /opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py in __call__(self, *args)
> 1131 answer = self.gateway_client.send_command(command)
> 1132 return_value = get_return_value(
> -> 1133 answer, self.gateway_client, self.target_id, self.name)
> 1134
> 1135 for temp_arg in temp_args:
>
> /opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/lib/spark2/python/pyspark/sql/utils.py in deco(*a, **kw)
> 61 def deco(*a, **kw):
> 62 try:
> ---> 63 return f(*a, **kw)
> 64 except py4j.protocol.Py4JJavaError as e:
> 65 s = e.java_exception.toString()
>
> /opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
> 317 raise Py4JJavaError(
> 318 "An error occurred while calling {0}{1}{2}.\n".
> --> 319 format(target_id, ".", name), value)
> 320 else:
> 321 raise Py4JError(
>
> Py4JJavaError: An error occurred while calling o59.load.
> : java.security.PrivilegedActionException: org.apache.kudu.client.NoLeaderFoundException: Master config (172.17.0.43:7077) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-172.17.0.43:7077] encountered a read timeout; closing the channel
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:360)
> at org.apache.kudu.spark.kudu.KuduContext.<init>(KuduContext.scala:96)
> at org.apache.kudu.spark.kudu.KuduRelation.<init>(DefaultSource.scala:162)
> at org.apache.kudu.spark.kudu.DefaultSource.createRelation(DefaultSource.scala:75)
> at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> at py4j.Gateway.invoke(Gateway.java:280)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:214)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.kudu.client.NoLeaderFoundException: Master config (172.17.0.43:7077) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-172.17.0.43:7077] encountered a read timeout; closing the channel
> at org.apache.kudu.client.ConnectToCluster.incrementCountAndCheckExhausted(ConnectToCluster.java:272)
> at org.apache.kudu.client.ConnectToCluster.access$100(ConnectToCluster.java:49)
> at org.apache.kudu.client.ConnectToCluster$ConnectToMasterErrCB.call(ConnectToCluster.java:349)
> at org.apache.kudu.client.ConnectToCluster$ConnectToMasterErrCB.call(ConnectToCluster.java:338)
> at com.stumbleupon.async.Deferred.doCall(Deferred.java:1280)
> at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1259)
> at com.stumbleupon.async.Deferred.handleContinuation(Deferred.java:1315)
> at com.stumbleupon.async.Deferred.doCall(Deferred.java:1286)
> at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1259)
> at com.stumbleupon.async.Deferred.callback(Deferred.java:1002)
> at org.apache.kudu.client.KuduRpc.handleCallback(KuduRpc.java:238)
> at org.apache.kudu.client.KuduRpc.errback(KuduRpc.java:292)
> at org.apache.kudu.client.RpcProxy.failOrRetryRpc(RpcProxy.java:388)
> at org.apache.kudu.client.RpcProxy.responseReceived(RpcProxy.java:217)
> at org.apache.kudu.client.RpcProxy.access$000(RpcProxy.java:60)
> at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:132)
> at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:128)
> at org.apache.kudu.client.Connection.cleanup(Connection.java:694)
> at org.apache.kudu.client.Connection.exceptionCaught(Connection.java:439)
> at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
> at org.apache.kudu.client.Connection.handleUpstream(Connection.java:239)
> at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
> at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
> at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.exceptionCaught(SimpleChannelUpstreamHandler.java:153)
> at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
> at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
> at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
> at org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:536)
> at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.readTimedOut(ReadTimeoutHandler.java:236)
> at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler$ReadTimeoutTask$1.run(ReadTimeoutHandler.java:276)
> at org.apache.kudu.shaded.org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:40)
> at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
> at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
> at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
> at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
> at org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
> at org.apache.kudu.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> ... 1 more
> Caused by: org.apache.kudu.client.RecoverableException: [peer master-172.17.0.43:7077] encountered a read timeout; closing the channel
> at org.apache.kudu.client.Connection.exceptionCaught(Connection.java:412)
> ... 21 more
> Caused by: org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutException
> at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.<clinit>(ReadTimeoutHandler.java:84)
> at org.apache.kudu.client.Connection$ConnectionPipeline.init(Connection.java:782)
> at org.apache.kudu.client.Connection.<init>(Connection.java:199)
> at org.apache.kudu.client.ConnectionCache.getConnection(ConnectionCache.java:133)
> at org.apache.kudu.client.AsyncKuduClient.newRpcProxy(AsyncKuduClient.java:248)
> at org.apache.kudu.client.AsyncKuduClient.newMasterRpcProxy(AsyncKuduClient.java:272)
> at org.apache.kudu.client.ConnectToCluster.run(ConnectToCluster.java:157)
> at org.apache.kudu.client.AsyncKuduClient.getMasterTableLocationsPB(AsyncKuduClient.java:1350)
> at org.apache.kudu.client.AsyncKuduClient.exportAuthenticationCredentials(AsyncKuduClient.java:651)
> at org.apache.kudu.client.KuduClient.exportAuthenticationCredentials(KuduClient.java:293)
> at org.apache.kudu.spark.kudu.KuduContext$$anon$1.run(KuduContext.scala:97)
> at org.apache.kudu.spark.kudu.KuduContext$$anon$1.run(KuduContext.scala:96)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:360)
> at org.apache.kudu.spark.kudu.KuduContext.<init>(KuduContext.scala:96)
> at org.apache.kudu.spark.kudu.KuduRelation.<init>(DefaultSource.scala:162)
> at org.apache.kudu.spark.kudu.DefaultSource.createRelation(DefaultSource.scala:75)
> at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> at py4j.Gateway.invoke(Gateway.java:280)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:214)
> ... 1 more
>
> Case 2: using spark-shell (Spark 1.6.0 standalone):
>
> $ spark-shell --master spark://localhost:7077 --packages org.apache.kudu:kudu-spark_2.10:1.1.0
> > import org.apache.kudu.spark.kudu._
> > import org.apache.kudu.client._
> > import collection.JavaConverters._
> > val df = sqlContext.read.options(Map("kudu.master" -> "localhost:7051","kudu.table" -> "impala::default.test")).kudu
> df: org.apache.spark.sql.DataFrame = [dataset: string, id: string, itemnumber: string, srcid: string, timestamp: string, year: string, month: string, day: string, week: string, quarter: string, season: string, city: string, region1: string, region2: string, region3: string, region4: string, locality: string, itemname: string, itembqu: string, product_category: string, amount: string, mapped_zipcode: string, latitude: string, longitude: string, depositor_code: string, depositor_name: string, customer_code: string, is_island: string]
>
> It seems to be connecting, as it is able to show the column names, but if I
>
> // register a temporary table and use SQL
> df.registerTempTable("test")
> val filteredDF = sqlContext.sql("select count(*) from test").show()
>
> bang!
>
> [Stage 0:> (0 + 6) / 6]
> Lost task 1.0 in stage 0.0 (TID 1, tt-slave-2.novalocal, executor 1): org.apache.kudu.client.NonRecoverableException: RPC can not complete before timeout: KuduRpc(method=GetTableSchema, tablet=null, attempt=30, DeadlineTracker(timeout=30000, elapsed=27307), Traces:
>
> [0ms] querying master,
> [48ms] Sub rpc: GetMasterRegistration sending RPC to server Kudu Master - localhost:7051,
> [71ms] Sub rpc: GetMasterRegistration received from server Kudu Master - localhost:7051 response
> Network error:
> [Peer Kudu Master - localhost:7051] Connection reset,
> [75ms] delaying RPC due to Service unavailable: Master config (localhost:7051) has no leader.
> Exceptions received: org.apache.kudu.client.RecoverableException:
> [Peer Kudu Master - localhost:7051] Connection reset,
>
> ...
> (SAME MESSAGE REPEATS 25 TIMES)
> ...
>
> [24262ms] querying master,
> [24262ms] Sub rpc: GetMasterRegistration sending RPC to server Kudu Master - localhost:7051,
> [24263ms] Sub rpc: GetMasterRegistration received from server Kudu Master - localhost:7051 response
> Network error:
> [Peer Kudu Master - localhost:7051] Connection reset,
> [24263ms] delaying RPC due to Service unavailable: Master config (localhost:7051) has no leader.
> Exceptions received: org.apache.kudu.client.RecoverableException:
> [Peer Kudu Master - localhost:7051] Connection reset,
> [24661ms] trace too long, truncated)
>
> at org.apache.kudu.client.AsyncKuduClient.tooManyAttemptsOrTimeout(AsyncKuduClient.java:961)
> at org.apache.kudu.client.AsyncKuduClient.delayedSendRpcToTablet(AsyncKuduClient.java:1203)
> at org.apache.kudu.client.AsyncKuduClient.access$800(AsyncKuduClient.java:110)
> at org.apache.kudu.client.AsyncKuduClient$RetryRpcErrback.call(AsyncKuduClient.java:764)
> at org.apache.kudu.client.AsyncKuduClient$RetryRpcErrback.call(AsyncKuduClient.java:754)
> at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278)
> at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257)
> at com.stumbleupon.async.Deferred.callback(Deferred.java:1005)
> at org.apache.kudu.client.GetMasterRegistrationReceived.incrementCountAndCheckExhausted(GetMasterRegistrationReceived.java:156)
> at org.apache.kudu.client.GetMasterRegistrationReceived.access$300(GetMasterRegistrationReceived.java:45)
> at org.apache.kudu.client.GetMasterRegistrationReceived$GetMasterRegistrationErrCB.call(GetMasterRegistrationReceived.java:236)
> at org.apache.kudu.client.GetMasterRegistrationReceived$GetMasterRegistrationErrCB.call(GetMasterRegistrationReceived.java:225)
> at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278)
> at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257)
> at com.stumbleupon.async.Deferred.callback(Deferred.java:1005)
> at org.apache.kudu.client.KuduRpc.handleCallback(KuduRpc.java:220)
> at org.apache.kudu.client.KuduRpc.errback(KuduRpc.java:274)
> at org.apache.kudu.client.TabletClient.failOrRetryRpc(TabletClient.java:770)
> at org.apache.kudu.client.TabletClient.failOrRetryRpcs(TabletClient.java:747)
> at org.apache.kudu.client.TabletClient.cleanup(TabletClient.java:736)
> at org.apache.kudu.client.TabletClient.channelClosed(TabletClient.java:698)
> at org.apache.kudu.client.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
> at org.apache.kudu.client.TabletClient.handleUpstream(TabletClient.java:679)
> at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
> at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
> at org.apache.kudu.client.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.channelClosed(ReadTimeoutHandler.java:176)
> at org.apache.kudu.client.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
> at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
> at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
> at org.apache.kudu.client.shaded.org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:468)
> at org.apache.kudu.client.shaded.org.jboss.netty.channel.Channels$6.run(Channels.java:457)
> at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:40)
> at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
> at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
> at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
> at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
> at org.apache.kudu.client.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
> at org.apache.kudu.client.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.kudu.client.NoLeaderFoundException: Master config (localhost:7051) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: [Peer Kudu Master - localhost:7051] Connection reset
> at org.apache.kudu.client.GetMasterRegistrationReceived.incrementCountAndCheckExhausted(GetMasterRegistrationReceived.java:154)
> ... 32 more
> Caused by: org.apache.kudu.client.RecoverableException: [Peer Kudu Master - localhost:7051] Connection reset
> at org.apache.kudu.client.TabletClient.cleanup(TabletClient.java:734)
> ... 21 more
>
> As I said, Kudu service is up an running, and I am able to query kudu
> tables from Hue using Impala.
>
> What am I missing here? Is this the right approach to interfacing Spark
> with Kudu?
>
> Thanks
>
>