You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@kudu.apache.org by Saúl Nogueras <su...@gmail.com> on 2018/03/02 13:02:46 UTC

Problems connecting form Spark

I cannot properly connect to Kudu from Spark, error says “Kudu master has
no leader”

   - CDH 5.14
   - Kudu 1.6
   - Spark 1.6.0 standalone and 2.2.0

When I use Impala in HUE to create and query kudu tables, it works
flawlessly.

However, connecting from Spark throws some errors I cannot decipher.

I have tried using both pyspark and spark-shell. With spark shell I had to
use spark 1.6 instead of 2.2 because some maven dependencies problems, that
I have localized but not been able to fix. More info here.
------------------------------
Case 1: using pyspark2 (Spark 2.2.0)

$ pyspark2 --master yarn --jars
/opt/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/lib/kudu/kudu-spark2_2.11.jar

> df = sqlContext.read.format('org.apache.kudu.spark.kudu').options(**{"kudu.master":"172.17.0.43:7077", "kudu.table":"impala::default.test"}).load()

18/03/02 10:23:27 WARN client.ConnectToCluster: Error receiving
response from 172.17.0.43:7077
org.apache.kudu.client.RecoverableException: [peer
master-172.17.0.43:7077] encountered a read timeout; closing the
channel
        at org.apache.kudu.client.Connection.exceptionCaught(Connection.java:412)
        at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
        at org.apache.kudu.client.Connection.handleUpstream(Connection.java:239)
        at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.exceptionCaught(SimpleChannelUpstreamHandler.java:153)
        at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
        at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:536)
        at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.readTimedOut(ReadTimeoutHandler.java:236)
        at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler$ReadTimeoutTask$1.run(ReadTimeoutHandler.java:276)
        at org.apache.kudu.shaded.org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:40)
        at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
        at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
        at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
        at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.apache.kudu.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutException
        at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.<clinit>(ReadTimeoutHandler.java:84)
        at org.apache.kudu.client.Connection$ConnectionPipeline.init(Connection.java:782)
        at org.apache.kudu.client.Connection.<init>(Connection.java:199)
        at org.apache.kudu.client.ConnectionCache.getConnection(ConnectionCache.java:133)
        at org.apache.kudu.client.AsyncKuduClient.newRpcProxy(AsyncKuduClient.java:248)
        at org.apache.kudu.client.AsyncKuduClient.newMasterRpcProxy(AsyncKuduClient.java:272)
        at org.apache.kudu.client.ConnectToCluster.run(ConnectToCluster.java:157)
        at org.apache.kudu.client.AsyncKuduClient.getMasterTableLocationsPB(AsyncKuduClient.java:1350)
        at org.apache.kudu.client.AsyncKuduClient.exportAuthenticationCredentials(AsyncKuduClient.java:651)
        at org.apache.kudu.client.KuduClient.exportAuthenticationCredentials(KuduClient.java:293)
        at org.apache.kudu.spark.kudu.KuduContext$$anon$1.run(KuduContext.scala:97)
        at org.apache.kudu.spark.kudu.KuduContext$$anon$1.run(KuduContext.scala:96)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:360)
        at org.apache.kudu.spark.kudu.KuduContext.<init>(KuduContext.scala:96)
        at org.apache.kudu.spark.kudu.KuduRelation.<init>(DefaultSource.scala:162)
        at org.apache.kudu.spark.kudu.DefaultSource.createRelation(DefaultSource.scala:75)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:280)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        ... 1 more
18/03/02 10:23:27 WARN client.ConnectToCluster: Unable to find the
leader master 172.17.0.43:7077; will retry

Py4JJavaError                             Traceback (most recent call last)
<ipython-input-1-e1dfaec7a544> in <module>()
----> 1 df = sqlContext.read.format('org.apache.kudu.spark.kudu').options(**{"kudu.master":"172.17.0.43:7077",
"kudu.table":"impala::default.logika_dataset_kudu"}).load()

/opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/lib/spark2/python/pyspark/sql/readwriter.py
in load(self, path, format, schema, **options)
    163             return
self._df(self._jreader.load(self._spark._sc._jvm.PythonUtils.toSeq(path)))
    164         else:
--> 165             return self._df(self._jreader.load())
    166
    167     @since(1.4)

/opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py
in __call__(self, *args)
   1131         answer = self.gateway_client.send_command(command)
   1132         return_value = get_return_value(
-> 1133             answer, self.gateway_client, self.target_id, self.name)
   1134
   1135         for temp_arg in temp_args:

/opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/lib/spark2/python/pyspark/sql/utils.py
in deco(*a, **kw)
     61     def deco(*a, **kw):
     62         try:
---> 63             return f(*a, **kw)
     64         except py4j.protocol.Py4JJavaError as e:
     65             s = e.java_exception.toString()

/opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py
in get_return_value(answer, gateway_client, target_id, name)
    317                 raise Py4JJavaError(
    318                     "An error occurred while calling {0}{1}{2}.\n".
--> 319                     format(target_id, ".", name), value)
    320             else:
    321                 raise Py4JError(

Py4JJavaError: An error occurred while calling o59.load.
: java.security.PrivilegedActionException:
org.apache.kudu.client.NoLeaderFoundException: Master config
(172.17.0.43:7077) has no leader. Exceptions received:
org.apache.kudu.client.RecoverableException: [peer
master-172.17.0.43:7077] encountered a read timeout; closing the
channel
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:360)
        at org.apache.kudu.spark.kudu.KuduContext.<init>(KuduContext.scala:96)
        at org.apache.kudu.spark.kudu.KuduRelation.<init>(DefaultSource.scala:162)
        at org.apache.kudu.spark.kudu.DefaultSource.createRelation(DefaultSource.scala:75)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:280)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kudu.client.NoLeaderFoundException: Master
config (172.17.0.43:7077) has no leader. Exceptions received:
org.apache.kudu.client.RecoverableException: [peer
master-172.17.0.43:7077] encountered a read timeout; closing the
channel
        at org.apache.kudu.client.ConnectToCluster.incrementCountAndCheckExhausted(ConnectToCluster.java:272)
        at org.apache.kudu.client.ConnectToCluster.access$100(ConnectToCluster.java:49)
        at org.apache.kudu.client.ConnectToCluster$ConnectToMasterErrCB.call(ConnectToCluster.java:349)
        at org.apache.kudu.client.ConnectToCluster$ConnectToMasterErrCB.call(ConnectToCluster.java:338)
        at com.stumbleupon.async.Deferred.doCall(Deferred.java:1280)
        at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1259)
        at com.stumbleupon.async.Deferred.handleContinuation(Deferred.java:1315)
        at com.stumbleupon.async.Deferred.doCall(Deferred.java:1286)
        at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1259)
        at com.stumbleupon.async.Deferred.callback(Deferred.java:1002)
        at org.apache.kudu.client.KuduRpc.handleCallback(KuduRpc.java:238)
        at org.apache.kudu.client.KuduRpc.errback(KuduRpc.java:292)
        at org.apache.kudu.client.RpcProxy.failOrRetryRpc(RpcProxy.java:388)
        at org.apache.kudu.client.RpcProxy.responseReceived(RpcProxy.java:217)
        at org.apache.kudu.client.RpcProxy.access$000(RpcProxy.java:60)
        at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:132)
        at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:128)
        at org.apache.kudu.client.Connection.cleanup(Connection.java:694)
        at org.apache.kudu.client.Connection.exceptionCaught(Connection.java:439)
        at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
        at org.apache.kudu.client.Connection.handleUpstream(Connection.java:239)
        at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.exceptionCaught(SimpleChannelUpstreamHandler.java:153)
        at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
        at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:536)
        at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.readTimedOut(ReadTimeoutHandler.java:236)
        at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler$ReadTimeoutTask$1.run(ReadTimeoutHandler.java:276)
        at org.apache.kudu.shaded.org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:40)
        at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
        at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
        at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
        at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.apache.kudu.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        ... 1 more
Caused by: org.apache.kudu.client.RecoverableException: [peer
master-172.17.0.43:7077] encountered a read timeout; closing the
channel
        at org.apache.kudu.client.Connection.exceptionCaught(Connection.java:412)
        ... 21 more
Caused by: org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutException
        at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.<clinit>(ReadTimeoutHandler.java:84)
        at org.apache.kudu.client.Connection$ConnectionPipeline.init(Connection.java:782)
        at org.apache.kudu.client.Connection.<init>(Connection.java:199)
        at org.apache.kudu.client.ConnectionCache.getConnection(ConnectionCache.java:133)
        at org.apache.kudu.client.AsyncKuduClient.newRpcProxy(AsyncKuduClient.java:248)
        at org.apache.kudu.client.AsyncKuduClient.newMasterRpcProxy(AsyncKuduClient.java:272)
        at org.apache.kudu.client.ConnectToCluster.run(ConnectToCluster.java:157)
        at org.apache.kudu.client.AsyncKuduClient.getMasterTableLocationsPB(AsyncKuduClient.java:1350)
        at org.apache.kudu.client.AsyncKuduClient.exportAuthenticationCredentials(AsyncKuduClient.java:651)
        at org.apache.kudu.client.KuduClient.exportAuthenticationCredentials(KuduClient.java:293)
        at org.apache.kudu.spark.kudu.KuduContext$$anon$1.run(KuduContext.scala:97)
        at org.apache.kudu.spark.kudu.KuduContext$$anon$1.run(KuduContext.scala:96)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:360)
        at org.apache.kudu.spark.kudu.KuduContext.<init>(KuduContext.scala:96)
        at org.apache.kudu.spark.kudu.KuduRelation.<init>(DefaultSource.scala:162)
        at org.apache.kudu.spark.kudu.DefaultSource.createRelation(DefaultSource.scala:75)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:280)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        ... 1 more

Case 2: using spark-shell (Spark 1.6.0 standalone):

$ spark-shell --master spark://localhost:7077 --packages
org.apache.kudu:kudu-spark_2.10:1.1.0
> import org.apache.kudu.spark.kudu._
> import org.apache.kudu.client._
> import collection.JavaConverters._
> val df = sqlContext.read.options(Map("kudu.master" -> "localhost:7051","kudu.table" -> "impala::default.test")).kudu
df: org.apache.spark.sql.DataFrame = [dataset: string, id: string,
itemnumber: string, srcid: string, timestamp: string, year: string,
month: string, day: string, week: string, quarter: string, season:
string, city: string, region1: string, region2: string, region3:
string, region4: string, locality: string, itemname: string, itembqu:
string, product_category: string, amount: string, mapped_zipcode:
string, latitude: string, longitude: string, depositor_code: string,
depositor_name: string, customer_code: string, is_island: string]

It seems to be connecting, as it is able to show the column names, but if I

// register a temporary table and use SQL
df.registerTempTable("test")
val filteredDF = sqlContext.sql("select count(*) from test").show()

bang!

[Stage 0:>                                                          (0 + 6) / 6]
Lost task 1.0 in stage 0.0 (TID 1, tt-slave-2.novalocal, executor 1):
org.apache.kudu.client.NonRecoverableException: RPC can not complete
before timeout: KuduRpc(method=GetTableSchema, tablet=null,
attempt=30, DeadlineTracker(timeout=30000, elapsed=27307), Traces:

[0ms] querying master,
[48ms] Sub rpc: GetMasterRegistration sending RPC to server Kudu
Master - localhost:7051,
[71ms] Sub rpc: GetMasterRegistration received from server Kudu Master
- localhost:7051 response
Network error:
[Peer Kudu Master - localhost:7051] Connection reset,
[75ms] delaying RPC due to Service unavailable: Master config
(localhost:7051) has no leader.
Exceptions received: org.apache.kudu.client.RecoverableException:
[Peer Kudu Master - localhost:7051] Connection reset,

...
(SAME MESSAGE REPEATS 25 TIMES)
...

[24262ms] querying master,
[24262ms] Sub rpc: GetMasterRegistration sending RPC to server Kudu
Master - localhost:7051,
[24263ms] Sub rpc: GetMasterRegistration received from server Kudu
Master - localhost:7051 response
Network error:
[Peer Kudu Master - localhost:7051] Connection reset,
[24263ms] delaying RPC due to Service unavailable: Master config
(localhost:7051) has no leader.
Exceptions received: org.apache.kudu.client.RecoverableException:
[Peer Kudu Master - localhost:7051] Connection reset,
[24661ms] trace too long, truncated)

        at org.apache.kudu.client.AsyncKuduClient.tooManyAttemptsOrTimeout(AsyncKuduClient.java:961)
        at org.apache.kudu.client.AsyncKuduClient.delayedSendRpcToTablet(AsyncKuduClient.java:1203)
        at org.apache.kudu.client.AsyncKuduClient.access$800(AsyncKuduClient.java:110)
        at org.apache.kudu.client.AsyncKuduClient$RetryRpcErrback.call(AsyncKuduClient.java:764)
        at org.apache.kudu.client.AsyncKuduClient$RetryRpcErrback.call(AsyncKuduClient.java:754)
        at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278)
        at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257)
        at com.stumbleupon.async.Deferred.callback(Deferred.java:1005)
        at org.apache.kudu.client.GetMasterRegistrationReceived.incrementCountAndCheckExhausted(GetMasterRegistrationReceived.java:156)
        at org.apache.kudu.client.GetMasterRegistrationReceived.access$300(GetMasterRegistrationReceived.java:45)
        at org.apache.kudu.client.GetMasterRegistrationReceived$GetMasterRegistrationErrCB.call(GetMasterRegistrationReceived.java:236)
        at org.apache.kudu.client.GetMasterRegistrationReceived$GetMasterRegistrationErrCB.call(GetMasterRegistrationReceived.java:225)
        at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278)
        at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257)
        at com.stumbleupon.async.Deferred.callback(Deferred.java:1005)
        at org.apache.kudu.client.KuduRpc.handleCallback(KuduRpc.java:220)
        at org.apache.kudu.client.KuduRpc.errback(KuduRpc.java:274)
        at org.apache.kudu.client.TabletClient.failOrRetryRpc(TabletClient.java:770)
        at org.apache.kudu.client.TabletClient.failOrRetryRpcs(TabletClient.java:747)
        at org.apache.kudu.client.TabletClient.cleanup(TabletClient.java:736)
        at org.apache.kudu.client.TabletClient.channelClosed(TabletClient.java:698)
        at org.apache.kudu.client.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
        at org.apache.kudu.client.TabletClient.handleUpstream(TabletClient.java:679)
        at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.apache.kudu.client.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.channelClosed(ReadTimeoutHandler.java:176)
        at org.apache.kudu.client.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
        at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
        at org.apache.kudu.client.shaded.org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:468)
        at org.apache.kudu.client.shaded.org.jboss.netty.channel.Channels$6.run(Channels.java:457)
        at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:40)
        at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
        at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
        at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
        at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at org.apache.kudu.client.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.apache.kudu.client.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kudu.client.NoLeaderFoundException: Master
config (localhost:7051) has no leader. Exceptions received:
org.apache.kudu.client.RecoverableException: [Peer Kudu Master -
localhost:7051] Connection reset
        at org.apache.kudu.client.GetMasterRegistrationReceived.incrementCountAndCheckExhausted(GetMasterRegistrationReceived.java:154)
        ... 32 more
Caused by: org.apache.kudu.client.RecoverableException: [Peer Kudu
Master - localhost:7051] Connection reset
        at org.apache.kudu.client.TabletClient.cleanup(TabletClient.java:734)
        ... 21 more

As I said, Kudu service is up an running, and I am able to query kudu
tables from Hue using Impala.

What am I missing here? Is this the right approach to interfacing Spark
with Kudu?

Thanks
​

Re: Problems connecting form Spark

Posted by William Berkeley <wd...@cloudera.com>.
In each case the problem is that some part of your application can't find
the leader master of the Kudu cluster:

org.apache.kudu.client.NoLeaderFoundException: Master config (*172.17.0.43:7077
<http://172.17.0.43:7077>*) has no leader.
org.apache.kudu.client.NoLeaderFoundException: Master config (
*localhost:7051*) has no leader.

I think you're seeing these errors for two reasons:

1. Are you using multi-master? The first exception shows you specified one
remote master. If your cluster has multiple masters, you should specify all
of them. If you specify only one, and it's not the leader master, then
connecting to it will fail. You can check which master is the leader by
going to the /masters page on the web ui of any master.

2. In the "standalone" case, the Spark tasks are being distributed to
executors and fail there:

Lost task 1.0 in stage 0.0 (TID 1, tt-slave-2.novalocal, executor 1)

You've specified the master address as localhost. That address is passed
as-is to executors. Any task on an executor that doesn't have the leader
master locally at port 7051 will fail to connect to the leader master.
Getting the column names doesn't fail as that doesn't generate tasks sent
to remote executors.

I make this mistake all the time while playing with kudu-spark :)

-Will





On Mon, Mar 5, 2018 at 4:14 PM, Mac Noland <mc...@gmail.com> wrote:

> Any chance you can try spark2-shell with Kudu 1.6 and then re-try your
> tests?
>
> spark-shell --packages org.apache.kudu:kudu-spark2_2.11:1.6.0
>
> On Fri, Mar 2, 2018 at 5:02 AM, Saúl Nogueras <su...@gmail.com> wrote:
>
>> I cannot properly connect to Kudu from Spark, error says “Kudu master has
>> no leader”
>>
>>    - CDH 5.14
>>    - Kudu 1.6
>>    - Spark 1.6.0 standalone and 2.2.0
>>
>> When I use Impala in HUE to create and query kudu tables, it works
>> flawlessly.
>>
>> However, connecting from Spark throws some errors I cannot decipher.
>>
>> I have tried using both pyspark and spark-shell. With spark shell I had
>> to use spark 1.6 instead of 2.2 because some maven dependencies problems,
>> that I have localized but not been able to fix. More info here.
>> ------------------------------
>> Case 1: using pyspark2 (Spark 2.2.0)
>>
>> $ pyspark2 --master yarn --jars /opt/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/lib/kudu/kudu-spark2_2.11.jar
>>
>> > df = sqlContext.read.format('org.apache.kudu.spark.kudu').options(**{"kudu.master":"172.17.0.43:7077", "kudu.table":"impala::default.test"}).load()
>>
>> 18/03/02 10:23:27 WARN client.ConnectToCluster: Error receiving response from 172.17.0.43:7077
>> org.apache.kudu.client.RecoverableException: [peer master-172.17.0.43:7077] encountered a read timeout; closing the channel
>>         at org.apache.kudu.client.Connection.exceptionCaught(Connection.java:412)
>>         at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
>>         at org.apache.kudu.client.Connection.handleUpstream(Connection.java:239)
>>         at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>         at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>>         at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.exceptionCaught(SimpleChannelUpstreamHandler.java:153)
>>         at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
>>         at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>         at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>>         at org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:536)
>>         at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.readTimedOut(ReadTimeoutHandler.java:236)
>>         at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler$ReadTimeoutTask$1.run(ReadTimeoutHandler.java:276)
>>         at org.apache.kudu.shaded.org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:40)
>>         at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
>>         at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
>>         at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>>         at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>>         at org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>>         at org.apache.kudu.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>         at java.lang.Thread.run(Thread.java:748)
>> Caused by: org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutException
>>         at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.<clinit>(ReadTimeoutHandler.java:84)
>>         at org.apache.kudu.client.Connection$ConnectionPipeline.init(Connection.java:782)
>>         at org.apache.kudu.client.Connection.<init>(Connection.java:199)
>>         at org.apache.kudu.client.ConnectionCache.getConnection(ConnectionCache.java:133)
>>         at org.apache.kudu.client.AsyncKuduClient.newRpcProxy(AsyncKuduClient.java:248)
>>         at org.apache.kudu.client.AsyncKuduClient.newMasterRpcProxy(AsyncKuduClient.java:272)
>>         at org.apache.kudu.client.ConnectToCluster.run(ConnectToCluster.java:157)
>>         at org.apache.kudu.client.AsyncKuduClient.getMasterTableLocationsPB(AsyncKuduClient.java:1350)
>>         at org.apache.kudu.client.AsyncKuduClient.exportAuthenticationCredentials(AsyncKuduClient.java:651)
>>         at org.apache.kudu.client.KuduClient.exportAuthenticationCredentials(KuduClient.java:293)
>>         at org.apache.kudu.spark.kudu.KuduContext$$anon$1.run(KuduContext.scala:97)
>>         at org.apache.kudu.spark.kudu.KuduContext$$anon$1.run(KuduContext.scala:96)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:360)
>>         at org.apache.kudu.spark.kudu.KuduContext.<init>(KuduContext.scala:96)
>>         at org.apache.kudu.spark.kudu.KuduRelation.<init>(DefaultSource.scala:162)
>>         at org.apache.kudu.spark.kudu.DefaultSource.createRelation(DefaultSource.scala:75)
>>         at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
>>         at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
>>         at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>>         at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>>         at py4j.Gateway.invoke(Gateway.java:280)
>>         at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>>         at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>         at py4j.GatewayConnection.run(GatewayConnection.java:214)
>>         ... 1 more
>> 18/03/02 10:23:27 WARN client.ConnectToCluster: Unable to find the leader master 172.17.0.43:7077; will retry
>>
>> Py4JJavaError                             Traceback (most recent call last)
>> <ipython-input-1-e1dfaec7a544> in <module>()
>> ----> 1 df = sqlContext.read.format('org.apache.kudu.spark.kudu').options(**{"kudu.master":"172.17.0.43:7077", "kudu.table":"impala::default.logika_dataset_kudu"}).load()
>>
>> /opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/lib/spark2/python/pyspark/sql/readwriter.py in load(self, path, format, schema, **options)
>>     163             return self._df(self._jreader.load(self._spark._sc._jvm.PythonUtils.toSeq(path)))
>>     164         else:
>> --> 165             return self._df(self._jreader.load())
>>     166
>>     167     @since(1.4)
>>
>> /opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py in __call__(self, *args)
>>    1131         answer = self.gateway_client.send_command(command)
>>    1132         return_value = get_return_value(
>> -> 1133             answer, self.gateway_client, self.target_id, self.name)
>>    1134
>>    1135         for temp_arg in temp_args:
>>
>> /opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/lib/spark2/python/pyspark/sql/utils.py in deco(*a, **kw)
>>      61     def deco(*a, **kw):
>>      62         try:
>> ---> 63             return f(*a, **kw)
>>      64         except py4j.protocol.Py4JJavaError as e:
>>      65             s = e.java_exception.toString()
>>
>> /opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
>>     317                 raise Py4JJavaError(
>>     318                     "An error occurred while calling {0}{1}{2}.\n".
>> --> 319                     format(target_id, ".", name), value)
>>     320             else:
>>     321                 raise Py4JError(
>>
>> Py4JJavaError: An error occurred while calling o59.load.
>> : java.security.PrivilegedActionException: org.apache.kudu.client.NoLeaderFoundException: Master config (172.17.0.43:7077) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-172.17.0.43:7077] encountered a read timeout; closing the channel
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:360)
>>         at org.apache.kudu.spark.kudu.KuduContext.<init>(KuduContext.scala:96)
>>         at org.apache.kudu.spark.kudu.KuduRelation.<init>(DefaultSource.scala:162)
>>         at org.apache.kudu.spark.kudu.DefaultSource.createRelation(DefaultSource.scala:75)
>>         at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
>>         at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
>>         at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>>         at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>>         at py4j.Gateway.invoke(Gateway.java:280)
>>         at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>>         at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>         at py4j.GatewayConnection.run(GatewayConnection.java:214)
>>         at java.lang.Thread.run(Thread.java:748)
>> Caused by: org.apache.kudu.client.NoLeaderFoundException: Master config (172.17.0.43:7077) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-172.17.0.43:7077] encountered a read timeout; closing the channel
>>         at org.apache.kudu.client.ConnectToCluster.incrementCountAndCheckExhausted(ConnectToCluster.java:272)
>>         at org.apache.kudu.client.ConnectToCluster.access$100(ConnectToCluster.java:49)
>>         at org.apache.kudu.client.ConnectToCluster$ConnectToMasterErrCB.call(ConnectToCluster.java:349)
>>         at org.apache.kudu.client.ConnectToCluster$ConnectToMasterErrCB.call(ConnectToCluster.java:338)
>>         at com.stumbleupon.async.Deferred.doCall(Deferred.java:1280)
>>         at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1259)
>>         at com.stumbleupon.async.Deferred.handleContinuation(Deferred.java:1315)
>>         at com.stumbleupon.async.Deferred.doCall(Deferred.java:1286)
>>         at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1259)
>>         at com.stumbleupon.async.Deferred.callback(Deferred.java:1002)
>>         at org.apache.kudu.client.KuduRpc.handleCallback(KuduRpc.java:238)
>>         at org.apache.kudu.client.KuduRpc.errback(KuduRpc.java:292)
>>         at org.apache.kudu.client.RpcProxy.failOrRetryRpc(RpcProxy.java:388)
>>         at org.apache.kudu.client.RpcProxy.responseReceived(RpcProxy.java:217)
>>         at org.apache.kudu.client.RpcProxy.access$000(RpcProxy.java:60)
>>         at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:132)
>>         at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:128)
>>         at org.apache.kudu.client.Connection.cleanup(Connection.java:694)
>>         at org.apache.kudu.client.Connection.exceptionCaught(Connection.java:439)
>>         at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
>>         at org.apache.kudu.client.Connection.handleUpstream(Connection.java:239)
>>         at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>         at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>>         at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.exceptionCaught(SimpleChannelUpstreamHandler.java:153)
>>         at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
>>         at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>         at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>>         at org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:536)
>>         at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.readTimedOut(ReadTimeoutHandler.java:236)
>>         at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler$ReadTimeoutTask$1.run(ReadTimeoutHandler.java:276)
>>         at org.apache.kudu.shaded.org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:40)
>>         at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
>>         at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
>>         at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>>         at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>>         at org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>>         at org.apache.kudu.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>         ... 1 more
>> Caused by: org.apache.kudu.client.RecoverableException: [peer master-172.17.0.43:7077] encountered a read timeout; closing the channel
>>         at org.apache.kudu.client.Connection.exceptionCaught(Connection.java:412)
>>         ... 21 more
>> Caused by: org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutException
>>         at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.<clinit>(ReadTimeoutHandler.java:84)
>>         at org.apache.kudu.client.Connection$ConnectionPipeline.init(Connection.java:782)
>>         at org.apache.kudu.client.Connection.<init>(Connection.java:199)
>>         at org.apache.kudu.client.ConnectionCache.getConnection(ConnectionCache.java:133)
>>         at org.apache.kudu.client.AsyncKuduClient.newRpcProxy(AsyncKuduClient.java:248)
>>         at org.apache.kudu.client.AsyncKuduClient.newMasterRpcProxy(AsyncKuduClient.java:272)
>>         at org.apache.kudu.client.ConnectToCluster.run(ConnectToCluster.java:157)
>>         at org.apache.kudu.client.AsyncKuduClient.getMasterTableLocationsPB(AsyncKuduClient.java:1350)
>>         at org.apache.kudu.client.AsyncKuduClient.exportAuthenticationCredentials(AsyncKuduClient.java:651)
>>         at org.apache.kudu.client.KuduClient.exportAuthenticationCredentials(KuduClient.java:293)
>>         at org.apache.kudu.spark.kudu.KuduContext$$anon$1.run(KuduContext.scala:97)
>>         at org.apache.kudu.spark.kudu.KuduContext$$anon$1.run(KuduContext.scala:96)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:360)
>>         at org.apache.kudu.spark.kudu.KuduContext.<init>(KuduContext.scala:96)
>>         at org.apache.kudu.spark.kudu.KuduRelation.<init>(DefaultSource.scala:162)
>>         at org.apache.kudu.spark.kudu.DefaultSource.createRelation(DefaultSource.scala:75)
>>         at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
>>         at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
>>         at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>>         at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>>         at py4j.Gateway.invoke(Gateway.java:280)
>>         at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>>         at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>         at py4j.GatewayConnection.run(GatewayConnection.java:214)
>>         ... 1 more
>>
>> Case 2: using spark-shell (Spark 1.6.0 standalone):
>>
>> $ spark-shell --master spark://localhost:7077 --packages org.apache.kudu:kudu-spark_2.10:1.1.0
>> > import org.apache.kudu.spark.kudu._
>> > import org.apache.kudu.client._
>> > import collection.JavaConverters._
>> > val df = sqlContext.read.options(Map("kudu.master" -> "localhost:7051","kudu.table" -> "impala::default.test")).kudu
>> df: org.apache.spark.sql.DataFrame = [dataset: string, id: string, itemnumber: string, srcid: string, timestamp: string, year: string, month: string, day: string, week: string, quarter: string, season: string, city: string, region1: string, region2: string, region3: string, region4: string, locality: string, itemname: string, itembqu: string, product_category: string, amount: string, mapped_zipcode: string, latitude: string, longitude: string, depositor_code: string, depositor_name: string, customer_code: string, is_island: string]
>>
>> It seems to be connecting, as it is able to show the column names, but if
>> I
>>
>> // register a temporary table and use SQL
>> df.registerTempTable("test")
>> val filteredDF = sqlContext.sql("select count(*) from test").show()
>>
>> bang!
>>
>> [Stage 0:>                                                          (0 + 6) / 6]
>> Lost task 1.0 in stage 0.0 (TID 1, tt-slave-2.novalocal, executor 1): org.apache.kudu.client.NonRecoverableException: RPC can not complete before timeout: KuduRpc(method=GetTableSchema, tablet=null, attempt=30, DeadlineTracker(timeout=30000, elapsed=27307), Traces:
>>
>> [0ms] querying master,
>> [48ms] Sub rpc: GetMasterRegistration sending RPC to server Kudu Master - localhost:7051,
>> [71ms] Sub rpc: GetMasterRegistration received from server Kudu Master - localhost:7051 response
>> Network error:
>> [Peer Kudu Master - localhost:7051] Connection reset,
>> [75ms] delaying RPC due to Service unavailable: Master config (localhost:7051) has no leader.
>> Exceptions received: org.apache.kudu.client.RecoverableException:
>> [Peer Kudu Master - localhost:7051] Connection reset,
>>
>> ...
>> (SAME MESSAGE REPEATS 25 TIMES)
>> ...
>>
>> [24262ms] querying master,
>> [24262ms] Sub rpc: GetMasterRegistration sending RPC to server Kudu Master - localhost:7051,
>> [24263ms] Sub rpc: GetMasterRegistration received from server Kudu Master - localhost:7051 response
>> Network error:
>> [Peer Kudu Master - localhost:7051] Connection reset,
>> [24263ms] delaying RPC due to Service unavailable: Master config (localhost:7051) has no leader.
>> Exceptions received: org.apache.kudu.client.RecoverableException:
>> [Peer Kudu Master - localhost:7051] Connection reset,
>> [24661ms] trace too long, truncated)
>>
>>         at org.apache.kudu.client.AsyncKuduClient.tooManyAttemptsOrTimeout(AsyncKuduClient.java:961)
>>         at org.apache.kudu.client.AsyncKuduClient.delayedSendRpcToTablet(AsyncKuduClient.java:1203)
>>         at org.apache.kudu.client.AsyncKuduClient.access$800(AsyncKuduClient.java:110)
>>         at org.apache.kudu.client.AsyncKuduClient$RetryRpcErrback.call(AsyncKuduClient.java:764)
>>         at org.apache.kudu.client.AsyncKuduClient$RetryRpcErrback.call(AsyncKuduClient.java:754)
>>         at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278)
>>         at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257)
>>         at com.stumbleupon.async.Deferred.callback(Deferred.java:1005)
>>         at org.apache.kudu.client.GetMasterRegistrationReceived.incrementCountAndCheckExhausted(GetMasterRegistrationReceived.java:156)
>>         at org.apache.kudu.client.GetMasterRegistrationReceived.access$300(GetMasterRegistrationReceived.java:45)
>>         at org.apache.kudu.client.GetMasterRegistrationReceived$GetMasterRegistrationErrCB.call(GetMasterRegistrationReceived.java:236)
>>         at org.apache.kudu.client.GetMasterRegistrationReceived$GetMasterRegistrationErrCB.call(GetMasterRegistrationReceived.java:225)
>>         at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278)
>>         at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257)
>>         at com.stumbleupon.async.Deferred.callback(Deferred.java:1005)
>>         at org.apache.kudu.client.KuduRpc.handleCallback(KuduRpc.java:220)
>>         at org.apache.kudu.client.KuduRpc.errback(KuduRpc.java:274)
>>         at org.apache.kudu.client.TabletClient.failOrRetryRpc(TabletClient.java:770)
>>         at org.apache.kudu.client.TabletClient.failOrRetryRpcs(TabletClient.java:747)
>>         at org.apache.kudu.client.TabletClient.cleanup(TabletClient.java:736)
>>         at org.apache.kudu.client.TabletClient.channelClosed(TabletClient.java:698)
>>         at org.apache.kudu.client.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
>>         at org.apache.kudu.client.TabletClient.handleUpstream(TabletClient.java:679)
>>         at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>         at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>>         at org.apache.kudu.client.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.channelClosed(ReadTimeoutHandler.java:176)
>>         at org.apache.kudu.client.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
>>         at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>         at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
>>         at org.apache.kudu.client.shaded.org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:468)
>>         at org.apache.kudu.client.shaded.org.jboss.netty.channel.Channels$6.run(Channels.java:457)
>>         at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:40)
>>         at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
>>         at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
>>         at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>>         at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>>         at org.apache.kudu.client.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>>         at org.apache.kudu.client.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>         at java.lang.Thread.run(Thread.java:748)
>> Caused by: org.apache.kudu.client.NoLeaderFoundException: Master config (localhost:7051) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: [Peer Kudu Master - localhost:7051] Connection reset
>>         at org.apache.kudu.client.GetMasterRegistrationReceived.incrementCountAndCheckExhausted(GetMasterRegistrationReceived.java:154)
>>         ... 32 more
>> Caused by: org.apache.kudu.client.RecoverableException: [Peer Kudu Master - localhost:7051] Connection reset
>>         at org.apache.kudu.client.TabletClient.cleanup(TabletClient.java:734)
>>         ... 21 more
>>
>> As I said, Kudu service is up an running, and I am able to query kudu
>> tables from Hue using Impala.
>>
>> What am I missing here? Is this the right approach to interfacing Spark
>> with Kudu?
>>
>> Thanks
>> ​
>>
>
>

Re: Problems connecting form Spark

Posted by Mac Noland <mc...@gmail.com>.
Any chance you can try spark2-shell with Kudu 1.6 and then re-try your
tests?

spark-shell --packages org.apache.kudu:kudu-spark2_2.11:1.6.0

On Fri, Mar 2, 2018 at 5:02 AM, Saúl Nogueras <su...@gmail.com> wrote:

> I cannot properly connect to Kudu from Spark, error says “Kudu master has
> no leader”
>
>    - CDH 5.14
>    - Kudu 1.6
>    - Spark 1.6.0 standalone and 2.2.0
>
> When I use Impala in HUE to create and query kudu tables, it works
> flawlessly.
>
> However, connecting from Spark throws some errors I cannot decipher.
>
> I have tried using both pyspark and spark-shell. With spark shell I had to
> use spark 1.6 instead of 2.2 because some maven dependencies problems, that
> I have localized but not been able to fix. More info here.
> ------------------------------
> Case 1: using pyspark2 (Spark 2.2.0)
>
> $ pyspark2 --master yarn --jars /opt/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/lib/kudu/kudu-spark2_2.11.jar
>
> > df = sqlContext.read.format('org.apache.kudu.spark.kudu').options(**{"kudu.master":"172.17.0.43:7077", "kudu.table":"impala::default.test"}).load()
>
> 18/03/02 10:23:27 WARN client.ConnectToCluster: Error receiving response from 172.17.0.43:7077
> org.apache.kudu.client.RecoverableException: [peer master-172.17.0.43:7077] encountered a read timeout; closing the channel
>         at org.apache.kudu.client.Connection.exceptionCaught(Connection.java:412)
>         at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
>         at org.apache.kudu.client.Connection.handleUpstream(Connection.java:239)
>         at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>         at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>         at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.exceptionCaught(SimpleChannelUpstreamHandler.java:153)
>         at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
>         at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>         at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>         at org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:536)
>         at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.readTimedOut(ReadTimeoutHandler.java:236)
>         at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler$ReadTimeoutTask$1.run(ReadTimeoutHandler.java:276)
>         at org.apache.kudu.shaded.org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:40)
>         at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
>         at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
>         at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>         at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>         at org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>         at org.apache.kudu.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutException
>         at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.<clinit>(ReadTimeoutHandler.java:84)
>         at org.apache.kudu.client.Connection$ConnectionPipeline.init(Connection.java:782)
>         at org.apache.kudu.client.Connection.<init>(Connection.java:199)
>         at org.apache.kudu.client.ConnectionCache.getConnection(ConnectionCache.java:133)
>         at org.apache.kudu.client.AsyncKuduClient.newRpcProxy(AsyncKuduClient.java:248)
>         at org.apache.kudu.client.AsyncKuduClient.newMasterRpcProxy(AsyncKuduClient.java:272)
>         at org.apache.kudu.client.ConnectToCluster.run(ConnectToCluster.java:157)
>         at org.apache.kudu.client.AsyncKuduClient.getMasterTableLocationsPB(AsyncKuduClient.java:1350)
>         at org.apache.kudu.client.AsyncKuduClient.exportAuthenticationCredentials(AsyncKuduClient.java:651)
>         at org.apache.kudu.client.KuduClient.exportAuthenticationCredentials(KuduClient.java:293)
>         at org.apache.kudu.spark.kudu.KuduContext$$anon$1.run(KuduContext.scala:97)
>         at org.apache.kudu.spark.kudu.KuduContext$$anon$1.run(KuduContext.scala:96)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:360)
>         at org.apache.kudu.spark.kudu.KuduContext.<init>(KuduContext.scala:96)
>         at org.apache.kudu.spark.kudu.KuduRelation.<init>(DefaultSource.scala:162)
>         at org.apache.kudu.spark.kudu.DefaultSource.createRelation(DefaultSource.scala:75)
>         at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
>         at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
>         at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>         at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>         at py4j.Gateway.invoke(Gateway.java:280)
>         at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>         at py4j.commands.CallCommand.execute(CallCommand.java:79)
>         at py4j.GatewayConnection.run(GatewayConnection.java:214)
>         ... 1 more
> 18/03/02 10:23:27 WARN client.ConnectToCluster: Unable to find the leader master 172.17.0.43:7077; will retry
>
> Py4JJavaError                             Traceback (most recent call last)
> <ipython-input-1-e1dfaec7a544> in <module>()
> ----> 1 df = sqlContext.read.format('org.apache.kudu.spark.kudu').options(**{"kudu.master":"172.17.0.43:7077", "kudu.table":"impala::default.logika_dataset_kudu"}).load()
>
> /opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/lib/spark2/python/pyspark/sql/readwriter.py in load(self, path, format, schema, **options)
>     163             return self._df(self._jreader.load(self._spark._sc._jvm.PythonUtils.toSeq(path)))
>     164         else:
> --> 165             return self._df(self._jreader.load())
>     166
>     167     @since(1.4)
>
> /opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py in __call__(self, *args)
>    1131         answer = self.gateway_client.send_command(command)
>    1132         return_value = get_return_value(
> -> 1133             answer, self.gateway_client, self.target_id, self.name)
>    1134
>    1135         for temp_arg in temp_args:
>
> /opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/lib/spark2/python/pyspark/sql/utils.py in deco(*a, **kw)
>      61     def deco(*a, **kw):
>      62         try:
> ---> 63             return f(*a, **kw)
>      64         except py4j.protocol.Py4JJavaError as e:
>      65             s = e.java_exception.toString()
>
> /opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
>     317                 raise Py4JJavaError(
>     318                     "An error occurred while calling {0}{1}{2}.\n".
> --> 319                     format(target_id, ".", name), value)
>     320             else:
>     321                 raise Py4JError(
>
> Py4JJavaError: An error occurred while calling o59.load.
> : java.security.PrivilegedActionException: org.apache.kudu.client.NoLeaderFoundException: Master config (172.17.0.43:7077) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-172.17.0.43:7077] encountered a read timeout; closing the channel
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:360)
>         at org.apache.kudu.spark.kudu.KuduContext.<init>(KuduContext.scala:96)
>         at org.apache.kudu.spark.kudu.KuduRelation.<init>(DefaultSource.scala:162)
>         at org.apache.kudu.spark.kudu.DefaultSource.createRelation(DefaultSource.scala:75)
>         at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
>         at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
>         at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>         at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>         at py4j.Gateway.invoke(Gateway.java:280)
>         at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>         at py4j.commands.CallCommand.execute(CallCommand.java:79)
>         at py4j.GatewayConnection.run(GatewayConnection.java:214)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.kudu.client.NoLeaderFoundException: Master config (172.17.0.43:7077) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: [peer master-172.17.0.43:7077] encountered a read timeout; closing the channel
>         at org.apache.kudu.client.ConnectToCluster.incrementCountAndCheckExhausted(ConnectToCluster.java:272)
>         at org.apache.kudu.client.ConnectToCluster.access$100(ConnectToCluster.java:49)
>         at org.apache.kudu.client.ConnectToCluster$ConnectToMasterErrCB.call(ConnectToCluster.java:349)
>         at org.apache.kudu.client.ConnectToCluster$ConnectToMasterErrCB.call(ConnectToCluster.java:338)
>         at com.stumbleupon.async.Deferred.doCall(Deferred.java:1280)
>         at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1259)
>         at com.stumbleupon.async.Deferred.handleContinuation(Deferred.java:1315)
>         at com.stumbleupon.async.Deferred.doCall(Deferred.java:1286)
>         at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1259)
>         at com.stumbleupon.async.Deferred.callback(Deferred.java:1002)
>         at org.apache.kudu.client.KuduRpc.handleCallback(KuduRpc.java:238)
>         at org.apache.kudu.client.KuduRpc.errback(KuduRpc.java:292)
>         at org.apache.kudu.client.RpcProxy.failOrRetryRpc(RpcProxy.java:388)
>         at org.apache.kudu.client.RpcProxy.responseReceived(RpcProxy.java:217)
>         at org.apache.kudu.client.RpcProxy.access$000(RpcProxy.java:60)
>         at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:132)
>         at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:128)
>         at org.apache.kudu.client.Connection.cleanup(Connection.java:694)
>         at org.apache.kudu.client.Connection.exceptionCaught(Connection.java:439)
>         at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
>         at org.apache.kudu.client.Connection.handleUpstream(Connection.java:239)
>         at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>         at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>         at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.exceptionCaught(SimpleChannelUpstreamHandler.java:153)
>         at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
>         at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>         at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>         at org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:536)
>         at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.readTimedOut(ReadTimeoutHandler.java:236)
>         at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler$ReadTimeoutTask$1.run(ReadTimeoutHandler.java:276)
>         at org.apache.kudu.shaded.org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:40)
>         at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
>         at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
>         at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>         at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>         at org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>         at org.apache.kudu.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         ... 1 more
> Caused by: org.apache.kudu.client.RecoverableException: [peer master-172.17.0.43:7077] encountered a read timeout; closing the channel
>         at org.apache.kudu.client.Connection.exceptionCaught(Connection.java:412)
>         ... 21 more
> Caused by: org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutException
>         at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.<clinit>(ReadTimeoutHandler.java:84)
>         at org.apache.kudu.client.Connection$ConnectionPipeline.init(Connection.java:782)
>         at org.apache.kudu.client.Connection.<init>(Connection.java:199)
>         at org.apache.kudu.client.ConnectionCache.getConnection(ConnectionCache.java:133)
>         at org.apache.kudu.client.AsyncKuduClient.newRpcProxy(AsyncKuduClient.java:248)
>         at org.apache.kudu.client.AsyncKuduClient.newMasterRpcProxy(AsyncKuduClient.java:272)
>         at org.apache.kudu.client.ConnectToCluster.run(ConnectToCluster.java:157)
>         at org.apache.kudu.client.AsyncKuduClient.getMasterTableLocationsPB(AsyncKuduClient.java:1350)
>         at org.apache.kudu.client.AsyncKuduClient.exportAuthenticationCredentials(AsyncKuduClient.java:651)
>         at org.apache.kudu.client.KuduClient.exportAuthenticationCredentials(KuduClient.java:293)
>         at org.apache.kudu.spark.kudu.KuduContext$$anon$1.run(KuduContext.scala:97)
>         at org.apache.kudu.spark.kudu.KuduContext$$anon$1.run(KuduContext.scala:96)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:360)
>         at org.apache.kudu.spark.kudu.KuduContext.<init>(KuduContext.scala:96)
>         at org.apache.kudu.spark.kudu.KuduRelation.<init>(DefaultSource.scala:162)
>         at org.apache.kudu.spark.kudu.DefaultSource.createRelation(DefaultSource.scala:75)
>         at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
>         at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
>         at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>         at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>         at py4j.Gateway.invoke(Gateway.java:280)
>         at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>         at py4j.commands.CallCommand.execute(CallCommand.java:79)
>         at py4j.GatewayConnection.run(GatewayConnection.java:214)
>         ... 1 more
>
> Case 2: using spark-shell (Spark 1.6.0 standalone):
>
> $ spark-shell --master spark://localhost:7077 --packages org.apache.kudu:kudu-spark_2.10:1.1.0
> > import org.apache.kudu.spark.kudu._
> > import org.apache.kudu.client._
> > import collection.JavaConverters._
> > val df = sqlContext.read.options(Map("kudu.master" -> "localhost:7051","kudu.table" -> "impala::default.test")).kudu
> df: org.apache.spark.sql.DataFrame = [dataset: string, id: string, itemnumber: string, srcid: string, timestamp: string, year: string, month: string, day: string, week: string, quarter: string, season: string, city: string, region1: string, region2: string, region3: string, region4: string, locality: string, itemname: string, itembqu: string, product_category: string, amount: string, mapped_zipcode: string, latitude: string, longitude: string, depositor_code: string, depositor_name: string, customer_code: string, is_island: string]
>
> It seems to be connecting, as it is able to show the column names, but if I
>
> // register a temporary table and use SQL
> df.registerTempTable("test")
> val filteredDF = sqlContext.sql("select count(*) from test").show()
>
> bang!
>
> [Stage 0:>                                                          (0 + 6) / 6]
> Lost task 1.0 in stage 0.0 (TID 1, tt-slave-2.novalocal, executor 1): org.apache.kudu.client.NonRecoverableException: RPC can not complete before timeout: KuduRpc(method=GetTableSchema, tablet=null, attempt=30, DeadlineTracker(timeout=30000, elapsed=27307), Traces:
>
> [0ms] querying master,
> [48ms] Sub rpc: GetMasterRegistration sending RPC to server Kudu Master - localhost:7051,
> [71ms] Sub rpc: GetMasterRegistration received from server Kudu Master - localhost:7051 response
> Network error:
> [Peer Kudu Master - localhost:7051] Connection reset,
> [75ms] delaying RPC due to Service unavailable: Master config (localhost:7051) has no leader.
> Exceptions received: org.apache.kudu.client.RecoverableException:
> [Peer Kudu Master - localhost:7051] Connection reset,
>
> ...
> (SAME MESSAGE REPEATS 25 TIMES)
> ...
>
> [24262ms] querying master,
> [24262ms] Sub rpc: GetMasterRegistration sending RPC to server Kudu Master - localhost:7051,
> [24263ms] Sub rpc: GetMasterRegistration received from server Kudu Master - localhost:7051 response
> Network error:
> [Peer Kudu Master - localhost:7051] Connection reset,
> [24263ms] delaying RPC due to Service unavailable: Master config (localhost:7051) has no leader.
> Exceptions received: org.apache.kudu.client.RecoverableException:
> [Peer Kudu Master - localhost:7051] Connection reset,
> [24661ms] trace too long, truncated)
>
>         at org.apache.kudu.client.AsyncKuduClient.tooManyAttemptsOrTimeout(AsyncKuduClient.java:961)
>         at org.apache.kudu.client.AsyncKuduClient.delayedSendRpcToTablet(AsyncKuduClient.java:1203)
>         at org.apache.kudu.client.AsyncKuduClient.access$800(AsyncKuduClient.java:110)
>         at org.apache.kudu.client.AsyncKuduClient$RetryRpcErrback.call(AsyncKuduClient.java:764)
>         at org.apache.kudu.client.AsyncKuduClient$RetryRpcErrback.call(AsyncKuduClient.java:754)
>         at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278)
>         at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257)
>         at com.stumbleupon.async.Deferred.callback(Deferred.java:1005)
>         at org.apache.kudu.client.GetMasterRegistrationReceived.incrementCountAndCheckExhausted(GetMasterRegistrationReceived.java:156)
>         at org.apache.kudu.client.GetMasterRegistrationReceived.access$300(GetMasterRegistrationReceived.java:45)
>         at org.apache.kudu.client.GetMasterRegistrationReceived$GetMasterRegistrationErrCB.call(GetMasterRegistrationReceived.java:236)
>         at org.apache.kudu.client.GetMasterRegistrationReceived$GetMasterRegistrationErrCB.call(GetMasterRegistrationReceived.java:225)
>         at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278)
>         at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257)
>         at com.stumbleupon.async.Deferred.callback(Deferred.java:1005)
>         at org.apache.kudu.client.KuduRpc.handleCallback(KuduRpc.java:220)
>         at org.apache.kudu.client.KuduRpc.errback(KuduRpc.java:274)
>         at org.apache.kudu.client.TabletClient.failOrRetryRpc(TabletClient.java:770)
>         at org.apache.kudu.client.TabletClient.failOrRetryRpcs(TabletClient.java:747)
>         at org.apache.kudu.client.TabletClient.cleanup(TabletClient.java:736)
>         at org.apache.kudu.client.TabletClient.channelClosed(TabletClient.java:698)
>         at org.apache.kudu.client.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
>         at org.apache.kudu.client.TabletClient.handleUpstream(TabletClient.java:679)
>         at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>         at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>         at org.apache.kudu.client.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.channelClosed(ReadTimeoutHandler.java:176)
>         at org.apache.kudu.client.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
>         at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>         at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
>         at org.apache.kudu.client.shaded.org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:468)
>         at org.apache.kudu.client.shaded.org.jboss.netty.channel.Channels$6.run(Channels.java:457)
>         at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:40)
>         at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
>         at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
>         at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>         at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>         at org.apache.kudu.client.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>         at org.apache.kudu.client.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.kudu.client.NoLeaderFoundException: Master config (localhost:7051) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: [Peer Kudu Master - localhost:7051] Connection reset
>         at org.apache.kudu.client.GetMasterRegistrationReceived.incrementCountAndCheckExhausted(GetMasterRegistrationReceived.java:154)
>         ... 32 more
> Caused by: org.apache.kudu.client.RecoverableException: [Peer Kudu Master - localhost:7051] Connection reset
>         at org.apache.kudu.client.TabletClient.cleanup(TabletClient.java:734)
>         ... 21 more
>
> As I said, Kudu service is up an running, and I am able to query kudu
> tables from Hue using Impala.
>
> What am I missing here? Is this the right approach to interfacing Spark
> with Kudu?
>
> Thanks
> ​
>