Posted to user@spark.apache.org by Kal El <pi...@yahoo.com> on 2014/01/28 14:29:59 UTC

cannot read file from HDFS

I am trying to read a file from HDFS. (HDFS itself is working fine: I uploaded the file from one machine, the master, and downloaded it on a slave to make sure everything is OK.) But I am getting this error:

(This is what my read line looks like: val lines = sc.textFile("hdfs://10.237.114.143:8020/files/fisier_16mil_30D_R10k.txt"))

java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: "SparkTwo/127.0.0.1"; destination host is: "10.237.114.143":8020;
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
        at org.apache.hadoop.ipc.Client.call(Client.java:1351)
        at org.apache.hadoop.ipc.Client.call(Client.java:1300)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
        at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
        at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
        at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:1701)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1647)
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:222)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:141)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:201)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:199)
        at scala.Option.getOrElse(Option.scala:108)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:199)
        at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:26)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:201)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:199)
        at scala.Option.getOrElse(Option.scala:108)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:199)
        at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:26)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:201)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:199)
        at scala.Option.getOrElse(Option.scala:108)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:199)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:886)
        at org.apache.spark.rdd.RDD.count(RDD.scala:698)
        at org.apache.spark.rdd.RDD.takeSample(RDD.scala:323)
        at SparkKMeans$.main(SparkKMeans.scala:66)
        at SparkKMeans.main(SparkKMeans.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at scala.tools.nsc.util.ScalaClassLoader$$anonfun$run$1.apply(ScalaClassLoader.scala:78)
        at scala.tools.nsc.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:24)
        at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.asContext(ScalaClassLoader.scala:88)
        at scala.tools.nsc.util.ScalaClassLoader$class.run(ScalaClassLoader.scala:78)
        at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.run(ScalaClassLoader.scala:101)
        at scala.tools.nsc.ObjectRunner$.run(ObjectRunner.scala:33)
        at scala.tools.nsc.ObjectRunner$.runAndCatch(ObjectRunner.scala:40)
        at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:56)
        at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:80)
        at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:89)
        at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)

Any thoughts on this?

Thanks

Re: cannot read file from HDFS

Posted by Kal El <pi...@yahoo.com>.
1. So the only way to use a local file again, without going through Hadoop, would be to remove .../hadoop/bin from $PATH?
2. I can confirm that the HDFS port is 8020.
3. I recompiled Spark with "SPARK_HADOOP_VERSION=2.2.0 sbt/sbt assembly" (a quick way to double-check the result is sketched below).
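
A minimal sketch of that check (the object name is illustrative; in the spark-shell you can just call VersionInfo.getVersion() directly). Run it with the rebuilt assembly jar on the classpath:

    import org.apache.hadoop.util.VersionInfo

    object CheckHadoopClientVersion {
      def main(args: Array[String]) {
        // Prints the version of the Hadoop client classes actually on the
        // classpath, i.e. the ones baked into the Spark assembly jar.
        println("Hadoop client version: " + VersionInfo.getVersion())
        // If this prints 1.x while the cluster runs 2.2.0, the EOFException on
        // the NameNode RPC is expected: the wire protocols are incompatible.
      }
    }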

HDFS itself seems to be running OK; I ran the following test:
on the master: copied a file from local storage to HDFS
on a slave: copied the file from HDFS back to a local path.
As a result, I got the file on both machines.
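
That exercises the cluster's own hadoop command-line client, though, not the Hadoop client classes bundled into the Spark assembly. A small sketch of the equivalent check against the client jars Spark actually uses (URI and path copied from the post; the object name is illustrative):

    import java.net.URI
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object CheckHdfsAccess {
      def main(args: Array[String]) {
        // Talk to the same NameNode the Spark job uses.
        val fs = FileSystem.get(new URI("hdfs://10.237.114.143:8020"), new Configuration())

        // getFileStatus issues the same getFileInfo RPC that fails in the trace above.
        val status = fs.getFileStatus(new Path("/files/fisier_16mil_30D_R10k.txt"))
        println("found " + status.getPath + ", " + status.getLen + " bytes")
      }
    }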

Thanks



On Tuesday, January 28, 2014 7:32 PM, 尹绪森 <yi...@gmail.com> wrote:
 
1. The textFile() function uses HadoopRDD behind the scenes, so a regular (local) path will hit the same problem.
2. Could you confirm that the HDFS port is 8020?
3. Is your Spark build compiled against the right version of Hadoop?
On 2014-1-28 at 10:37 PM, "Kal El" <pi...@yahoo.com> wrote:

I see that if I replace the hdfs path with a regular (local) one I get the same error ...
>
>
>
>On Tuesday, January 28, 2014 3:37 PM, Kal El <pi...@yahoo.com> wrote:
> 
>I am trying to read a file from hdfs. (The hdfs is working fine, I have uploaded the file from one machine - the master, and downloaded it on a slave to make sure that everything is ok) but I am receiving this error:
>
>
>(this is how my reading line looks like: "val lines = sc.textFile("hdfs://10.237.114.143:8020/files/fisier_16mil_30D_R10k.txt")
>")
>
>
>java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: "SparkTwo/127.0.0.1"; destination host is: "10.237.114.143":8020;
>
>
>Any thoughts on this ?
>
>
>Thanks
>
>
>
>

Re: cannot read file from HDFS

Posted by 尹绪森 <yi...@gmail.com>.
1. The textFile() function uses HadoopRDD behind the scenes, so a regular
(local) path will hit the same problem (see the sketch below).

2. Could you confirm that the HDFS port is 8020?

3. Is your Spark build compiled against the right version of Hadoop?
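
(For illustration, this is roughly what textFile() expands to internally; exact signatures vary a little between Spark versions. Assuming an active SparkContext sc and a path string, e.g. in the spark-shell:

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapred.TextInputFormat

    // sc.textFile(path) is essentially:
    val lines = sc.hadoopFile(path, classOf[TextInputFormat],
                              classOf[LongWritable], classOf[Text])
                  .map(pair => pair._2.toString)

so even a file:// path is resolved through the Hadoop FileSystem/FileInputFormat classes bundled with Spark, and a client/cluster version mismatch can surface for both kinds of path.)
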
On 2014-1-28 at 10:37 PM, "Kal El" <pi...@yahoo.com> wrote:

> I see that if I replace the hdfs path with a regular (local) one I get the
> same error ...
>
>
>   On Tuesday, January 28, 2014 3:37 PM, Kal El <pi...@yahoo.com>
> wrote:
>  I am trying to read a file from hdfs. (The hdfs is working fine, I have
> uploaded the file from one machine - the master, and downloaded it on a
> slave to make sure that everything is ok) but I am receiving this error:
>
> (this is how my reading line looks like: "val lines = sc.textFile("hdfs://
> 10.237.114.143:8020/files/fisier_16mil_30D_R10k.txt")
> ")
>
> java.io.IOException: Failed on local exception: java.io.EOFException; Host
> Details : local host is: "SparkTwo/127.0.0.1"; destination host is:
> "10.237.114.143":8020;
>
> Any thoughts on this ?
>
> Thanks
>
>
>
>

Re: cannot read file from HDFS

Posted by Kal El <pi...@yahoo.com>.
I see that if I replace the hdfs path with a regular (local) one I get the same error ...



On Tuesday, January 28, 2014 3:37 PM, Kal El <pi...@yahoo.com> wrote:
 
I am trying to read a file from hdfs. (The hdfs is working fine, I have uploaded the file from one machine - the master, and downloaded it on a slave to make sure that everything is ok) but I am receiving this error:

(this is how my reading line looks like: "val lines = sc.textFile("hdfs://10.237.114.143:8020/files/fisier_16mil_30D_R10k.txt")
")

java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: "SparkTwo/127.0.0.1"; destination host is: "10.237.114.143":8020;

Any thoughts on this ?

Thanks