Posted to users@zeppelin.apache.org by Ranveer kumar <ra...@gmail.com> on 2015/10/11 04:34:50 UTC
Problem loading data from HDFS
Hi All,
I am new to Zeppelin and HDFS. I managed to install Zeppelin and it works fine
when loading data from a local directory, but the same load fails when reading
from HDFS (installed locally in standalone mode).
Here is my code:
val bankText = sc.textFile("/home/ranveer/Desktop/CSVs/bank-full.csv")
The above works fine.
But when reading the same file from HDFS:
val bankText = sc.textFile("hdfs://127.0.0.1:9000/demo/csv/bank-full.csv")
it is not working and gives this error:
java.lang.VerifyError: class org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetAdditionalDatanodeRequestProto overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
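[Editor's note: this particular VerifyError (a generated Hadoop protobuf class overriding a final getUnknownFields) is the classic symptom of mixing protobuf-java versions on the classpath; Hadoop 2.x classes are typically generated against protobuf-java 2.5, and an older 2.4.x jar loaded first produces exactly this failure. A small plain-Scala diagnostic, a sketch to run in a notebook paragraph, can show which jar a class is actually loaded from:]

```scala
// Report where a class on the current classpath is loaded from (jar or
// directory). Checking com.google.protobuf.UnknownFieldSet reveals which
// protobuf-java jar wins when several versions are present.
def loadedFrom(className: String): Option[String] = {
  val resource = className.replace('.', '/') + ".class"
  Option(Thread.currentThread.getContextClassLoader.getResource(resource))
    .map(_.toString)
}

println(loadedFrom("com.google.protobuf.UnknownFieldSet")
  .getOrElse("com.google.protobuf.UnknownFieldSet is not on the classpath"))
```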
The complete code is:
val bankText = sc.textFile("hdfs://127.0.0.1:9000/demo/csv/bank-full.csv")
// val bankText = sc.textFile("/home/ranveer/Desktop/CSVs/bank-full.csv")
case class Bank(age: Integer, job: String, marital: String, education: String, balance: Integer)
val bank = bankText.map(s => s.split(";")).filter(s => s(0) != "\"age\"").map(
  s => Bank(
    s(0).toInt,
    s(1).replaceAll("\"", ""),
    s(2).replaceAll("\"", ""),
    s(3).replaceAll("\"", ""),
    s(5).replaceAll("\"", "").toInt
  )
)
// Below line works only in spark 1.3.0.
// For spark 1.1.x and spark 1.2.x,
// use bank.registerTempTable("bank") instead.
bank.toDF().registerTempTable("bank")
println(bankText.count())
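[Editor's note: the row-parsing logic above can be exercised without Spark on a single line. The sample row below is illustrative, in the semicolon-separated, double-quoted format the code assumes, with the balance at column index 5:]

```scala
case class Bank(age: Int, job: String, marital: String, education: String, balance: Int)

// Illustrative row in the assumed bank-full.csv layout: semicolon-separated,
// string fields wrapped in double quotes, balance at index 5.
val sample = "58;\"management\";\"married\";\"tertiary\";\"no\";2143"

val s = sample.split(";")
val row = Bank(
  s(0).toInt,
  s(1).replaceAll("\"", ""),
  s(2).replaceAll("\"", ""),
  s(3).replaceAll("\"", ""),
  s(5).replaceAll("\"", "").toInt
)
println(row)  // Bank(58,management,married,tertiary,2143)
```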
My environment is:
spark version : 1.3.1 with hadoop 2.6
zeppelin : binary from apache 0.5
hadoop version : 2.6 binary from apache
java : 1.8
Please help, I am stuck here.
thanks
regards
Ranveer
RE: Problem loading data from HDFS
Posted by Felix Cheung <fe...@hotmail.com>.
Can you check `export` to see what you have SPARK_HOME set to?
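[Editor's note: from inside a notebook paragraph you can also print the environment the interpreter JVM actually started with (a plain-Scala sketch, no Spark required); if SPARK_HOME points at a different Spark build than expected, that would also explain a surprising sc.version:]

```scala
// Print the SPARK_* environment variables visible to this JVM.
// If SPARK_HOME points at a different Spark installation than the one
// Zeppelin was built for, the interpreter can pick up mismatched jars.
val sparkVars = sys.env.filter { case (key, _) => key.startsWith("SPARK") }
if (sparkVars.isEmpty) println("No SPARK_* variables are set")
else sparkVars.foreach { case (key, value) => println(s"$key=$value") }
```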
Re: Problem loading data from HDFS
Posted by Ranveer kumar <ra...@gmail.com>.
Still the same problem. My environment is now exactly:
hadoop : 2.3
zeppelin : zeppelin-0.5.0-incubating-bin-spark-1.3.1_hadoop-2.3
spark : spark-1.3.1-bin-hadoop2.3
While checking the version in Zeppelin using sc.version, it is showing 1.4.0.
Please help, I am stuck.
thanks
Re: Problem loading data from HDFS
Posted by Ranveer kumar <ra...@gmail.com>.
I am using 0.5.0 now, but earlier I was using my own build with Hadoop 2.6 and
YARN.
Is the 0.5.0 binary not compatible with Hadoop 2.6?
I am going to build Zeppelin with Hadoop 2.6 again. Also, is there any pre-built
Zeppelin binary for Hadoop 2.6 that I can download directly?
Regards
Re: Problem loading data from HDFS
Posted by Felix Cheung <fe...@hotmail.com>.
Are you getting the latest Zeppelin release binary from
https://zeppelin.incubator.apache.org/download.html or the Apache release page?
It looks like the 0.5.0 release of Zeppelin is only built with Hadoop 2.3. We might be cutting another release soon, or you might need to build your own...
Re: Problem loading data from HDFS
Posted by Ranveer Kumar <ra...@gmail.com>.
Hi Felix, thanks for the reply.
I am using the binary downloaded from Apache.
I also tried Hadoop 2.6, built from source.
Which version of Hadoop is compatible with the Zeppelin binary available on the
Apache site?
On 11 Oct 2015 09:05, "Felix Cheung" <fe...@hotmail.com> wrote:
> Is your Zeppelin built with Hadoop 2.6?
>
>
>
>
>
>
> On Sat, Oct 10, 2015 at 7:35 PM -0700, "Ranveer kumar" <
> ranveer.k.kumar@gmail.com> wrote:
> Hi All,
>
> I am new in Zepplin and HDFS. I manage to install zeppelin and working fine
> while loading data from local directory . But when same I am trying to load
> from HDFS (install locally standalone mode).
>
> here is my code :
>
> val bankText = sc.textFile("/home/ranveer/Desktop/CSVs/bank-full.csv")
>
> above is working fine.
>
> but when trying form hdfs :
>
> val bankText = sc.textFile("hdfs://127.0.0.1:9000/demo/
> <http://127.0.0.1:9000/demo/csv/bank-full.csv>csv
> <http://127.0.0.1:9000/demo/csv/bank-full.csv>/
> <http://127.0.0.1:9000/demo/csv/bank-full.csv>bank-full.csv
> <http://127.0.0.1:9000/demo/csv/bank-full.csv>")
>
> not working and giving error :
>
> java.lang.VerifyError: class
>
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetAdditionalDatanodeRequestProto
> overrides final method
> getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet; at
> java.lang.ClassLoader.defineClass1(Native Method) at
> java.lang.ClassLoader.defineClass(ClassLoader.java:800) at
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at
> java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at
> java.net.URLClassLoader.access$100(URLClassLoader.java:71)
>
> the complete code is:
>
> val bankText = sc.textFile("hdfs://127.0.0.1:9000/demo/csv/bank-full.csv")
> // val bankText = sc.textFile("/home/ranveer/Desktop/CSVs/bank-full.csv")
> case class Bank(age:Integer, job:String, marital : String, education :
> String, balance : Integer)
>
> val bank = bankText.map(s=>s.split(";")).filter(s=>s(0)!="\"age\"").map(
> s=>Bank(s(0).toInt,
> s(1).replaceAll("\"", ""),
> s(2).replaceAll("\"", ""),
> s(3).replaceAll("\"", ""),
> s(5).replaceAll("\"", "").toInt
> )
> )
>
> // Below line works only in spark 1.3.0.
> // For spark 1.1.x and spark 1.2.x,
> // use bank.registerTempTable("bank") instead.
> bank.toDF().registerTempTable("bank")
> println(bankText.count())
>
> My environment is:
>
> spark version: 1.3.1 with hadoop 2.6
>
> zeppelin: 0.5 binary from apache
>
> hadoop version: 2.6 binary from apache
>
> java: 1.8
>
> Please help, I am stuck here.
>
> thanks
>
> regards
>
> Ranveer
>
Re: Problem loading data from HDFS
Posted by Felix Cheung <fe...@hotmail.com>.
Is your Zeppelin built with Hadoop 2.6?
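If not, the usual fix at the time was to rebuild Zeppelin from source against the matching Hadoop version. The exact Maven profile names vary by Zeppelin release, so treat the flags below as a sketch and check the project's README:

```shell
# Build Zeppelin against Spark 1.3 and Hadoop 2.6
# (profile names are assumptions; they may differ by release).
mvn clean package -Pspark-1.3 -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests
```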
On Sat, Oct 10, 2015 at 7:35 PM -0700, "Ranveer kumar" <ra...@gmail.com> wrote:
Hi All,
I am new to Zeppelin and HDFS. I managed to install Zeppelin and it works
fine when loading data from a local directory, but not when loading the
same file from HDFS (installed locally in standalone mode).
here is my code :
val bankText = sc.textFile("/home/ranveer/Desktop/CSVs/bank-full.csv")
above is working fine.
but when trying from HDFS:
val bankText = sc.textFile("hdfs://127.0.0.1:9000/demo/csv/bank-full.csv")
it does not work and gives this error:
java.lang.VerifyError: class
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetAdditionalDatanodeRequestProto
overrides final method
getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet; at
java.lang.ClassLoader.defineClass1(Native Method) at
java.lang.ClassLoader.defineClass(ClassLoader.java:800) at
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at
java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at
java.net.URLClassLoader.access$100(URLClassLoader.java:71)
the complete code is:
val bankText = sc.textFile("hdfs://127.0.0.1:9000/demo/csv/bank-full.csv")
// val bankText = sc.textFile("/home/ranveer/Desktop/CSVs/bank-full.csv")
case class Bank(age:Integer, job:String, marital : String, education :
String, balance : Integer)
val bank = bankText.map(s=>s.split(";")).filter(s=>s(0)!="\"age\"").map(
s=>Bank(s(0).toInt,
s(1).replaceAll("\"", ""),
s(2).replaceAll("\"", ""),
s(3).replaceAll("\"", ""),
s(5).replaceAll("\"", "").toInt
)
)
// Below line works only in spark 1.3.0.
// For spark 1.1.x and spark 1.2.x,
// use bank.registerTempTable("bank") instead.
bank.toDF().registerTempTable("bank")
println(bankText.count())
My environment is:
spark version: 1.3.1 with hadoop 2.6
zeppelin: 0.5 binary from apache
hadoop version: 2.6 binary from apache
java: 1.8
Please help, I am stuck here.
thanks
regards
Ranveer