Posted to users@zeppelin.apache.org by Richard Grossman <ri...@gmail.com> on 2015/05/18 19:07:47 UTC

Can't execute SQL on Ec2 Spark cluster

Hi

After switching away from the bad GitHub repository I've succeeded in running a command.
So now I'd like to work through the tutorial. I've created a new notebook,
and first I enter this code:
val bankText = sc.textFile("s3n://inneractive-parquet/root/bank.zip")

case class Bank(age: Integer, job: String, marital: String, education: String, balance: Integer)

val bank = bankText
  .map(s => s.split(";"))
  .filter(s => s(0) != "\"age\"")
  .map(s => Bank(s(0).toInt,
                 s(1).replaceAll("\"", ""),
                 s(2).replaceAll("\"", ""),
                 s(3).replaceAll("\"", ""),
                 s(5).replaceAll("\"", "").toInt))

bank.toDF().registerTempTable("bank")

Everything looks OK and I get this result:

bankText: org.apache.spark.rdd.RDD[String] = s3n://inneractive-parquet/root/bank.zip MapPartitionsRDD[1] at textFile at <console>:24
defined class Bank
bank: org.apache.spark.rdd.RDD[Bank] = MapPartitionsRDD[4] at map at <console>:28

Now I'm trying to run the SQL query:
%sql select age, count(1) from bank where age < 30 group by age order by age

The operation never ends; it just keeps running. In the log I can see:

Initial job has not accepted any resources; check your cluster UI to ensure
that workers are registered and have sufficient resources

I can't debug the worker because the application link in the Spark master UI gives an error on port 4040:

| app-20150518165912-0000 <http://ec2-54-91-146-31.compute-1.amazonaws.com:8080/app?appId=app-20150518165912-0000> | Zeppelin <http://ip-10-123-128-51.ec2.internal:4040/> | 16 | 6.0 GB | 2015/05/18 16:59:12 | ubuntu | RUNNING | 5.5 min |

The Zeppelin link is the one on port 4040.
Could you help me understand what is going on?

Thanks

Re: Can't execute SQL on Ec2 Spark cluster

Posted by Richard Grossman <ri...@gmail.com>.
Why did you take over this thread?
You have a different problem; please open a new thread.


Re: Can't execute SQL on Ec2 Spark cluster

Posted by Corneau Damien <co...@gmail.com>.
Hi Richard,

Sorry that your thread was somehow taken over :)

In your case I can see this line in your code:

val bankText = sc.textFile("s3n://inneractive-parquet/root/bank.zip")

Could you try with a .csv file instead of a .zip?

The example takes the file and parses it line by line to turn it into table material, but that parser expects a .csv file.
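As a sketch of what that could look like once the archive is unzipped (the .csv path below is only an assumption mirroring the original bucket layout, not a file we know exists; sc and toDF() come from the Zeppelin Spark interpreter environment, as in your original snippet):

// Assumed location of the unzipped tutorial file; adjust to wherever bank.csv actually lives.
val bankText = sc.textFile("s3n://inneractive-parquet/root/bank.csv")

case class Bank(age: Integer, job: String, marital: String, education: String, balance: Integer)

val bank = bankText
  .map(s => s.split(";"))             // the tutorial data is semicolon-delimited
  .filter(s => s(0) != "\"age\"")     // skip the header row
  .map(s => Bank(s(0).toInt,
                 s(1).replaceAll("\"", ""),
                 s(2).replaceAll("\"", ""),
                 s(3).replaceAll("\"", ""),
                 s(5).replaceAll("\"", "").toInt))

bank.toDF().registerTempTable("bank")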

Hope that helps


Re: Can't execute SQL on Ec2 Spark cluster

Posted by Richard Grossman <ri...@gmail.com>.
I don't understand. What are you talking about?


Re: Can't execute SQL on Ec2 Spark cluster

Posted by moon soo Lee <mo...@apache.org>.
Do you have your own Maven pom.xml that declares zeppelin-interpreter as a dependency?
If so, please try version '0.5.0-incubating-SNAPSHOT' instead of '0.5.0_incubating-SNAPSHOT'.

Thanks,
moon


Re: Can't execute SQL on Ec2 Spark cluster

Posted by clark djilo kuissu <dj...@yahoo.fr>.
I removed the space and now I have this error:

Failed to execute goal on project zeppelin-interpreter: could not resolve dependencies for project org.apache.zeppelin:zeppelin-interpreter:jar:0.5.0_incubating-SNAPSHOT

Please help.

Regards,



Re: Can't execute SQL on Ec2 Spark cluster

Posted by clark djilo kuissu <dj...@yahoo.fr>.
Yes, that is the case. Should I remove this space?
 



Re: Can't execute SQL on Ec2 Spark cluster

Posted by moon soo Lee <mo...@apache.org>.
Do you have a space between '-Dskip' and 'Tests'? It looks like you do.

Thanks,
moon


Re: Can't execute SQL on Ec2 Spark cluster

Posted by clark djilo kuissu <dj...@yahoo.fr>.
Hi moon,

With the command you recommended I get this error:

Unknown lifecycle phase "Tests"
You must specify a valid lifecycle phase or a goal in the format <plugin-prefix>:<goal>

I get this error on both Windows and Ubuntu.

What is the problem?

Regards,



Re: Can't execute SQL on Ec2 Spark cluster

Posted by moon soo Lee <mo...@apache.org>.
Hi.

You can skip the tests by running

mvn clean package -DskipTests

And it is strongly recommended to build against Spark 1.3.1:

mvn clean package -DskipTests -Pspark-1.3 -Dspark.version=1.3.1

I hope these commands build Zeppelin for you without error.

If you're interested, please post the surefire-reports to the mailing list or JIRA so the test failure can be investigated.

Best,
moon

Re: Can't execute SQL on Ec2 Spark cluster

Posted by clark djilo kuissu <dj...@yahoo.fr>.
I am trying to build Zeppelin on CentOS in a virtual machine. I am following these instructions: https://github.com/apache/incubator-zeppelin

sudo apt-get update
sudo apt-get install openjdk-7-jdk
sudo apt-get install git
sudo apt-get install maven
sudo apt-get install npm

git clone https://github.com/apache/incubator-zeppelin.git

cd incubator-zeppelin

mvn clean package

I get this error. Can someone help me please?

[image: inline image]

   



Re: Can't execute SQL on Ec2 Spark cluster

Posted by moon soo Lee <mo...@apache.org>.
Hi,

It looks like your Spark workers are not connected to the Spark master, or the resources Zeppelin is trying to use exceed what your Spark cluster has available.

Could you check the Spark master UI? It shows how many resources (CPU cores, memory) the cluster has and how many resources Zeppelin is trying to use.
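For illustration only, these are the two standalone-mode Spark properties that bound what an application requests. In a plain Spark application they would be set on the SparkConf as sketched below; in Zeppelin the equivalent values would have to go into the interpreter's Spark configuration instead. The master URL and numbers are placeholders, not values from your cluster:

import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch (not Zeppelin-specific): cap what the application asks
// the standalone cluster for. All values below are placeholders.
val conf = new SparkConf()
  .setAppName("Zeppelin")
  .setMaster("spark://<master-host>:7077")   // hypothetical master URL
  .set("spark.cores.max", "4")               // cap the total cores this app may take
  .set("spark.executor.memory", "2g")        // memory requested per executor

val sc = new SparkContext(conf)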

Thanks,
moon
