Posted to user@spark.apache.org by qingyang li <li...@gmail.com> on 2014/03/20 02:57:44 UTC

Shark does not give any results with SELECT count(*) command

Hi, I installed Spark 0.9.0 and Shark 0.9 on 3 nodes. When I run select * from
src, I get results, but when I run select count(*) from src or select *
from src limit 1, no result is output.

I found a similar problem on Google Groups:
https://groups.google.com/forum/#!searchin/spark-users/Shark$20does$20not$20give$20any$20results$20with$20SELECT$20command/spark-users/oKMBPBWim0U/_hbDCi4m-xUJ
but there is no solution there.

Has anyone else encountered this problem?

Re: Shark does not give any results with SELECT count(*) command

Posted by qingyang li <li...@gmail.com>.

Spark is deployed on bigdata001, bigdata002, bigdata003 and bigdata004;
bigdata001 is the master.
I have also copied Shark's files to all four machines.
When I run "select count(*) from b" in the Shark shell ("bin/shark") on
bigdata003, I get the result.
But when I run "select count(*) from b" in the Shark shell ("bin/shark") on
the other nodes, I do not get the result.

It seems the result has been sent to bigdata003.
I found this log on bigdata003:

14/03/25 17:08:49 INFO network.ConnectionManager: Accepted connection
from [bigdata001/192.168.1.101]
14/03/25 17:08:49 INFO network.ConnectionManager: Accepted connection
from [bigdata002/192.168.1.102]
14/03/25 17:08:49 INFO network.ConnectionManager: Accepted connection
from [bigdata004/192.168.1.104]

I also found this log on bigdata004, bigdata002 and bigdata001:

09/01/13 09:32:29 INFO network.ConnectionManager: Accepted connection
from [bigdata003/192.168.1.103]
09/01/13 09:32:29 INFO network.SendingConnection: Initiating
connection to [bigdata003/192.168.1.103:39848]
09/01/13 09:32:29 INFO network.SendingConnection: Connected to
[bigdata003/192.168.1.103:39848], 1 messages pending





2014-03-25 16:19 GMT+08:00 qingyang li <li...@gmail.com>:

> Reopening this thread because I have encountered this problem again.
> Here is my environment:
> scala 2.10.3
> spark 0.9.0    standalone mode
> shark 0.9.0    downloaded the source code and built it myself
> hive hive-shark-0.11
> I copied hive-site.xml from my Hadoop cluster, whose Hive version is
> 0.12; after copying it, I deleted some properties from hive-site.xml.
>
> When I run select count(*) from xxx, there is no result and no error output.
>
> Can someone give me some suggestions on how to debug this?
>
>
>
>
>
> 2014-03-20 11:27 GMT+08:00 qingyang li <li...@gmail.com>:
>
>> I have found the cause. My problem was:
>> the format of the slaves file was not correct, so tasks were only run on
>> the master.
>>
>> I am explaining it here to help others who encounter a similar problem.
>>
>>
>> 2014-03-20 9:57 GMT+08:00 qingyang li <li...@gmail.com>:
>>
>>> Hi, I installed Spark 0.9.0 and Shark 0.9 on 3 nodes. When I run select *
>>> from src, I get results, but when I run select count(*) from src or
>>> select * from src limit 1, no result is output.
>>>
>>> I found a similar problem on Google Groups:
>>>
>>> https://groups.google.com/forum/#!searchin/spark-users/Shark$20does$20not$20give$20any$20results$20with$20SELECT$20command/spark-users/oKMBPBWim0U/_hbDCi4m-xUJ
>>> but there is no solution there.
>>>
>>> Has anyone else encountered this problem?
>>>
>>
>>
>

Re: Shark does not give any results with SELECT count(*) command

Posted by qingyang li <li...@gmail.com>.

I found this log on bigdata003:

14/03/25 17:08:49 INFO network.ConnectionManager: Accepted connection
from [bigdata001/192.168.1.101]
14/03/25 17:08:49 INFO network.ConnectionManager: Accepted connection
from [bigdata002/192.168.1.102]
14/03/25 17:08:49 INFO network.ConnectionManager: Accepted connection
from [bigdata004/192.168.1.104]

I also found this log on bigdata004, bigdata002 and bigdata001:

09/01/13 09:32:29 INFO network.ConnectionManager: Accepted connection
from [bigdata003/192.168.1.103]
09/01/13 09:32:29 INFO network.SendingConnection: Initiating
connection to [bigdata003/192.168.1.103:39848]
09/01/13 09:32:29 INFO network.SendingConnection: Connected to
[bigdata003/192.168.1.103:39848], 1 messages pending



From the log, it seems bigdata003 has become the master, but I configured
bigdata001 as the master.
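
One way to read logs like these more systematically is to tally which peers
a node accepted connections from. The following is a minimal Python sketch,
not part of Spark or Shark: the function name accepted_peers and the regular
expression are illustrative, and it assumes the log text has the same
"Accepted connection from [host/ip]" shape as the lines quoted above.

```python
import re
from collections import Counter

# Matches the "Accepted connection from [host/ip]" entries quoted in this thread.
ACCEPTED = re.compile(
    r"ConnectionManager: Accepted connection from \[([^/\]]+)/([^\]]+)\]"
)


def accepted_peers(log_text):
    """Count accepted connections per peer hostname in a ConnectionManager log."""
    # Mail clients wrap long log lines, so collapse all whitespace runs to
    # single spaces before matching; this lets the pattern span the wrap.
    flattened = " ".join(log_text.split())
    return Counter(host for host, _ip in ACCEPTED.findall(flattened))
```

Running accepted_peers over each node's log shows which hosts that node
received connections from. A node that appears as the only peer in every
other node's log is the one the others are exchanging data with, which is
worth comparing against the node where the query was typed.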

Another clue:

Sometimes, after I restart the Spark cluster, it works again: I can
get the result on bigdata001, but it fails on bigdata003.

So, does Spark choose a node at random to store the result?


If I have not described the problem clearly, please let me know. Thanks.



2014-03-26 16:55 GMT+08:00 qingyang li <li...@gmail.com>:

> Hi Praveen, I can start the server on bigdata001 using "./bin/shark --service
> sharkserver", and I can also connect to this server using "./bin/shark -h
> bigdata001".
> But the problem is still there:
> run "select count(*) from b" on bigdata001: no result, no error.
> run "select count(*) from b" on bigdata002: no result, no error.
> run "select count(*) from b" on bigdata004: no result, no error.
> run "select count(*) from b" on bigdata003: result returned.
>
>
> 2014-03-26 15:49 GMT+08:00 Praveen R <pr...@sigmoidanalytics.com>:
>
>> Oh, OK. You need to be running the Shark server on bigdata001 to use it from
>> other machines.
>>  ./bin/shark --service sharkserver  # runs shark server on port 10000
>>
>> You can connect to the Shark server with ./bin/shark -h <bigdata001>; this
>> should work unless a firewall is blocking it. You could run telnet
>> bigdata001 10000 from bigdata003 to check whether the port is accessible.
>> Hope that helps.
>>
>>
>> On Wed, Mar 26, 2014 at 12:57 PM, qingyang li <li...@gmail.com>wrote:
>>
>>> Hi Praveen, thanks for replying.
>>>
>>> I am using the hive-0.11 that comes from AMPLab. At the beginning, the
>>> AMPLab hive-site.xml was empty, so I copied a hive-site.xml from my
>>> cluster, then removed some properties and also added some properties.
>>>
>>> I do not think that is the reason for my problem.
>>> I think the reason is that Shark is running in local mode, not cluster mode:
>>> when I run bin/shark on bigdata001, it certainly cannot get the result
>>> that exists on bigdata003, while when I run bin/shark on bigdata003, I can
>>> get the result.
>>>
>>> Even if that is the reason, I still cannot understand: why is the result on
>>> bigdata003 (the master is bigdata001)?
>>>
>>>
>>>
>>>
>>> 2014-03-25 18:41 GMT+08:00 Praveen R <pr...@mobipulse.in>:
>>>
>>>> Hi Qingyang Li,
>>>>
>>>> Shark 0.9.0 uses a patched version of Hive 0.11, and using the
>>>> configuration/metastore of Hive 0.12 with it could be incompatible.
>>>>
>>>> May I know why you are using the hive-site.xml from your previous Hive
>>>> version (to use an existing metastore)? Otherwise, you could just leave
>>>> hive-site.xml blank. Something like this:
>>>> <?xml version="1.0"?>
>>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>> <configuration>
>>>> </configuration>
>>>>
>>>> In any case, you could run ./bin/shark-withdebug to check for any errors.
>>>>
>>>> Regards,
>>>> Praveen
>>>
>>
>

Re: Shark does not give any results with SELECT count(*) command

Posted by qingyang li <li...@gmail.com>.

Hi Praveen, I can start the server on bigdata001 using "./bin/shark --service
sharkserver", and I can also connect to this server using "./bin/shark -h
bigdata001".
But the problem is still there:
run "select count(*) from b" on bigdata001: no result, no error.
run "select count(*) from b" on bigdata002: no result, no error.
run "select count(*) from b" on bigdata004: no result, no error.
run "select count(*) from b" on bigdata003: result returned.



Re: Shark does not give any results with SELECT count(*) command

Posted by Praveen R <pr...@sigmoidanalytics.com>.

Oh, OK. You need to be running the Shark server on bigdata001 to use it from
other machines.
./bin/shark --service sharkserver  # runs shark server on port 10000

You can connect to the Shark server with ./bin/shark -h <bigdata001>; this
should work unless a firewall is blocking it. You could run telnet
bigdata001 10000 from bigdata003 to check whether the port is accessible.
Hope that helps.
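
If telnet is not installed, the same reachability check can be approximated
in a few lines of Python. This is only a sketch: the function name
port_is_open is made up for illustration, and the host name and port in the
usage note are the ones mentioned in this thread.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, calling port_is_open("bigdata001", 10000) from bigdata003
returns False if a firewall drops the traffic or nothing is listening on
that port. A True result only shows that the port accepts TCP connections,
not that the Shark server behind it is healthy.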



Re: Shark does not give any results with SELECT count(*) command

Posted by qingyang li <li...@gmail.com>.

Hi Praveen, thanks for replying.

I am using the hive-0.11 that comes from AMPLab. At the beginning, the
AMPLab hive-site.xml was empty, so I copied a hive-site.xml from my
cluster, then removed some properties and also added some properties.

I do not think that is the reason for my problem.
I think the reason is that Shark is running in local mode, not cluster mode:
when I run bin/shark on bigdata001, it certainly cannot get the result
that exists on bigdata003, while when I run bin/shark on bigdata003, I can
get the result.

Even if that is the reason, I still cannot understand: why is the result on
bigdata003 (the master is bigdata001)?
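
Whether a shell runs in local mode or against the cluster depends on the
master URL it is started with. Below is a minimal Python sketch that
classifies the common Spark master URL forms. It assumes the URL is supplied
through a MASTER environment variable (as in Shark's shark-env.sh; treat
that as an assumption and check your own configuration). The function names
describe_master and master_from_env are hypothetical.

```python
import os
import re


def describe_master(url):
    """Classify a Spark master URL as 'unset', 'local', 'standalone', 'mesos' or 'unknown'."""
    if url is None or url.strip() == "":
        return "unset"
    url = url.strip()
    # "local" or "local[N]" runs everything inside the shell's own JVM.
    if re.fullmatch(r"local(\[(\d+|\*)\])?", url):
        return "local"
    # "spark://host:port" points at a standalone cluster master.
    if re.fullmatch(r"spark://[^:/\s]+:\d+", url):
        return "standalone"
    if url.startswith("mesos://"):
        return "mesos"
    return "unknown"


def master_from_env(env=None):
    """Classify the MASTER variable of the given mapping (default: os.environ)."""
    env = os.environ if env is None else env
    return describe_master(env.get("MASTER"))
```

On this cluster one would expect describe_master("spark://bigdata001:7077")
to report "standalone" (7077 being the default port of a standalone master).
A result of "unset" or "local" on a node would be consistent with the
local-mode suspicion above.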





Re: Shark does not give any results with SELECT count(*) command

Posted by Praveen R <pr...@mobipulse.in>.

Hi Qingyang Li,

Shark 0.9.0 uses a patched version of Hive 0.11, and using the configuration/metastore of Hive 0.12 with it could be incompatible.

May I know why you are using the hive-site.xml from your previous Hive version (to use an existing metastore)? Otherwise, you could just leave hive-site.xml blank. Something like this:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
</configuration>

In any case, you could run ./bin/shark-withdebug to check for any errors.
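
To see which settings a copied hive-site.xml actually carries, the file can
be listed property by property. This is a minimal Python sketch, assuming
the usual Hadoop-style layout of <property> elements with <name> and <value>
children under <configuration>; the function name list_properties is
illustrative only.

```python
import xml.etree.ElementTree as ET


def list_properties(xml_text):
    """Return {name: value} for each <property> in a Hadoop-style configuration document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            # A missing <value> element is reported as an empty string.
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props
```

For the blank file shown above, list_properties returns an empty dict. Any
entries it returns for the copied file are candidates to review against the
Hive 0.11 build that Shark uses.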

Regards,
Praveen


Re: Shark does not give any results with SELECT count(*) command

Posted by qingyang li <li...@gmail.com>.

Reopening this thread because I have encountered this problem again.
Here is my environment:
scala 2.10.3
spark 0.9.0    standalone mode
shark 0.9.0    downloaded the source code and built it myself
hive hive-shark-0.11
I copied hive-site.xml from my Hadoop cluster, whose Hive version is
0.12; after copying it, I deleted some properties from hive-site.xml.

When I run select count(*) from xxx, there is no result and no error output.

Can someone give me some suggestions on how to debug this?






Re: Shark does not give any results with SELECT count(*) command

Posted by qingyang li <li...@gmail.com>.

I have found the cause. My problem was:
the format of the slaves file was not correct, so tasks were only run on the master.

I am explaining it here to help others who encounter a similar problem.
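
The message does not say what exactly was wrong with the file. As an
illustration only, here is a minimal Python sketch that checks the contents
of a conf/slaves-style file under the assumption that it should list one
worker hostname per line, with blank lines and # comments ignored. The
function name check_slaves and the specific checks are hypothetical.

```python
def check_slaves(text):
    """Return (hosts, problems) for the contents of a slaves file."""
    hosts, problems = [], []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Several names on one line (separated by spaces, tabs or commas)
        # do not fit the one-hostname-per-line assumption.
        tokens = line.replace(",", " ").split()
        if len(tokens) != 1:
            problems.append(
                f"line {lineno}: expected one hostname, found {len(tokens)}"
            )
            continue
        hosts.append(tokens[0])
    return hosts, problems
```

Given the text of the file, hosts should come back as the full list of
workers and problems should be empty. A shorter host list than expected
would match the symptom of tasks running on only one machine.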

