Posted to user@spark.apache.org by Brad Heller <br...@gmail.com> on 2014/04/20 23:50:17 UTC

Hung inserts?

Hey list,

I've got some CSV data I'm importing from S3. I can create the external
table well enough, and I can also do a CREATE TABLE ... AS SELECT ... from
it to pull the data internal to Spark.

Here's the HQL for my external table:
https://gist.github.com/bradhe/11126024

Now I'd like to add partitioning and clustering to my permanent table. So,
I create a new table and try to do an INSERT ... SELECT

Here's the HQL for my internal, partitioned table and the insert select:
https://gist.github.com/bradhe/11126047

Oddly, the query is scheduled...but it never makes any progress!
http://i.imgur.com/vXvgpzD.png

Is this a bug? Am I doing something dumb?

Thanks,
Brad Heller

Re: Hung inserts?

Posted by Brad Heller <br...@gmail.com>.
So after a little more investigation it turns out this issue happens
specifically when I interact with shark server. If I log in to the master
and start a shark session (./bin/shark), everything works as expected.

I'm starting shark server with the following upstart script. Am I doing
something wrong? https://gist.github.com/bradhe/11159123


On Mon, Apr 21, 2014 at 3:31 PM, Brad Heller <br...@gmail.com> wrote:

> I tried removing the CLUSTERED directive and get the same results :( I
> also removed SORTED, same deal.
>
> I'm going to try removing partitioning altogether for now.
>
>
> On Mon, Apr 21, 2014 at 4:58 AM, Mayur Rustagi <ma...@gmail.com> wrote:
>
>> Clustering is not supported. Can you remove that and give it a go?
>>
>> Mayur Rustagi
>> Ph: +1 (760) 203 3257
>> http://www.sigmoidanalytics.com
>> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>>
>>
>>
>> On Mon, Apr 21, 2014 at 3:20 AM, Brad Heller <br...@gmail.com> wrote:
>>
>>> Hey list,
>>>
>>> I've got some CSV data I'm importing from S3. I can create the external
>>> table well enough, and I can also do a CREATE TABLE ... AS SELECT ... from
>>> it to pull the data internal to Spark.
>>>
>>> Here's the HQL for my external table:
>>> https://gist.github.com/bradhe/11126024
>>>
>>> Now I'd like to add partitioning and clustering to my permanent table.
>>> So, I create a new table and try to do an INSERT ... SELECT
>>>
>>> Here's the HQL for my internal, partitioned table and the insert select:
>>> https://gist.github.com/bradhe/11126047
>>>
>>> Oddly, the query is scheduled...but it never makes any progress!
>>> http://i.imgur.com/vXvgpzD.png
>>>
>>> Is this a bug? Am I doing something dumb?
>>>
>>> Thanks,
>>> Brad Heller
>>>
>>
>>
>

Re: Hung inserts?

Posted by Brad Heller <br...@gmail.com>.
I tried removing the CLUSTERED directive and get the same results :( I also
removed SORTED, same deal.

I'm going to try removing partitioning altogether for now.


On Mon, Apr 21, 2014 at 4:58 AM, Mayur Rustagi <ma...@gmail.com> wrote:

> Clustering is not supported. Can you remove that and give it a go?
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>
>
>
> On Mon, Apr 21, 2014 at 3:20 AM, Brad Heller <br...@gmail.com> wrote:
>
>> Hey list,
>>
>> I've got some CSV data I'm importing from S3. I can create the external
>> table well enough, and I can also do a CREATE TABLE ... AS SELECT ... from
>> it to pull the data internal to Spark.
>>
>> Here's the HQL for my external table:
>> https://gist.github.com/bradhe/11126024
>>
>> Now I'd like to add partitioning and clustering to my permanent table.
>> So, I create a new table and try to do an INSERT ... SELECT
>>
>> Here's the HQL for my internal, partitioned table and the insert select:
>> https://gist.github.com/bradhe/11126047
>>
>> Oddly, the query is scheduled...but it never makes any progress!
>> http://i.imgur.com/vXvgpzD.png
>>
>> Is this a bug? Am I doing something dumb?
>>
>> Thanks,
>> Brad Heller
>>
>
>

Re: Hung inserts?

Posted by Mayur Rustagi <ma...@gmail.com>.
Clustering is not supported. Can you remove that and give it a go?
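
For anyone following along, here is a rough sketch of what dropping the
clustering clause could look like. The table and column names
(events_partitioned, events_external, id, payload, dt), the column types, and
the bucket count are all hypothetical; the real definitions live in the gists
linked in the original message and are not reproduced here.

```sql
-- Hypothetical example; names and types are assumptions, not taken from the gists.

-- With clustering (the clause this reply suggests removing):
-- CREATE TABLE events_partitioned (id BIGINT, payload STRING)
-- PARTITIONED BY (dt STRING)
-- CLUSTERED BY (id) SORTED BY (id) INTO 16 BUCKETS;

-- Without clustering: keep only the partition clause.
CREATE TABLE events_partitioned (
  id BIGINT,
  payload STRING
)
PARTITIONED BY (dt STRING);

-- Static-partition insert from the (hypothetical) external table.
INSERT OVERWRITE TABLE events_partitioned PARTITION (dt = '2014-04-20')
SELECT id, payload
FROM events_external
WHERE dt = '2014-04-20';
```

Other replies in this thread report the same hang with CLUSTERED and SORTED
removed, so this is only a way to rule the clause out, not a confirmed fix.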

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>



On Mon, Apr 21, 2014 at 3:20 AM, Brad Heller <br...@gmail.com> wrote:

> Hey list,
>
> I've got some CSV data I'm importing from S3. I can create the external
> table well enough, and I can also do a CREATE TABLE ... AS SELECT ... from
> it to pull the data internal to Spark.
>
> Here's the HQL for my external table:
> https://gist.github.com/bradhe/11126024
>
> Now I'd like to add partitioning and clustering to my permanent table. So,
> I create a new table and try to do an INSERT ... SELECT
>
> Here's the HQL for my internal, partitioned table and the insert select:
> https://gist.github.com/bradhe/11126047
>
> Oddly, the query is scheduled...but it never makes any progress!
> http://i.imgur.com/vXvgpzD.png
>
> Is this a bug? Am I doing something dumb?
>
> Thanks,
> Brad Heller
>

Re: Hung inserts?

Posted by Rahul Chugh <ra...@gmail.com>.

On Sunday, April 20, 2014, Brad Heller <br...@gmail.com> wrote:

> Hey list,
>
> I've got some CSV data I'm importing from S3. I can create the external
> table well enough, and I can also do a CREATE TABLE ... AS SELECT ... from
> it to pull the data internal to Spark.
>
> Here's the HQL for my external table:
> https://gist.github.com/bradhe/11126024
>
> Now I'd like to add partitioning and clustering to my permanent table. So,
> I create a new table and try to do an INSERT ... SELECT
>
> Here's the HQL for my internal, partitioned table and the insert select:
> https://gist.github.com/bradhe/11126047
>
> Oddly, the query is scheduled...but it never makes any progress!
> http://i.imgur.com/vXvgpzD.png
>
> Is this a bug? Am I doing something dumb?
>
> Thanks,
> Brad Heller
>