Posted to user@spark.apache.org by Mich Talebzadeh <mi...@gmail.com> on 2016/06/02 08:54:14 UTC

Spark support for update/delete operations on Hive ORC transactional tables

Hi,

Spark does not support transactions because, as I understand it, there is a
piece on the execution side that needs to send heartbeats to the Hive
metastore saying "a transaction is still alive". That has not been
implemented in Spark yet, to my knowledge.

Any idea on the timeline for when we are going to have support for
transactions in Spark for Hive ORC tables? This would really be useful.
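
To illustrate the mechanism, here is a minimal sketch of the kind of
heartbeat call involved, written against the Hive metastore client API as I
understand it. This is not Spark's actual code, and method signatures may
differ between Hive versions:

import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient

// Uses whatever hive-site.xml is on the classpath.
val client = new HiveMetaStoreClient(new HiveConf())
// Open a transaction; the metastore hands back its id.
val txnId = client.openTxn("spark-user")
try {
  // While the write runs, something must periodically renew the
  // transaction, or the metastore times it out and aborts it.
  client.heartbeat(txnId, 0L) // 0 stands in for "no lock id yet"
  // ... perform the actual writes here ...
  client.commitTxn(txnId)
} catch {
  case e: Exception =>
    client.rollbackTxn(txnId)
    throw e
} finally {
  client.close()
}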


Thanks,


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com

Re: Spark support for update/delete operations on Hive ORC transactional tables

Posted by Ajay Chander <it...@gmail.com>.
Thanks for the confirmation Mich!

On Wednesday, June 22, 2016, Mich Talebzadeh <mi...@gmail.com>
wrote:

> Hi Ajay,
>
> I am afraid that for now transaction heartbeats do not work through Spark,
> so I have no other solution.
>
> This is an interesting point, as with Hive running on the Spark engine
> there is no issue with this, since Hive handles the transactions.
>
> I gather that, in its simplest form, Hive has to deal with its metadata
> for the transaction logic, but Spark somehow cannot do that.
>
> In short, that is it. You need to do that through Hive.
>
> Cheers,
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 22 June 2016 at 16:08, Ajay Chander <itschevva@gmail.com> wrote:
>
>> Hi Mich,
>>
>> Right now I have a similar use case where I have to delete some rows
>> from a Hive table. My Hive table is ORC, bucketed, and has the
>> transactional property set. I can delete from the Hive shell but not
>> from my spark-shell or Spark app. Were you able to find any workaround?
>> Thank you.
>>
>> Regards,
>> Ajay
>>
>>
>> On Thursday, June 2, 2016, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>>
>>> thanks for that.
>>>
>>> I will have a look
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> On 2 June 2016 at 10:46, Elliot West <te...@gmail.com> wrote:
>>>
>>>> Related to this, there exists an API in Hive to simplify the
>>>> integration of other frameworks with Hive's ACID feature:
>>>>
>>>> See:
>>>> https://cwiki.apache.org/confluence/display/Hive/HCatalog+Streaming+Mutation+API
>>>>
>>>> It contains code for maintaining heartbeats, handling locks and
>>>> transactions, and submitting mutations in a distributed environment.
>>>>
>>>> We have used it to write to transactional tables from Cascading-based
>>>> processes.
>>>>
>>>> Elliot.
>>>>
>>>>
>>>> On 2 June 2016 at 09:54, Mich Talebzadeh <mi...@gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> Spark does not support transactions because, as I understand it, there
>>>>> is a piece on the execution side that needs to send heartbeats to the
>>>>> Hive metastore saying "a transaction is still alive". That has not
>>>>> been implemented in Spark yet, to my knowledge.
>>>>>
>>>>> Any idea on the timeline for when we are going to have support for
>>>>> transactions in Spark for Hive ORC tables? This would really be useful.
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>>
>>>>>
>>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>
>>>>>
>>>>>
>>>>> http://talebzadehmich.wordpress.com
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>

Re: Spark support for update/delete operations on Hive ORC transactional tables

Posted by Mich Talebzadeh <mi...@gmail.com>.
Hi Ajay,

I am afraid that for now transaction heartbeats do not work through Spark,
so I have no other solution.

This is an interesting point, as with Hive running on the Spark engine
there is no issue with this, since Hive handles the transactions.

I gather that, in its simplest form, Hive has to deal with its metadata for
the transaction logic, but Spark somehow cannot do that.

In short, that is it. You need to do that through Hive.
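
For example, here is a minimal sketch of issuing the delete through
HiveServer2 over JDBC from inside a Spark application. The connection URL,
user, and the presence of the Hive JDBC driver on the classpath are
assumptions to adapt to your setup:

import java.sql.DriverManager

// Hypothetical HiveServer2 endpoint; adjust host, port, database, user.
val conn = DriverManager.getConnection(
  "jdbc:hive2://localhost:10000/default", "hduser", "")
try {
  val stmt = conn.createStatement()
  // Hive plans and executes the statement itself, so the ACID
  // transaction and its heartbeats are handled by Hive, not by Spark.
  stmt.execute("DELETE FROM orctest WHERE prod_id = 13")
  stmt.close()
} finally {
  conn.close()
}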

Cheers,



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 22 June 2016 at 16:08, Ajay Chander <it...@gmail.com> wrote:

> Hi Mich,
>
> Right now I have a similar use case where I have to delete some rows from
> a Hive table. My Hive table is ORC, bucketed, and has the transactional
> property set. I can delete from the Hive shell but not from my
> spark-shell or Spark app. Were you able to find any workaround? Thank
> you.
>
> Regards,
> Ajay
>
>
> On Thursday, June 2, 2016, Mich Talebzadeh <mi...@gmail.com>
> wrote:
>
>> thanks for that.
>>
>> I will have a look
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 2 June 2016 at 10:46, Elliot West <te...@gmail.com> wrote:
>>
>>> Related to this, there exists an API in Hive to simplify the
>>> integration of other frameworks with Hive's ACID feature:
>>>
>>> See:
>>> https://cwiki.apache.org/confluence/display/Hive/HCatalog+Streaming+Mutation+API
>>>
>>> It contains code for maintaining heartbeats, handling locks and
>>> transactions, and submitting mutations in a distributed environment.
>>>
>>> We have used it to write to transactional tables from Cascading-based
>>> processes.
>>>
>>> Elliot.
>>>
>>>
>>> On 2 June 2016 at 09:54, Mich Talebzadeh <mi...@gmail.com>
>>> wrote:
>>>
>>>>
>>>> Hi,
>>>>
>>>> Spark does not support transactions because, as I understand it, there
>>>> is a piece on the execution side that needs to send heartbeats to the
>>>> Hive metastore saying "a transaction is still alive". That has not been
>>>> implemented in Spark yet, to my knowledge.
>>>>
>>>> Any idea on the timeline for when we are going to have support for
>>>> transactions in Spark for Hive ORC tables? This would really be useful.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>>
>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>>
>>>
>>>
>>

Re: Spark support for update/delete operations on Hive ORC transactional tables

Posted by Ajay Chander <it...@gmail.com>.
Hi Mich,

Right now I have a similar use case where I have to delete some rows from a
Hive table. My Hive table is ORC, bucketed, and has the transactional
property set. I can delete from the Hive shell but not from my spark-shell
or Spark app. Were you able to find any workaround? Thank you.

Regards,
Ajay

On Thursday, June 2, 2016, Mich Talebzadeh <mi...@gmail.com>
wrote:

> thanks for that.
>
> I will have a look
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 2 June 2016 at 10:46, Elliot West <teabot@gmail.com> wrote:
>
>> Related to this, there exists an API in Hive to simplify the integration
>> of other frameworks with Hive's ACID feature:
>>
>> See:
>> https://cwiki.apache.org/confluence/display/Hive/HCatalog+Streaming+Mutation+API
>>
>> It contains code for maintaining heartbeats, handling locks and
>> transactions, and submitting mutations in a distributed environment.
>>
>> We have used it to write to transactional tables from Cascading-based
>> processes.
>>
>> Elliot.
>>
>>
>> On 2 June 2016 at 09:54, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>>
>>>
>>> Hi,
>>>
>>> Spark does not support transactions because, as I understand it, there
>>> is a piece on the execution side that needs to send heartbeats to the
>>> Hive metastore saying "a transaction is still alive". That has not been
>>> implemented in Spark yet, to my knowledge.
>>>
>>> Any idea on the timeline for when we are going to have support for
>>> transactions in Spark for Hive ORC tables? This would really be useful.
>>>
>>>
>>> Thanks,
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>
>>
>

Re: Spark support for update/delete operations on Hive ORC transactional tables

Posted by Mich Talebzadeh <mi...@gmail.com>.
thanks for that.

I will have a look

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 2 June 2016 at 10:46, Elliot West <te...@gmail.com> wrote:

> Related to this, there exists an API in Hive to simplify the integration
> of other frameworks with Hive's ACID feature:
>
> See:
> https://cwiki.apache.org/confluence/display/Hive/HCatalog+Streaming+Mutation+API
>
> It contains code for maintaining heartbeats, handling locks and
> transactions, and submitting mutations in a distributed environment.
>
> We have used it to write to transactional tables from Cascading-based
> processes.
>
> Elliot.
>
>
> On 2 June 2016 at 09:54, Mich Talebzadeh <mi...@gmail.com>
> wrote:
>
>>
>> Hi,
>>
>> Spark does not support transactions because, as I understand it, there is
>> a piece on the execution side that needs to send heartbeats to the Hive
>> metastore saying "a transaction is still alive". That has not been
>> implemented in Spark yet, to my knowledge.
>>
>> Any idea on the timeline for when we are going to have support for
>> transactions in Spark for Hive ORC tables? This would really be useful.
>>
>>
>> Thanks,
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>
>

Re: Spark support for update/delete operations on Hive ORC transactional tables

Posted by Elliot West <te...@gmail.com>.
Related to this, there exists an API in Hive to simplify the integration
of other frameworks with Hive's ACID feature:

See:
https://cwiki.apache.org/confluence/display/Hive/HCatalog+Streaming+Mutation+API

It contains code for maintaining heartbeats, handling locks and
transactions, and submitting mutations in a distributed environment.

We have used it to write to transactional tables from Cascading-based
processes.
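
For a feel of the client side, here is a rough sketch in Scala against that
API, based on my reading of the wiki page above. The class and method names
are from the Streaming Mutation API as documented there and may vary with
the Hive version; the metastore URI and table are hypothetical, and the
worker-side MutatorCoordinator/MutatorFactory pieces are elided:

import org.apache.hive.hcatalog.streaming.mutate.client.{MutatorClientBuilder, Transaction}

// Hypothetical metastore URI and target table; adjust to your cluster.
val client = new MutatorClientBuilder()
  .metaStoreUri("thrift://metastore-host:9083")
  .addSinkTable("oraclehadoop", "orctest", true) // db, table, createPartitions
  .build()

client.connect()
val transaction: Transaction = client.newTransaction()
transaction.begin() // acquires locks; the client keeps the heartbeats alive
// ... workers apply inserts/updates/deletes via a MutatorCoordinator ...
transaction.commit()
client.close()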

Elliot.


On 2 June 2016 at 09:54, Mich Talebzadeh <mi...@gmail.com> wrote:

>
> Hi,
>
> Spark does not support transactions because, as I understand it, there is
> a piece on the execution side that needs to send heartbeats to the Hive
> metastore saying "a transaction is still alive". That has not been
> implemented in Spark yet, to my knowledge.
>
> Any idea on the timeline for when we are going to have support for
> transactions in Spark for Hive ORC tables? This would really be useful.
>
>
> Thanks,
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>

Re: Spark support for update/delete operations on Hive ORC transactional tables

Posted by Mich Talebzadeh <mi...@gmail.com>.
Thanks for this update.

I can create a Hive ORC transactional table with Spark, no problem. The
whole thing in Hive on Spark, including updates, works fine.

My Spark is 1.6.1 and Hive is version 2.

But updates of an ORC transactional table through Spark fail, I am afraid.

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/
Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java
1.8.0_77)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
SQL context available as sqlContext.

scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
HiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@eb9b76c
//
// My source table is a plain text table called oraclehadoop.sales_staging
//

*scala> HiveContext.sql("use oraclehadoop")*res0:
org.apache.spark.sql.DataFrame = [result: string]

*scala> val s = HiveContext.table("sales_staging")*s:
org.apache.spark.sql.DataFrame = [prod_id: bigint, cust_id: bigint,
time_id: timestamp, channel_id: bigint, promo_id: bigint, quantity_sold:
decimal(10,0), amount_sold: decimal(10,0)]
*//*
*// register it as temp table*
*scala> s.registerTempTable("tmp")*
//
// Create a new ORC transactional table through Spark
//

*scala>   HiveContext.sql("DROP TABLE IF EXISTS oraclehadoop.orctest")*res2:
org.apache.spark.sql.DataFrame = []

*scala>   var sqltext : String = ""sqltext: String = ""*





















*scala>   sqltext = """     | CREATE TABLE orctest     |  (     |
PROD_ID        bigint                       ,     |   CUST_ID
bigint                       ,     |   TIME_ID
timestamp                    ,     |   CHANNEL_ID
bigint                       ,     |   PROMO_ID
bigint                       ,     |   QUANTITY_SOLD
decimal(10,0)                  ,     |   AMOUNT_SOLD    decimal(10,0)     |
)     | CLUSTERED BY (PROD_ID) INTO 256 BUCKETS     | STORED AS ORC     |
TBLPROPERTIES (     |   "orc.compress"="SNAPPY",     |
"transactional"="true",     |   "orc.create.index"="true",     |
"orc.stripe.size"="16777216",     |   "orc.row.index.stride"="10000"     |
)     | """*
sqltext: String =
CREATE TABLE orctest
 (
  PROD_ID        bigint                       ,
  CUST_ID        bigint                       ,
  TIME_ID        timestamp                    ,
  CHANNEL_ID     bigint                       ,
  PROMO_ID       bigint                       ,
  QUANTITY_SOLD  decimal(10,0)                  ,
  AMOUNT_SOLD    decimal(10,0)
)
CLUSTERED BY (PROD_ID) INTO 256 BUCKETS
STORED AS ORC
TBLPROPERTIES (
  "orc.compress"="SNAPPY",
  "transactional"="true",
  "orc.create.index"="true",
  "orc.stripe.size"="16777216",
  "orc.row.index.stride"="10000"
)
"""

scala> HiveContext.sql(sqltext)
res3: org.apache.spark.sql.DataFrame = [result: string]
scala> //
scala> // Put data in the Hive table.
scala> //

scala> sqltext = """
     | INSERT INTO TABLE oraclehadoop.orctest
     | select * from tmp
     | """
sqltext: String =
INSERT INTO TABLE oraclehadoop.orctest
select * from tmp

scala> HiveContext.sql(sqltext)
res4: org.apache.spark.sql.DataFrame = []
//
// Rows are there
//

*scala> sql("select count(1) from oraclehadoop.orctest").show*+------+
|   _c0|
+------+
|918843|
+------+
//
// Now let us try updating a few rows. This works fine in Hive. However,
// it fails here.
//
scala> sql("update orctest set amount_sold = 1300 where prod_id = 13")


org.apache.spark.sql.AnalysisException: Unsupported language features in
query: update orctest set amount_sold = 1300 where prod_id = 13
TOK_UPDATE_TABLE 1, 0,18, 7
  TOK_TABNAME 1, 2,2, 7
    orctest 1, 2,2, 7
  TOK_SET_COLUMNS_CLAUSE 1, 4,10, 31
    = 1, 6,10, 31
      TOK_TABLE_OR_COL 1, 6,6, 19
        amount_sold 1, 6,6, 19
      1300 1, 10,10, 33
  TOK_WHERE 1, 12,18, 52
    = 1, 14,18, 52
      TOK_TABLE_OR_COL 1, 14,14, 44
        prod_id 1, 14,14, 44
      13 1, 18,18, 54
scala.NotImplementedError: No parse rules for TOK_UPDATE_TABLE:
 TOK_UPDATE_TABLE 1, 0,18, 7
  TOK_TABNAME 1, 2,2, 7
    orctest 1, 2,2, 7
  TOK_SET_COLUMNS_CLAUSE 1, 4,10, 31
    = 1, 6,10, 31
      TOK_TABLE_OR_COL 1, 6,6, 19
        amount_sold 1, 6,6, 19
      1300 1, 10,10, 33
  TOK_WHERE 1, 12,18, 52
    = 1, 14,18, 52
      TOK_TABLE_OR_COL 1, 14,14, 44
        prod_id 1, 14,14, 44
      13 1, 18,18, 54
org.apache.spark.sql.hive.HiveQl$.nodeToPlan(HiveQl.scala:1217)
          ;
        at org.apache.spark.sql.hive.HiveQl$.createPlan(HiveQl.scala:326)
        at
org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:41)
        at
org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:40)
        at
scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
        at
scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
        at
scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
        at
scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
        at
scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
        at
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
        at
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
        at
scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:202)
        at
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
        at
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
        at
scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
        at
scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
        at
scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
        at
scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:890)
        at
scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110)
        at
org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:34)
        at org.apache.spark.sql.hive.HiveQl$.parseSql(HiveQl.scala:295)
        at
org.apache.spark.sql.hive.HiveQLDialect$$anonfun$parse$1.apply(HiveContext.scala:66)
        at
org.apache.spark.sql.hive.HiveQLDialect$$anonfun$parse$1.apply(HiveContext.scala:66)
        at
org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:290)
        at
org.apache.spark.sql.hive.client.ClientWrapper.liftedTree1$1(ClientWrapper.scala:237)
        at
org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:236)
        at
org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:279)
        at
org.apache.spark.sql.hive.HiveQLDialect.parse(HiveContext.scala:65)
        at
org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:211)
        at
org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:211)
        at
org.apache.spark.sql.execution.SparkSQLParser$$anonfun$org$apache$spark$sql$execution$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:114)
        at
org.apache.spark.sql.execution.SparkSQLParser$$anonfun$org$apache$spark$sql$execution$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:113)
        at
scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
        at
scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
        at
scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
        at
scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
        at
scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
        at
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
        at
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
        at
scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:202)
        at
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
        at
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
        at
scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
        at
scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
        at
scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
        at
scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:890)
        at
scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110)
        at
org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:34)
        at
org.apache.spark.sql.SQLContext$$anonfun$1.apply(SQLContext.scala:208)
        at
org.apache.spark.sql.SQLContext$$anonfun$1.apply(SQLContext.scala:208)
        at
org.apache.spark.sql.execution.datasources.DDLParser.parse(DDLParser.scala:43)
        at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:231)
        at
org.apache.spark.sql.hive.HiveContext.parseSql(HiveContext.scala:331)
        at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:26)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
        at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:37)
        at $iwC$$iwC$$iwC.<init>(<console>:39)
        at $iwC$$iwC.<init>(<console>:41)
        at $iwC.<init>(<console>:43)
        at <init>(<console>:45)
        at .<init>(<console>:49)
        at .<clinit>(<console>)
        at .<init>(<console>:7)
        at .<clinit>(<console>)
        at $print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at
org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
        at
org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
        at
org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
        at
org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
        at
org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
        at
org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
        at
org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
        at org.apache.spark.repl.SparkILoop.org
$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
        at
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
        at
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at
scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
        at org.apache.spark.repl.SparkILoop.org
$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
        at org.apache.spark.repl.Main$.main(Main.scala:31)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at
org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
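
As a hedged workaround sketch (not a fix): since the Spark 1.6 parser has
no rule for UPDATE, one can rewrite the affected rows with a DataFrame
transformation and save the result to a separate staging table. The table
name orctest_updated below is hypothetical, and note that this does not
update the transactional table in place:

import org.apache.spark.sql.functions.{col, lit, when}

// Read current rows and apply the would-be UPDATE as a column rewrite.
val df = HiveContext.table("oraclehadoop.orctest")
val updated = df.withColumn(
  "amount_sold",
  when(col("prod_id") === 13, lit(1300).cast("decimal(10,0)"))
    .otherwise(col("amount_sold")))
// Write to a separate (hypothetical) table rather than the ACID table.
updated.write.mode("overwrite").saveAsTable("oraclehadoop.orctest_updated")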


HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 6 June 2016 at 19:15, Alan Gates <al...@gmail.com> wrote:

> This JIRA https://issues.apache.org/jira/browse/HIVE-12366 moved the
> heartbeat logic from the engine to the client.  AFAIK this was the only
> issue preventing this from working with Spark as an engine.  That JIRA
> was released in Hive 2.0.
>
> I want to stress that to my knowledge no one has tested this combination
> of features, so there may be other problems.  But at least this issue has
> been resolved.
>
> Alan.
>
> > On Jun 2, 2016, at 01:54, Mich Talebzadeh <mi...@gmail.com>
> wrote:
> >
> >
> > Hi,
> >
> > Spark does not support transactions because, as I understand it, there
> is a piece on the execution side that needs to send heartbeats to the Hive
> metastore saying "a transaction is still alive". That has not been
> implemented in Spark yet, to my knowledge.
> >
> > Any idea on the timeline for when we are going to have support for
> transactions in Spark for Hive ORC tables? This would really be useful.
> >
> >
> > Thanks,
> >
> >
> > Dr Mich Talebzadeh
> >
> > LinkedIn
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> >
> > http://talebzadehmich.wordpress.com
> >
>
>

Re: Spark support for update/delete operations on Hive ORC transactional tables

Posted by Alan Gates <al...@gmail.com>.
This JIRA https://issues.apache.org/jira/browse/HIVE-12366 moved the heartbeat logic from the engine to the client.  AFAIK this was the only issue preventing this from working with Spark as an engine.  That JIRA was released in Hive 2.0.

I want to stress that to my knowledge no one has tested this combination of features, so there may be other problems.  But at least this issue has been resolved.

Alan.

> On Jun 2, 2016, at 01:54, Mich Talebzadeh <mi...@gmail.com> wrote:
> 
> 
> Hi,
> 
> Spark does not support transactions because, as I understand it, there is a piece on the execution side that needs to send heartbeats to the Hive metastore saying "a transaction is still alive". That has not been implemented in Spark yet, to my knowledge.
> 
> Any idea on the timeline for when we are going to have support for transactions in Spark for Hive ORC tables? This would really be useful.
> 
> 
> Thanks,
> 
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
>