Posted to dev@spark.apache.org by Abhishek Somani <ab...@gmail.com> on 2019/07/26 12:37:55 UTC

New Spark Datasource for Hive ACID tables

Hi All,

We at Qubole <https://www.qubole.com/> have open-sourced a datasource that
will enable users to work on their Hive ACID Transactional Tables
<https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions> using
Spark.

Github: https://github.com/qubole/spark-acid

Hive ACID tables allow users to work on their data transactionally, and
also give them the ability to Delete, Update and Merge data efficiently
without having to rewrite all of their data in a table, partition or file.
We believe that being able to work on these tables from Spark is a
much-desired value add, as is also apparent in
https://issues.apache.org/jira/browse/SPARK-15348 and
https://issues.apache.org/jira/browse/SPARK-16996, with multiple people
looking for it. Currently the datasource supports only reading from these
ACID tables, and we are working on adding the ability to write into these
tables via Spark as well.

The datasource is also available as a Spark package, and instructions on
how to use it are available on the GitHub page
<https://github.com/qubole/spark-acid>.
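As an illustration of usage, here is a hedged PySpark sketch: the "HiveAcid"
format name and the "table" option key follow the project README and may
change, and the table name is only a placeholder.

```python
# Hypothetical helper showing how the datasource is invoked from PySpark.
# Launch with: pyspark --packages qubole:spark-acid:0.4.0-s_2.11
def read_acid_table(spark, table_name="default.acidtbl"):
    """Read a Hive ACID table through the spark-acid datasource.

    `spark` is an active SparkSession with the package on its classpath.
    """
    return (spark.read
            .format("HiveAcid")          # format name as shown in the README
            .option("table", table_name)
            .load())
```

Calling `read_acid_table(spark).show()` on a live session would then display
the table's current snapshot as a DataFrame.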

We welcome your feedback and suggestions.

Thanks,
Abhishek Somani

Re: New Spark Datasource for Hive ACID tables

Posted by Abhishek Somani <ab...@gmail.com>.
I realised that the build instructions in the README.md were not very clear
due to some recent changes. I have updated those now.

Thanks,
Abhishek Somani

On Sun, Jul 28, 2019 at 7:53 AM naresh Goud <na...@gmail.com>
wrote:

> Thanks Abhishek.
> I will check it out.
>
> Thank you,
> Naresh
>
> On Sat, Jul 27, 2019 at 9:21 PM Abhishek Somani <
> abhisheksomani88@gmail.com> wrote:
>
>> Hey Naresh,
>>
>> There is a `shaded-dependencies` project inside the root directory. You
>> need to go into that directory and build and publish it locally first.
>>
>> cd shaded-dependencies
>>> sbt clean publishLocal
>>>
>>
>> After that, come back out to the root directory and build that project.
>> The spark-acid-shaded-dependencies jar will now be found:
>>
>>> cd ..
>>> sbt assembly
>>
>>
>> This will create the jar which you can use.
>>
>> On another note, unless you are making changes in the code, you don't
>> need to build yourself as the jar is published in
>> https://spark-packages.org/package/qubole/spark-acid. So you can just
>> use it as:
>>
>> spark-shell --packages qubole:spark-acid:0.4.0-s_2.11
>>
>>
>> ...and it will be automatically fetched and used.
>>
>> Thanks,
>> Abhishek
>>
>>
>> On Sun, Jul 28, 2019 at 4:42 AM naresh Goud <na...@gmail.com>
>> wrote:
>>
>>> It looks there is some internal dependency missing.
>>>
>>> libraryDependencies ++= Seq(
>>> "com.qubole" %% "spark-acid-shaded-dependencies" % "0.1"
>>> )
>>>
>>> How do we get it?
>>>
>>>
>>> Thank you,
>>> Naresh
>>>
>>>
>>>
>>>
>>> Thanks,
>>> Naresh
>>> www.linkedin.com/in/naresh-dulam
>>> http://hadoopandspark.blogspot.com/
>>>
>>>
>>>
>>> On Sat, Jul 27, 2019 at 5:34 PM naresh Goud <na...@gmail.com>
>>> wrote:
>>>
>>>> Hi Abhishek,
>>>>
>>>>
>>>> We are not able to build jar using git hub code with below error?
>>>>
>>>> Any others able to build jars? Is there anything else missing?
>>>>
>>>>
>>>>
>>>> Note: Unresolved dependencies path:
>>>> [warn]          com.qubole:spark-acid-shaded-dependencies_2.11:0.1
>>>> (C:\Data\Hadoop\spark-acid-master\build.sbt#L51-54)
>>>> [warn]            +- com.qubole:spark-acid_2.11:0.4.0
>>>> sbt.ResolveException: unresolved dependency:
>>>> com.qubole#spark-acid-shaded-dependencies_2.11;0.1: not found
>>>>         at sbt.IvyActions$.sbt$IvyActions$$resolve(IvyActions.scala:313)
>>>>         at
>>>> sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:191)
>>>>         at
>>>> sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:168)
>>>>         at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:156)
>>>>         at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:156)
>>>>         at sbt.IvySbt$$anonfun$withIvy$1.apply(Ivy.scala:133)
>>>>         at sbt.IvySbt.sbt$IvySbt$$action$1(Ivy.scala:57)
>>>>         at sbt.IvySbt$$anon$4.call(Ivy.scala:65)
>>>>         at xsbt.boot.Locks$GlobalLock.withChannel$1(Locks.scala:93)
>>>>         at
>>>> xsbt.boot.Locks$GlobalLock.xsbt$boot$Locks$GlobalLock$$withChannelRetries$1(Locks.scala:78)
>>>>         at
>>>> xsbt.boot.Locks$GlobalLock$$anonfun$withFileLock$1.apply(Locks.scala:97)
>>>>         at xsbt.boot.Using$.withResource(Using.scala:10)
>>>>         at xsbt.boot.Using$.apply(Using.scala:9)
>>>>         at
>>>> xsbt.boot.Locks$GlobalLock.ignoringDeadlockAvoided(Locks.scala:58)
>>>>         at xsbt.boot.Locks$GlobalLock.withLock(Locks.scala:48)
>>>>         at xsbt.boot.Locks$.apply0(Locks.scala:31)
>>>>         at xsbt.boot.Locks$.apply(Locks.scala:28)
>>>>         at sbt.IvySbt.withDefaultLogger(Ivy.scala:65)
>>>>         at sbt.IvySbt.withIvy(Ivy.scala:128)
>>>>         at sbt.IvySbt.withIvy(Ivy.scala:125)
>>>>         at sbt.IvySbt$Module.withModule(Ivy.scala:156)
>>>>         at sbt.IvyActions$.updateEither(IvyActions.scala:168)
>>>>         at
>>>> sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1541)
>>>>
>>>> Thanks,
>>>> Naresh
>>>> www.linkedin.com/in/naresh-dulam
>>>> http://hadoopandspark.blogspot.com/
>>>>
>>>>
>>>>
>>>> On Sat, Jul 27, 2019 at 3:25 PM Nicolas Paris <ni...@riseup.net>
>>>> wrote:
>>>>
>>>>> Congrats
>>>>>
>>>>> The read/write feature with hive3 is highly interesting
>>>>>
>>>>> On Fri, Jul 26, 2019 at 06:07:55PM +0530, Abhishek Somani wrote:
>>>>> > Hi All,
>>>>> >
>>>>> > We at Qubole have open sourced a datasource that will enable users
>>>>> to work on
>>>>> > their Hive ACID Transactional Tables using Spark.
>>>>> >
>>>>> > Github: https://github.com/qubole/spark-acid
>>>>> >
>>>>> > Hive ACID tables allow users to work on their data transactionally,
>>>>> and also
>>>>> > gives them the ability to Delete, Update and Merge data efficiently
>>>>> without
>>>>> > having to rewrite all of their data in a table, partition or file.
>>>>> We believe
>>>>> > that being able to work on these tables from Spark is a much desired
>>>>> value add,
>>>>> > as is also apparent in
>>>>> https://issues.apache.org/jira/browse/SPARK-15348 and
>>>>> > https://issues.apache.org/jira/browse/SPARK-16996 with multiple
>>>>> people looking
>>>>> > for it. Currently the datasource supports reading from these ACID
>>>>> tables only,
>>>>> > and we are working on adding the ability to write into these tables
>>>>> via Spark
>>>>> > as well.
>>>>> >
>>>>> > The datasource is also available as a spark package, and
>>>>> instructions on how to
>>>>> > use it are available on the Github page.
>>>>> >
>>>>> > We welcome your feedback and suggestions.
>>>>> >
>>>>> > Thanks,
>>>>> > Abhishek Somani
>>>>>
>>>>> --
>>>>> nicolas
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>>>>
>>>>> --
> Thanks,
> Naresh
> www.linkedin.com/in/naresh-dulam
> http://hadoopandspark.blogspot.com/
>
>

Re: New Spark Datasource for Hive ACID tables

Posted by naresh Goud <na...@gmail.com>.
Thanks Abhishek.
I will check it out.

Thank you,
Naresh

On Sat, Jul 27, 2019 at 9:21 PM Abhishek Somani <ab...@gmail.com>
wrote:

> Hey Naresh,
>
> There is a `shaded-dependencies` project inside the root directory. You
> need to go into that directory and build and publish it locally first.
>
> cd shaded-dependencies
>> sbt clean publishLocal
>>
>
> After that, come back out to the root directory and build that project.
> The spark-acid-shaded-dependencies jar will now be found:
>
>> cd ..
>> sbt assembly
>
>
> This will create the jar which you can use.
>
> On another note, unless you are making changes in the code, you don't need
> to build yourself as the jar is published in
> https://spark-packages.org/package/qubole/spark-acid. So you can just use
> it as:
>
> spark-shell --packages qubole:spark-acid:0.4.0-s_2.11
>
>
> ...and it will be automatically fetched and used.
>
> Thanks,
> Abhishek
>
>
> On Sun, Jul 28, 2019 at 4:42 AM naresh Goud <na...@gmail.com>
> wrote:
>
>> It looks there is some internal dependency missing.
>>
>> libraryDependencies ++= Seq(
>> "com.qubole" %% "spark-acid-shaded-dependencies" % "0.1"
>> )
>>
>> How do we get it?
>>
>>
>> Thank you,
>> Naresh
>>
>>
>>
>>
>> Thanks,
>> Naresh
>> www.linkedin.com/in/naresh-dulam
>> http://hadoopandspark.blogspot.com/
>>
>>
>>
>> On Sat, Jul 27, 2019 at 5:34 PM naresh Goud <na...@gmail.com>
>> wrote:
>>
>>> Hi Abhishek,
>>>
>>>
>>> We are not able to build jar using git hub code with below error?
>>>
>>> Any others able to build jars? Is there anything else missing?
>>>
>>>
>>>
>>> Note: Unresolved dependencies path:
>>> [warn]          com.qubole:spark-acid-shaded-dependencies_2.11:0.1
>>> (C:\Data\Hadoop\spark-acid-master\build.sbt#L51-54)
>>> [warn]            +- com.qubole:spark-acid_2.11:0.4.0
>>> sbt.ResolveException: unresolved dependency:
>>> com.qubole#spark-acid-shaded-dependencies_2.11;0.1: not found
>>>         at sbt.IvyActions$.sbt$IvyActions$$resolve(IvyActions.scala:313)
>>>         at
>>> sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:191)
>>>         at
>>> sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:168)
>>>         at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:156)
>>>         at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:156)
>>>         at sbt.IvySbt$$anonfun$withIvy$1.apply(Ivy.scala:133)
>>>         at sbt.IvySbt.sbt$IvySbt$$action$1(Ivy.scala:57)
>>>         at sbt.IvySbt$$anon$4.call(Ivy.scala:65)
>>>         at xsbt.boot.Locks$GlobalLock.withChannel$1(Locks.scala:93)
>>>         at
>>> xsbt.boot.Locks$GlobalLock.xsbt$boot$Locks$GlobalLock$$withChannelRetries$1(Locks.scala:78)
>>>         at
>>> xsbt.boot.Locks$GlobalLock$$anonfun$withFileLock$1.apply(Locks.scala:97)
>>>         at xsbt.boot.Using$.withResource(Using.scala:10)
>>>         at xsbt.boot.Using$.apply(Using.scala:9)
>>>         at
>>> xsbt.boot.Locks$GlobalLock.ignoringDeadlockAvoided(Locks.scala:58)
>>>         at xsbt.boot.Locks$GlobalLock.withLock(Locks.scala:48)
>>>         at xsbt.boot.Locks$.apply0(Locks.scala:31)
>>>         at xsbt.boot.Locks$.apply(Locks.scala:28)
>>>         at sbt.IvySbt.withDefaultLogger(Ivy.scala:65)
>>>         at sbt.IvySbt.withIvy(Ivy.scala:128)
>>>         at sbt.IvySbt.withIvy(Ivy.scala:125)
>>>         at sbt.IvySbt$Module.withModule(Ivy.scala:156)
>>>         at sbt.IvyActions$.updateEither(IvyActions.scala:168)
>>>         at
>>> sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1541)
>>>
>>> Thanks,
>>> Naresh
>>> www.linkedin.com/in/naresh-dulam
>>> http://hadoopandspark.blogspot.com/
>>>
>>>
>>>
>>> On Sat, Jul 27, 2019 at 3:25 PM Nicolas Paris <ni...@riseup.net>
>>> wrote:
>>>
>>>> Congrats
>>>>
>>>> The read/write feature with hive3 is highly interesting
>>>>
>>>> On Fri, Jul 26, 2019 at 06:07:55PM +0530, Abhishek Somani wrote:
>>>> > Hi All,
>>>> >
>>>> > We at Qubole have open sourced a datasource that will enable users to
>>>> work on
>>>> > their Hive ACID Transactional Tables using Spark.
>>>> >
>>>> > Github: https://github.com/qubole/spark-acid
>>>> >
>>>> > Hive ACID tables allow users to work on their data transactionally,
>>>> and also
>>>> > gives them the ability to Delete, Update and Merge data efficiently
>>>> without
>>>> > having to rewrite all of their data in a table, partition or file. We
>>>> believe
>>>> > that being able to work on these tables from Spark is a much desired
>>>> value add,
>>>> > as is also apparent in
>>>> https://issues.apache.org/jira/browse/SPARK-15348 and
>>>> > https://issues.apache.org/jira/browse/SPARK-16996 with multiple
>>>> people looking
>>>> > for it. Currently the datasource supports reading from these ACID
>>>> tables only,
>>>> > and we are working on adding the ability to write into these tables
>>>> via Spark
>>>> > as well.
>>>> >
>>>> > The datasource is also available as a spark package, and instructions
>>>> on how to
>>>> > use it are available on the Github page.
>>>> >
>>>> > We welcome your feedback and suggestions.
>>>> >
>>>> > Thanks,
>>>> > Abhishek Somani
>>>>
>>>> --
>>>> nicolas
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>>>
>>>> --
Thanks,
Naresh
www.linkedin.com/in/naresh-dulam
http://hadoopandspark.blogspot.com/

Re: New Spark Datasource for Hive ACID tables

Posted by Abhishek Somani <ab...@gmail.com>.
Hey Naresh,

There is a `shaded-dependencies` project inside the root directory. You need
to go into that directory and build and publish it locally first.

cd shaded-dependencies
sbt clean publishLocal

After that, come back out to the root directory and build that project. The
spark-acid-shaded-dependencies jar will now be found:

cd ..
sbt assembly


This will create the jar which you can use.
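For reference, the whole sequence from a fresh clone (assuming `sbt` is
installed and you start at the repository root) would be:

```shell
# The main build resolves com.qubole:spark-acid-shaded-dependencies from the
# local Ivy repository, which is why the "unresolved dependency" error appears
# unless the shaded project is published locally first:
cd shaded-dependencies
sbt clean publishLocal

# Back at the repository root, the main build can now resolve it:
cd ..
sbt assembly
```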

On another note, unless you are making changes to the code, you don't need
to build it yourself, as the jar is published at
https://spark-packages.org/package/qubole/spark-acid. So you can just use
it as:

spark-shell --packages qubole:spark-acid:0.4.0-s_2.11


...and it will be automatically fetched and used.

Thanks,
Abhishek


On Sun, Jul 28, 2019 at 4:42 AM naresh Goud <na...@gmail.com>
wrote:

> It looks there is some internal dependency missing.
>
> libraryDependencies ++= Seq(
> "com.qubole" %% "spark-acid-shaded-dependencies" % "0.1"
> )
>
> How do we get it?
>
>
> Thank you,
> Naresh
>
>
>
>
> Thanks,
> Naresh
> www.linkedin.com/in/naresh-dulam
> http://hadoopandspark.blogspot.com/
>
>
>
> On Sat, Jul 27, 2019 at 5:34 PM naresh Goud <na...@gmail.com>
> wrote:
>
>> Hi Abhishek,
>>
>>
>> We are not able to build jar using git hub code with below error?
>>
>> Any others able to build jars? Is there anything else missing?
>>
>>
>>
>> Note: Unresolved dependencies path:
>> [warn]          com.qubole:spark-acid-shaded-dependencies_2.11:0.1
>> (C:\Data\Hadoop\spark-acid-master\build.sbt#L51-54)
>> [warn]            +- com.qubole:spark-acid_2.11:0.4.0
>> sbt.ResolveException: unresolved dependency:
>> com.qubole#spark-acid-shaded-dependencies_2.11;0.1: not found
>>         at sbt.IvyActions$.sbt$IvyActions$$resolve(IvyActions.scala:313)
>>         at
>> sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:191)
>>         at
>> sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:168)
>>         at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:156)
>>         at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:156)
>>         at sbt.IvySbt$$anonfun$withIvy$1.apply(Ivy.scala:133)
>>         at sbt.IvySbt.sbt$IvySbt$$action$1(Ivy.scala:57)
>>         at sbt.IvySbt$$anon$4.call(Ivy.scala:65)
>>         at xsbt.boot.Locks$GlobalLock.withChannel$1(Locks.scala:93)
>>         at
>> xsbt.boot.Locks$GlobalLock.xsbt$boot$Locks$GlobalLock$$withChannelRetries$1(Locks.scala:78)
>>         at
>> xsbt.boot.Locks$GlobalLock$$anonfun$withFileLock$1.apply(Locks.scala:97)
>>         at xsbt.boot.Using$.withResource(Using.scala:10)
>>         at xsbt.boot.Using$.apply(Using.scala:9)
>>         at
>> xsbt.boot.Locks$GlobalLock.ignoringDeadlockAvoided(Locks.scala:58)
>>         at xsbt.boot.Locks$GlobalLock.withLock(Locks.scala:48)
>>         at xsbt.boot.Locks$.apply0(Locks.scala:31)
>>         at xsbt.boot.Locks$.apply(Locks.scala:28)
>>         at sbt.IvySbt.withDefaultLogger(Ivy.scala:65)
>>         at sbt.IvySbt.withIvy(Ivy.scala:128)
>>         at sbt.IvySbt.withIvy(Ivy.scala:125)
>>         at sbt.IvySbt$Module.withModule(Ivy.scala:156)
>>         at sbt.IvyActions$.updateEither(IvyActions.scala:168)
>>         at
>> sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1541)
>>
>> Thanks,
>> Naresh
>> www.linkedin.com/in/naresh-dulam
>> http://hadoopandspark.blogspot.com/
>>
>>
>>
>> On Sat, Jul 27, 2019 at 3:25 PM Nicolas Paris <ni...@riseup.net>
>> wrote:
>>
>>> Congrats
>>>
>>> The read/write feature with hive3 is highly interesting
>>>
>>> On Fri, Jul 26, 2019 at 06:07:55PM +0530, Abhishek Somani wrote:
>>> > Hi All,
>>> >
>>> > We at Qubole have open sourced a datasource that will enable users to
>>> work on
>>> > their Hive ACID Transactional Tables using Spark.
>>> >
>>> > Github: https://github.com/qubole/spark-acid
>>> >
>>> > Hive ACID tables allow users to work on their data transactionally,
>>> and also
>>> > gives them the ability to Delete, Update and Merge data efficiently
>>> without
>>> > having to rewrite all of their data in a table, partition or file. We
>>> believe
>>> > that being able to work on these tables from Spark is a much desired
>>> value add,
>>> > as is also apparent in
>>> https://issues.apache.org/jira/browse/SPARK-15348 and
>>> > https://issues.apache.org/jira/browse/SPARK-16996 with multiple
>>> people looking
>>> > for it. Currently the datasource supports reading from these ACID
>>> tables only,
>>> > and we are working on adding the ability to write into these tables
>>> via Spark
>>> > as well.
>>> >
>>> > The datasource is also available as a spark package, and instructions
>>> on how to
>>> > use it are available on the Github page.
>>> >
>>> > We welcome your feedback and suggestions.
>>> >
>>> > Thanks,
>>> > Abhishek Somani
>>>
>>> --
>>> nicolas
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>>
>>>

Re: New Spark Datasource for Hive ACID tables

Posted by naresh Goud <na...@gmail.com>.
It looks like an internal dependency is missing.

libraryDependencies ++= Seq(
"com.qubole" %% "spark-acid-shaded-dependencies" % "0.1"
)

How do we get it?


Thank you,
Naresh




Thanks,
Naresh
www.linkedin.com/in/naresh-dulam
http://hadoopandspark.blogspot.com/



On Sat, Jul 27, 2019 at 5:34 PM naresh Goud <na...@gmail.com>
wrote:

> Hi Abhishek,
>
>
> We are not able to build jar using git hub code with below error?
>
> Any others able to build jars? Is there anything else missing?
>
>
>
> Note: Unresolved dependencies path:
> [warn]          com.qubole:spark-acid-shaded-dependencies_2.11:0.1
> (C:\Data\Hadoop\spark-acid-master\build.sbt#L51-54)
> [warn]            +- com.qubole:spark-acid_2.11:0.4.0
> sbt.ResolveException: unresolved dependency:
> com.qubole#spark-acid-shaded-dependencies_2.11;0.1: not found
>         at sbt.IvyActions$.sbt$IvyActions$$resolve(IvyActions.scala:313)
>         at
> sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:191)
>         at
> sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:168)
>         at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:156)
>         at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:156)
>         at sbt.IvySbt$$anonfun$withIvy$1.apply(Ivy.scala:133)
>         at sbt.IvySbt.sbt$IvySbt$$action$1(Ivy.scala:57)
>         at sbt.IvySbt$$anon$4.call(Ivy.scala:65)
>         at xsbt.boot.Locks$GlobalLock.withChannel$1(Locks.scala:93)
>         at
> xsbt.boot.Locks$GlobalLock.xsbt$boot$Locks$GlobalLock$$withChannelRetries$1(Locks.scala:78)
>         at
> xsbt.boot.Locks$GlobalLock$$anonfun$withFileLock$1.apply(Locks.scala:97)
>         at xsbt.boot.Using$.withResource(Using.scala:10)
>         at xsbt.boot.Using$.apply(Using.scala:9)
>         at
> xsbt.boot.Locks$GlobalLock.ignoringDeadlockAvoided(Locks.scala:58)
>         at xsbt.boot.Locks$GlobalLock.withLock(Locks.scala:48)
>         at xsbt.boot.Locks$.apply0(Locks.scala:31)
>         at xsbt.boot.Locks$.apply(Locks.scala:28)
>         at sbt.IvySbt.withDefaultLogger(Ivy.scala:65)
>         at sbt.IvySbt.withIvy(Ivy.scala:128)
>         at sbt.IvySbt.withIvy(Ivy.scala:125)
>         at sbt.IvySbt$Module.withModule(Ivy.scala:156)
>         at sbt.IvyActions$.updateEither(IvyActions.scala:168)
>         at
> sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1541)
>
> Thanks,
> Naresh
> www.linkedin.com/in/naresh-dulam
> http://hadoopandspark.blogspot.com/
>
>
>
> On Sat, Jul 27, 2019 at 3:25 PM Nicolas Paris <ni...@riseup.net>
> wrote:
>
>> Congrats
>>
>> The read/write feature with hive3 is highly interesting
>>
>> On Fri, Jul 26, 2019 at 06:07:55PM +0530, Abhishek Somani wrote:
>> > Hi All,
>> >
>> > We at Qubole have open sourced a datasource that will enable users to
>> work on
>> > their Hive ACID Transactional Tables using Spark.
>> >
>> > Github: https://github.com/qubole/spark-acid
>> >
>> > Hive ACID tables allow users to work on their data transactionally, and
>> also
>> > gives them the ability to Delete, Update and Merge data efficiently
>> without
>> > having to rewrite all of their data in a table, partition or file. We
>> believe
>> > that being able to work on these tables from Spark is a much desired
>> value add,
>> > as is also apparent in
>> https://issues.apache.org/jira/browse/SPARK-15348 and
>> > https://issues.apache.org/jira/browse/SPARK-16996 with multiple people
>> looking
>> > for it. Currently the datasource supports reading from these ACID
>> tables only,
>> > and we are working on adding the ability to write into these tables via
>> Spark
>> > as well.
>> >
>> > The datasource is also available as a spark package, and instructions
>> on how to
>> > use it are available on the Github page.
>> >
>> > We welcome your feedback and suggestions.
>> >
>> > Thanks,
>> > Abhishek Somani
>>
>> --
>> nicolas
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>
>>

Re: New Spark Datasource for Hive ACID tables

Posted by naresh Goud <na...@gmail.com>.
Hi Abhishek,


We are not able to build the jar from the GitHub code; we get the error
below.

Is anyone else able to build the jars? Is there anything else missing?



Note: Unresolved dependencies path:
[warn]          com.qubole:spark-acid-shaded-dependencies_2.11:0.1
(C:\Data\Hadoop\spark-acid-master\build.sbt#L51-54)
[warn]            +- com.qubole:spark-acid_2.11:0.4.0
sbt.ResolveException: unresolved dependency:
com.qubole#spark-acid-shaded-dependencies_2.11;0.1: not found
        at sbt.IvyActions$.sbt$IvyActions$$resolve(IvyActions.scala:313)
        at
sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:191)
        at
sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:168)
        at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:156)
        at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:156)
        at sbt.IvySbt$$anonfun$withIvy$1.apply(Ivy.scala:133)
        at sbt.IvySbt.sbt$IvySbt$$action$1(Ivy.scala:57)
        at sbt.IvySbt$$anon$4.call(Ivy.scala:65)
        at xsbt.boot.Locks$GlobalLock.withChannel$1(Locks.scala:93)
        at
xsbt.boot.Locks$GlobalLock.xsbt$boot$Locks$GlobalLock$$withChannelRetries$1(Locks.scala:78)
        at
xsbt.boot.Locks$GlobalLock$$anonfun$withFileLock$1.apply(Locks.scala:97)
        at xsbt.boot.Using$.withResource(Using.scala:10)
        at xsbt.boot.Using$.apply(Using.scala:9)
        at
xsbt.boot.Locks$GlobalLock.ignoringDeadlockAvoided(Locks.scala:58)
        at xsbt.boot.Locks$GlobalLock.withLock(Locks.scala:48)
        at xsbt.boot.Locks$.apply0(Locks.scala:31)
        at xsbt.boot.Locks$.apply(Locks.scala:28)
        at sbt.IvySbt.withDefaultLogger(Ivy.scala:65)
        at sbt.IvySbt.withIvy(Ivy.scala:128)
        at sbt.IvySbt.withIvy(Ivy.scala:125)
        at sbt.IvySbt$Module.withModule(Ivy.scala:156)
        at sbt.IvyActions$.updateEither(IvyActions.scala:168)
        at
sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1541)

Thanks,
Naresh
www.linkedin.com/in/naresh-dulam
http://hadoopandspark.blogspot.com/



On Sat, Jul 27, 2019 at 3:25 PM Nicolas Paris <ni...@riseup.net>
wrote:

> Congrats
>
> The read/write feature with hive3 is highly interesting
>
> On Fri, Jul 26, 2019 at 06:07:55PM +0530, Abhishek Somani wrote:
> > Hi All,
> >
> > We at Qubole have open sourced a datasource that will enable users to
> work on
> > their Hive ACID Transactional Tables using Spark.
> >
> > Github: https://github.com/qubole/spark-acid
> >
> > Hive ACID tables allow users to work on their data transactionally, and
> also
> > gives them the ability to Delete, Update and Merge data efficiently
> without
> > having to rewrite all of their data in a table, partition or file. We
> believe
> > that being able to work on these tables from Spark is a much desired
> value add,
> > as is also apparent in https://issues.apache.org/jira/browse/SPARK-15348
>  and
> > https://issues.apache.org/jira/browse/SPARK-16996 with multiple people
> looking
> > for it. Currently the datasource supports reading from these ACID tables
> only,
> > and we are working on adding the ability to write into these tables via
> Spark
> > as well.
> >
> > The datasource is also available as a spark package, and instructions on
> how to
> > use it are available on the Github page.
> >
> > We welcome your feedback and suggestions.
> >
> > Thanks,
> > Abhishek Somani
>
> --
> nicolas
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
>

Re: New Spark Datasource for Hive ACID tables

Posted by Nicolas Paris <ni...@riseup.net>.
Congrats

The read/write feature with hive3 is highly interesting

On Fri, Jul 26, 2019 at 06:07:55PM +0530, Abhishek Somani wrote:
> Hi All,
> 
> We at Qubole have open sourced a datasource that will enable users to work on
> their Hive ACID Transactional Tables using Spark. 
> 
> Github: https://github.com/qubole/spark-acid
> 
> Hive ACID tables allow users to work on their data transactionally, and also
> gives them the ability to Delete, Update and Merge data efficiently without
> having to rewrite all of their data in a table, partition or file. We believe
> that being able to work on these tables from Spark is a much desired value add,
> as is also apparent in https://issues.apache.org/jira/browse/SPARK-15348 and 
> https://issues.apache.org/jira/browse/SPARK-16996 with multiple people looking
> for it. Currently the datasource supports reading from these ACID tables only,
> and we are working on adding the ability to write into these tables via Spark
> as well.
> 
> The datasource is also available as a spark package, and instructions on how to
> use it are available on the Github page.
> 
> We welcome your feedback and suggestions.
> 
> Thanks,
> Abhishek Somani 

-- 
nicolas

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: New Spark Datasource for Hive ACID tables

Posted by Abhishek Somani <ab...@gmail.com>.
Hey Naresh,

Thanks for your question. Yes it will work!
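To give intuition for why an uncompacted table can still be read: the reader
merges the base_* and delta_* directories at read time, before any major
compaction runs. Below is a simplified Python sketch of that merge, with
hypothetical row keys and operations; it is not the datasource's actual
implementation.

```python
# Simplified, hypothetical illustration of how an ACID reader serves an
# uncompacted table: rows are keyed by (originalTransaction, bucket, rowId);
# delta events are replayed on top of the base, so updates override base rows
# and deletes drop them.
def merge_base_and_deltas(base_rows, delta_events):
    """base_rows: {row_key: value}; delta_events: ordered (op, row_key, value)."""
    snapshot = dict(base_rows)
    for op, key, value in delta_events:
        if op == "delete":
            snapshot.pop(key, None)
        elif op in ("insert", "update"):
            snapshot[key] = value
    return snapshot

# Base written by transaction 15234, then touched by a later delta.
base = {(15234, 0, 0): "alice@old.com", (15234, 0, 1): "bob@example.com"}
deltas = [("update", (15234, 0, 0), "alice@new.com"),
          ("delete", (15234, 0, 1), None)]
print(merge_base_and_deltas(base, deltas))
# {(15234, 0, 0): 'alice@new.com'}
```

Major compaction simply materializes the result of this merge back into a new
base directory.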

Thanks,
Abhishek Somani

On Fri, Jul 26, 2019 at 7:08 PM naresh Goud <na...@gmail.com>
wrote:

> Thanks Abhishek.
>
> Will it work on a Hive ACID table which is not compacted, i.e. a table
> having base and delta files?
>
> Let's say we have a Hive ACID table customer:
>
> CREATE TABLE customer (customer_id int, customer_name string,
> customer_email string)
> CLUSTERED BY (customer_id) INTO 10 BUCKETS
> LOCATION '/test/customer'
> TBLPROPERTIES ('transactional'='true')
>
> And the table's HDFS path contains the directories below:
>
> /test/customer/base_15234/
> /test/customer/delta_1234_456
>
> That means the table has updates and major compaction has not been run.
>
> Will the Spark reader work?
>
>
> Thank you,
> Naresh
>
>
>
>
>
>
>
> On Fri, Jul 26, 2019 at 7:38 AM Abhishek Somani <
> abhisheksomani88@gmail.com> wrote:
>
>> Hi All,
>>
>> We at Qubole <https://www.qubole.com/> have open sourced a datasource
>> that will enable users to work on their Hive ACID Transactional Tables
>> <https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions>
>> using Spark.
>>
>> Github: https://github.com/qubole/spark-acid
>>
>> Hive ACID tables allow users to work on their data transactionally, and
>> also gives them the ability to Delete, Update and Merge data efficiently
>> without having to rewrite all of their data in a table, partition or file.
>> We believe that being able to work on these tables from Spark is a much
>> desired value add, as is also apparent in
>> https://issues.apache.org/jira/browse/SPARK-15348 and
>> https://issues.apache.org/jira/browse/SPARK-16996 with multiple people
>> looking for it. Currently the datasource supports reading from these ACID
>> tables only, and we are working on adding the ability to write into these
>> tables via Spark as well.
>>
>> The datasource is also available as a spark package, and instructions on
>> how to use it are available on the Github page
>> <https://github.com/qubole/spark-acid>.
>>
>> We welcome your feedback and suggestions.
>>
>> Thanks,
>> Abhishek Somani
>>
> --
> Thanks,
> Naresh
> www.linkedin.com/in/naresh-dulam
> http://hadoopandspark.blogspot.com/
>
>


Re: New Spark Datasource for Hive ACID tables

Posted by naresh Goud <na...@gmail.com>.
Thanks Abhishek.

Will it work on a Hive ACID table which is not compacted, i.e. a table
having base and delta files?

Let's say we have a Hive ACID table customer:

CREATE TABLE customer (customer_id int, customer_name string,
customer_email string)
CLUSTERED BY (customer_id) INTO 10 BUCKETS
LOCATION '/test/customer'
TBLPROPERTIES ('transactional'='true')

And the table's HDFS path contains the directories below:

/test/customer/base_15234/
/test/customer/delta_1234_456

That means the table has updates and major compaction has not been run.

Will the Spark reader work?


Thank you,
Naresh







On Fri, Jul 26, 2019 at 7:38 AM Abhishek Somani <ab...@gmail.com>
wrote:

> Hi All,
>
> We at Qubole <https://www.qubole.com/> have open sourced a datasource
> that will enable users to work on their Hive ACID Transactional Tables
> <https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions>
> using Spark.
>
> Github: https://github.com/qubole/spark-acid
>
> Hive ACID tables allow users to work on their data transactionally, and
> also gives them the ability to Delete, Update and Merge data efficiently
> without having to rewrite all of their data in a table, partition or file.
> We believe that being able to work on these tables from Spark is a much
> desired value add, as is also apparent in
> https://issues.apache.org/jira/browse/SPARK-15348 and
> https://issues.apache.org/jira/browse/SPARK-16996 with multiple people
> looking for it. Currently the datasource supports reading from these ACID
> tables only, and we are working on adding the ability to write into these
> tables via Spark as well.
>
> The datasource is also available as a spark package, and instructions on
> how to use it are available on the Github page
> <https://github.com/qubole/spark-acid>.
>
> We welcome your feedback and suggestions.
>
> Thanks,
> Abhishek Somani
>
-- 
Thanks,
Naresh
www.linkedin.com/in/naresh-dulam
http://hadoopandspark.blogspot.com/
