Posted to dev@hudi.apache.org by Vinoth Chandar <vi...@apache.org> on 2019/05/03 15:19:09 UTC

Re: Not able to find HoodieJavaApp

Hi Umesh,

Did it work?

Thanks
Vinoth

On Tue, Apr 23, 2019 at 9:33 AM Vinoth Chandar <vi...@apache.org> wrote:

> Hi Umesh,
>
> I took a pass. Moving HoodieTestDataGenerator into src/java is not a good
> idea. However, I have written up a simple demo app using the stock data
> that we already use in our dockerized demo
> https://github.com/vinothchandar/incubator-hudi/tree/quickstart
>
> Once you grab the code, build it using
> mvn clean install -DskipTests -DskipITs
> and you should then be able to run
> spark-submit --class HoodieDemoApp --master local[2] hoodie-utilities/target/hoodie-utilities-0.4.6-SNAPSHOT.jar
> and get a dataset written.
>
> You can make changes and iterate as you wish.
>
> I really recommend using the dockerized setup described here. It does the
> same thing, but lets you play with the entire ecosystem.
> https://hudi.apache.org/docker_demo.html
>
> Thanks
> Vinoth
>
>
> On Mon, Apr 22, 2019 at 9:14 AM Umesh Kacha <um...@gmail.com> wrote:
>
>> Hi Vinoth, thanks much. Eventually our deployment will be in AWS, and for
>> now we will be using the Hoodie Spark datasource to upsert and delete.
>>
>> Regards,
>> Umesh
>>
>> On Mon, Apr 22, 2019 at 8:24 PM Vinoth Chandar <vi...@apache.org> wrote:
>>
>> > Hi Umesh,
>> >
>> > This is at the top of my list for the week. But if you already have
>> > input data somewhere on S3/HDFS, nothing stops you from trying the
>> > DeltaStreamer tool or writing a simple Spark job that depends on
>> > hoodie-spark. What's your eventual deployment strategy?
>> >
>> > Thanks
>> > Vinoth
>> >
>> > On Mon, Apr 22, 2019 at 6:09 AM Umesh Kacha <um...@gmail.com>
>> wrote:
>> >
>> > > Hi Vinoth, can you please help with this? I want to quickly try
>> > > HoodieJavaApp; it seems to be partially working in my local setup, with
>> > > some runtime dependency failures as mentioned in the previous email.
>> > >
>> > > On Sat, Apr 20, 2019, 10:18 AM Umesh Kacha <um...@gmail.com>
>> > wrote:
>> > >
>> > > > Thanks Vinoth, yes please, that would be great: HoodieJavaApp moved
>> > > > out of tests and working.
>> > > >
>> > > > On Sat, Apr 20, 2019, 6:09 AM Vinoth Chandar <
>> > > > mail.vinoth.chandar@gmail.com> wrote:
>> > > >
>> > > >> Sorry.  Not following. If you are building your own spark job using
>> > > hudi,
>> > > >> then you just pull in hoodie-spark module
>> > > >>
>> > > >> http://hudi.apache.org/writing_data.html#datasource-writer
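>> > > >>
>> > > >> (For illustration only, a minimal sketch of such a Spark job in Java
>> > > >> against the 0.4.x "com.uber.hoodie" datasource. The input path, table
>> > > >> name and record key/partition/precombine field names below are
>> > > >> placeholders, not from this thread; the docs link above has the
>> > > >> authoritative options.)
>> > > >>
>> > > >> import org.apache.spark.sql.Dataset;
>> > > >> import org.apache.spark.sql.Row;
>> > > >> import org.apache.spark.sql.SaveMode;
>> > > >> import org.apache.spark.sql.SparkSession;
>> > > >>
>> > > >> public class HoodieDatasourceSketch {
>> > > >>   public static void main(String[] args) {
>> > > >>     SparkSession spark = SparkSession.builder()
>> > > >>         .appName("hoodie-datasource-sketch")
>> > > >>         .master("local[2]")
>> > > >>         .getOrCreate();
>> > > >>
>> > > >>     // Placeholder input; any DataFrame with the three fields used below works.
>> > > >>     Dataset<Row> df = spark.read().json("/tmp/input.json");
>> > > >>
>> > > >>     df.write()
>> > > >>         .format("com.uber.hoodie")  // Hudi datasource name in the 0.4.x (com.uber.hoodie) releases
>> > > >>         .option("hoodie.table.name", "hoodie_test")
>> > > >>         .option("hoodie.datasource.write.recordkey.field", "key")
>> > > >>         .option("hoodie.datasource.write.partitionpath.field", "partition")
>> > > >>         .option("hoodie.datasource.write.precombine.field", "ts")
>> > > >>         .mode(SaveMode.Append)  // the default write operation is upsert
>> > > >>         .save("file:///tmp/hoodie/hoodie_test");
>> > > >>
>> > > >>     spark.stop();
>> > > >>   }
>> > > >> }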
>> > > >>
>> > > >>
>> > > >> The Spark bundle can be used with the --jars option on spark-shell etc.
>> > > >> to query the datasets.
>> > > >>
>> > > >> Does that help? Can you describe what you are trying to accomplish?
>> > > >>
>> > > >> Checking again, do you need a patch with the HoodieJavaApp moved
>> out
>> > of
>> > > >> tests and working?
>> > > >>
>> > > >> On Fri, Apr 19, 2019 at 12:01 PM Umesh Kacha <
>> umesh.kacha@gmail.com>
>> > > >> wrote:
>> > > >>
>> > > >> > Thanks Vinoth. How do I know which Spark jars and versions are needed?
>> > > >> > I was expecting hoodie-spark-bundle-0.4.5.jar to take care of that since
>> > > >> > it's an uber jar, but it isn't; recently I found I had to add the Spark
>> > > >> > Maven coordinates separately in the pom file. Anyway, if you can give me
>> > > >> > a list of jars, I can put them on the classpath and run.
>> > > >> >
>> > > >> > On Fri, Apr 19, 2019, 11:40 PM Vinoth Chandar <vinoth@apache.org
>> >
>> > > >> wrote:
>> > > >> >
>> > > >> > > Looks like a class mismatch error on the Hadoop jars. The easiest way
>> > > >> > > to do this is to pull the code into IntelliJ, add the Spark jars folder
>> > > >> > > to the module's classpath and then run the test by right clicking > Run.
>> > > >> > >
>> > > >> > > I can prep a patch for you if you'd like. lmk
>> > > >> > >
>> > > >> > > Thanks
>> > > >> > > Vinoth
>> > > >> > >
>> > > >> > > On Thu, Apr 18, 2019 at 8:46 AM Umesh Kacha <
>> > umesh.kacha@gmail.com>
>> > > >> > wrote:
>> > > >> > >
>> > > >> > > > Hi Vinoth, I managed to run HoodieJavaApp in my local Maven project;
>> > > >> > > > there I had to copy the following classes used by HoodieJavaApp.
>> > > >> > > > Inside HoodieJavaTest's main I am creating an object of HoodieJavaApp,
>> > > >> > > > which just runs with all default options.
>> > > >> > > >
>> > > >> > > > [image: image.png]
>> > > >> > > >
>> > > >> > > > However, I get the following error, which looks like one of the
>> > > >> > > > runtime dependencies is missing. Please guide.
>> > > >> > > >
>> > > >> > > > Exception in thread "main" com.uber.hoodie.exception.HoodieUpsertException: Failed to upsert for commit time 20190418210326
>> > > >> > > > at com.uber.hoodie.HoodieWriteClient.upsert(HoodieWriteClient.java:175)
>> > > >> > > > at com.uber.hoodie.DataSourceUtils.doWriteOperation(DataSourceUtils.java:153)
>> > > >> > > > at com.uber.hoodie.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:149)
>> > > >> > > > at com.uber.hoodie.DefaultSource.createRelation(DefaultSource.scala:91)
>> > > >> > > > at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426)
>> > > >> > > > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
>> > > >> > > > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:198)
>> > > >> > > > at HoodieJavaApp.run(HoodieJavaApp.java:143)
>> > > >> > > > at HoodieJavaApp.main(HoodieJavaApp.java:67)
>> > > >> > > > Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 27.0 failed 1 times, most recent failure: Lost task 0.0 in stage 27.0 (TID 49, localhost, executor driver): java.lang.RuntimeException: com.uber.hoodie.exception.HoodieIndexException: Error checking bloom filter index.
>> > > >> > > > at com.uber.hoodie.func.LazyIterableIterator.next(LazyIterableIterator.java:121)
>> > > >> > > > at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
>> > > >> > > > at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
>> > > >> > > > at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>> > > >> > > > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
>> > > >> > > > at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
>> > > >> > > > at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
>> > > >> > > > at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>> > > >> > > > at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>> > > >> > > > at org.apache.spark.scheduler.Task.run(Task.scala:99)
>> > > >> > > > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
>> > > >> > > > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> > > >> > > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> > > >> > > > at java.lang.Thread.run(Thread.java:745)
>> > > >> > > > Caused by: com.uber.hoodie.exception.HoodieIndexException: Error checking bloom filter index.
>> > > >> > > > at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:196)
>> > > >> > > > at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:90)
>> > > >> > > > at com.uber.hoodie.func.LazyIterableIterator.next(LazyIterableIterator.java:119)
>> > > >> > > > ... 13 more
>> > > >> > > > Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.addResource(Lorg/apache/hadoop/conf/Configuration;)V
>> > > >> > > > at com.uber.hoodie.common.util.ParquetUtils.filterParquetRowKeys(ParquetUtils.java:79)
>> > > >> > > > at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction.checkCandidatesAgainstFile(HoodieBloomIndexCheckFunction.java:68)
>> > > >> > > > at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:166)
>> > > >> > > > ... 15 more
>> > > >> > > >
>> > > >> > > > Driver stacktrace:
>> > > >> > > > at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
>> > > >> > > > at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
>> > > >> > > > at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
>> > > >> > > > at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>> > > >> > > > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>> > > >> > > > at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
>> > > >> > > > at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
>> > > >> > > > at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
>> > > >> > > > at scala.Option.foreach(Option.scala:257)
>> > > >> > > > at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
>> > > >> > > > at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
>> > > >> > > > at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
>> > > >> > > > at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
>> > > >> > > > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>> > > >> > > > at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
>> > > >> > > > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1918)
>> > > >> > > > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
>> > > >> > > > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
>> > > >> > > > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
>> > > >> > > > at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:935)
>> > > >> > > > at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>> > > >> > > > at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>> > > >> > > > at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
>> > > >> > > > at org.apache.spark.rdd.RDD.collect(RDD.scala:934)
>> > > >> > > > at org.apache.spark.rdd.PairRDDFunctions$$anonfun$countByKey$1.apply(PairRDDFunctions.scala:375)
>> > > >> > > > at org.apache.spark.rdd.PairRDDFunctions$$anonfun$countByKey$1.apply(PairRDDFunctions.scala:375)
>> > > >> > > > at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>> > > >> > > > at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>> > > >> > > > at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
>> > > >> > > > at org.apache.spark.rdd.PairRDDFunctions.countByKey(PairRDDFunctions.scala:374)
>> > > >> > > > at org.apache.spark.api.java.JavaPairRDD.countByKey(JavaPairRDD.scala:312)
>> > > >> > > > at com.uber.hoodie.table.WorkloadProfile.buildProfile(WorkloadProfile.java:64)
>> > > >> > > > at com.uber.hoodie.table.WorkloadProfile.<init>(WorkloadProfile.java:56)
>> > > >> > > > at com.uber.hoodie.HoodieWriteClient.upsertRecordsInternal(HoodieWriteClient.java:428)
>> > > >> > > > at com.uber.hoodie.HoodieWriteClient.upsert(HoodieWriteClient.java:170)
>> > > >> > > > ... 8 more
>> > > >> > > > Caused by: java.lang.RuntimeException: com.uber.hoodie.exception.HoodieIndexException: Error checking bloom filter index.
>> > > >> > > > at com.uber.hoodie.func.LazyIterableIterator.next(LazyIterableIterator.java:121)
>> > > >> > > > at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
>> > > >> > > > at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
>> > > >> > > > at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>> > > >> > > > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
>> > > >> > > > at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
>> > > >> > > > at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
>> > > >> > > > at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>> > > >> > > > at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>> > > >> > > > at org.apache.spark.scheduler.Task.run(Task.scala:99)
>> > > >> > > > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
>> > > >> > > > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> > > >> > > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> > > >> > > > at java.lang.Thread.run(Thread.java:745)
>> > > >> > > > Caused by: com.uber.hoodie.exception.HoodieIndexException: Error checking bloom filter index.
>> > > >> > > > at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:196)
>> > > >> > > > at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:90)
>> > > >> > > > at com.uber.hoodie.func.LazyIterableIterator.next(LazyIterableIterator.java:119)
>> > > >> > > > ... 13 more
>> > > >> > > > Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.addResource(Lorg/apache/hadoop/conf/Configuration;)V
>> > > >> > > > at com.uber.hoodie.common.util.ParquetUtils.filterParquetRowKeys(ParquetUtils.java:79)
>> > > >> > > > at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction.checkCandidatesAgainstFile(HoodieBloomIndexCheckFunction.java:68)
>> > > >> > > > at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:166)
>> > > >> > > > ... 15 more
>> > > >> > > >
>> > > >> > > > On Thu, Apr 18, 2019 at 7:53 PM Vinoth Chandar <
>> > vinoth@apache.org
>> > > >
>> > > >> > > wrote:
>> > > >> > > >
>> > > >> > > >> Hi Umesh,
>> > > >> > > >>
>> > > >> > > >> IIUC, your suggestion is that one should be able to run the sample app
>> > > >> > > >> without needing to check out/build the source code? That does seem fair
>> > > >> > > >> to me. We would have to move the test data generator out of tests to
>> > > >> > > >> place this under source code.
>> > > >> > > >>
>> > > >> > > >> I am hoping something like hoodie-bench could be a more
>> > > >> comprehensive
>> > > >> > > >> replacement for this mid term.
>> > > >> > > >> https://github.com/apache/incubator-hudi/pull/623/files
>> > > Thoughts?
>> > > >> > > >>
>> > > >> > > >> But, in the short term, let us know if it becomes too
>> > cumbersome
>> > > >> for
>> > > >> > you
>> > > >> > > >> to
>> > > >> > > >> try out HoodieJavaApp.
>> > > >> > > >>
>> > > >> > > >> Thanks
>> > > >> > > >> Vinoth
>> > > >> > > >>
>> > > >> > > >> On Thu, Apr 18, 2019 at 6:00 AM Umesh Kacha <
>> > > umesh.kacha@gmail.com
>> > > >> >
>> > > >> > > >> wrote:
>> > > >> > > >>
>> > > >> > > >> > I can see there is a todo to do what I suggested:
>> > > >> > > >> >
>> > > >> > > >> > #TODO - Need to move TestDataGenerator and HoodieJavaApp
>> out
>> > of
>> > > >> > tests
>> > > >> > > >> >
>> > > >> > > >> > On Thu, Apr 18, 2019 at 2:23 PM Umesh Kacha <
>> > > >> umesh.kacha@gmail.com>
>> > > >> > > >> wrote:
>> > > >> > > >> >
>> > > >> > > >> > > OK, this useful class should have been part of a utility and should
>> > > >> > > >> > > run out of the box, as IMHO a developer need not necessarily build
>> > > >> > > >> > > the project. I tried to create a Maven project where I kept
>> > > >> > > >> > > hoodie-spark-bundle as a dependency and copied the HoodieJavaApp and
>> > > >> > > >> > > DataSourceTestUtils classes into it, but it does not compile. I have
>> > > >> > > >> > > been told here that hoodie-spark-bundle is an uber jar, but I doubt
>> > > >> > > >> > > it is.
>> > > >> > > >> > >
>> > > >> > > >> > > On Thu, Apr 18, 2019 at 1:44 PM Jing Chen <
>> > > >> milantracy@gmail.com>
>> > > >> > > >> wrote:
>> > > >> > > >> > >
>> > > >> > > >> > >> Hi Umesh,
>> > > >> > > >> > >> I believe *HoodieJavaApp* is a test class under *hoodie-spark*.
>> > > >> > > >> > >> AFAIK, test classes are not supposed to be included in the artifact.
>> > > >> > > >> > >> However, if you want to build an artifact where you have access to
>> > > >> > > >> > >> test classes, you would build from source code.
>> > > >> > > >> > >> Once you build the hoodie project, you are able to find a test jar
>> > > >> > > >> > >> that includes *HoodieJavaApp* under
>> > > >> > > >> > >> *hoodie-spark/target/hoodie-spark-0.4.5-SNAPSHOT-tests.jar*.
>> > > >> > > >> > >>
>> > > >> > > >> > >> Thanks
>> > > >> > > >> > >> Jing
>> > > >> > > >> > >>
>> > > >> > > >> > >> On Wed, Apr 17, 2019 at 11:10 PM Umesh Kacha <
>> > > >> > > umesh.kacha@gmail.com>
>> > > >> > > >> > >> wrote:
>> > > >> > > >> > >>
>> > > >> > > >> > >> > Hi, I am not able to import the class HoodieJavaApp using any of
>> > > >> > > >> > >> > the Maven jars. I tried both hoodie-spark-bundle and hoodie-spark.
>> > > >> > > >> > >> > It simply does not find this class. I am using 0.4.5. Please guide.
>> > > >> > > >> > >> >
>> > > >> > > >> > >> > Regards,
>> > > >> > > >> > >> > Umesh
>> > > >> > > >> > >> >
>> > > >> > > >> > >>
>> > > >> > > >> > >
>> > > >> > > >> >
>> > > >> > > >>
>> > > >> > > >
>> > > >> > >
>> > > >> >
>> > > >>
>> > > >
>> > >
>> >
>>
>

Re: Not able to find HoodieJavaApp

Posted by Vinoth Chandar <vi...@apache.org>.
Cool.

On Fri, May 3, 2019 at 12:27 PM Umesh Kacha <um...@gmail.com> wrote:

> Hi Vinoth, thanks. I will come back to this after some time; right now the
> priority has changed.
> Regards,
> Umesh
>

Re: Not able to find HoodieJavaApp

Posted by Umesh Kacha <um...@gmail.com>.
Hi Vinoth, thanks. I will come back to this after some time; right now the
priority has changed.

Regards,
Umesh

On Fri, May 3, 2019, 8:49 PM Vinoth Chandar <vi...@apache.org> wrote:

> Hi Umesh,
>
> Did it work?
>
> Thanks
> Vinoth
>
> On Tue, Apr 23, 2019 at 9:33 AM Vinoth Chandar <vi...@apache.org> wrote:
>
> > Hi Umesh,
> >
> > I took a pass. Moving HoodieTestDataGenerator into src/java is not a good
> > idea. However, I have written up a simple demo app using the stock data
> > that we already use in our dockerized demo
> > https://github.com/vinothchandar/incubator-hudi/tree/quickstart
> >
> > Once you grab the code, build it using mvn clean install -DskipTests
> > -DskipITs
> > you should be able to run  spark-submit --class HoodieDemoApp --master
> > local[2] hoodie-utilities/target/hoodie-utilities-0.4.6-SNAPSHOT.jar and
> > get a dataset written..
> >
> > You can make changes and iterator as you wish..
> >
> > I really recommend using the dockerized setup described here. It does the
> > same thing, but lets you play with the entire ecosystem.
> > https://hudi.apache.org/docker_demo.html
> >
> > Thanks
> > Vinoth
> >
> >
> > On Mon, Apr 22, 2019 at 9:14 AM Umesh Kacha <um...@gmail.com>
> wrote:
> >
> >> Hi Vinoth thanks much. Eventual our deployment will be in AWS and we
> will
> >> be using Hoodie spark datasource to upsert delete as of now.
> >>
> >> Regards,
> >> Umesh
> >>
> >> On Mon, Apr 22, 2019 at 8:24 PM Vinoth Chandar <vi...@apache.org>
> wrote:
> >>
> >> > Hi Umesh,
> >> >
> >> > This is on top of my list of the week. But If you already have input
> >> data
> >> > somewhere on s3/hdfs, nothing stops you from trying the DeltaStreamer
> >> tool
> >> > or writing a simple spark job depending on hoodie-spark. Whats your
> >> > eventual deployment strategy?
> >> >
> >> > Thanks
> >> > Vinoth
> >> >
> >> > On Mon, Apr 22, 2019 at 6:09 AM Umesh Kacha <um...@gmail.com>
> >> wrote:
> >> >
> >> > > Hi Vinoth can you please help with this I quickly want to try
> >> > HoodieJavaApp
> >> > > it seems to be partially working in my local setup with some run
> time
> >> > > dependencies failure as mentioned in the previous email.
> >> > >
> >> > > On Sat, Apr 20, 2019, 10:18 AM Umesh Kacha <um...@gmail.com>
> >> > wrote:
> >> > >
> >> > > > Thanks Vinoth yes please that would be great HoodieJavaApp moved
> >> out of
> >> > > > test and working.
> >> > > >
> >> > > > On Sat, Apr 20, 2019, 6:09 AM Vinoth Chandar <
> >> > > > mail.vinoth.chandar@gmail.com> wrote:
> >> > > >
> >> > > >> Sorry.  Not following. If you are building your own spark job
> using
> >> > > hudi,
> >> > > >> then you just pull in hoodie-spark module
> >> > > >>
> >> > > >> http://hudi.apache.org/writing_data.html#datasource-writer
> >> > > >>
> >> > > >>
> >> > > >> Spark bundle can be used with —jars option on spark-shell etc to
> >> query
> >> > > the
> >> > > >> datasets.
> >> > > >>
> >> > > >> Does that help? Can you describe what you are trying to
> accomplish?
> >> > > >>
> >> > > >> Checking again, do you need a patch with the HoodieJavaApp moved
> >> out
> >> > of
> >> > > >> tests and working?
> >> > > >>
> >> > > >> On Fri, Apr 19, 2019 at 12:01 PM Umesh Kacha <
> >> umesh.kacha@gmail.com>
> >> > > >> wrote:
> >> > > >>
> >> > > >> > Thanks Vinoth how do I know what all spark jars and their
> >> versions I
> >> > > was
> >> > > >> > expecting hoodie-spark-bundle-0.4.5.jar would do that since
> it's
> >> an
> >> > > uber
> >> > > >> > jar but it's not recently I found I had to add spark maven
> >> > coordinates
> >> > > >> > separately in pom file. Anyways if you can give me list of
> jars I
> >> > can
> >> > > >> put
> >> > > >> > in a classpath and run.
> >> > > >> >
> >> > > >> > On Fri, Apr 19, 2019, 11:40 PM Vinoth Chandar <
> vinoth@apache.org
> >> >
> >> > > >> wrote:
> >> > > >> >
> >> > > >> > > Looks like a class mismatch error on Hadoop jars.. Easiest
> way
> >> to
> >> > do
> >> > > >> > this,
> >> > > >> > > is to pull the code into IntelliJ, add the spark jars folder
> to
> >> > > >> module's
> >> > > >> > > class path and then run the test by right clicking > run
> >> > > >> > >
> >> > > >> > > I can prep a patch for you if you'd like. lmk
> >> > > >> > >
> >> > > >> > > Thanks
> >> > > >> > > Vinoth
> >> > > >> > >
> >> > > >> > > On Thu, Apr 18, 2019 at 8:46 AM Umesh Kacha <
> >> > umesh.kacha@gmail.com>
> >> > > >> > wrote:
> >> > > >> > >
> >> > > >> > > > Hi Vinoth, I could manage running HoodieJavaApp in my local
> >> > maven
> >> > > >> > project
> >> > > >> > > > there I had to copy the following classes which were used
> by
> >> > > >> > > HoodieJavaApp.
> >> > > >> > > > Inside HoodieJavaTest main I am creating object of
> >> HoodieJavaApp
> >> > > >> which
> >> > > >> > > just
> >> > > >> > > > runs with all default options.
> >> > > >> > > >
> >> > > >> > > > [image: image.png]
> >> > > >> > > >
> >> > > >> > > > However I get the following error which seems like one of
> the
> >> > run
> >> > > >> time
> >> > > >> > > > dependencies missing. Please guide.
> >> > > >> > > >
> >> > > >> > > > Exception in thread "main" com.uber.hoodie.exception.HoodieUpsertException: Failed to upsert for commit time 20190418210326
> >> > > >> > > > at com.uber.hoodie.HoodieWriteClient.upsert(HoodieWriteClient.java:175)
> >> > > >> > > > at com.uber.hoodie.DataSourceUtils.doWriteOperation(DataSourceUtils.java:153)
> >> > > >> > > > at com.uber.hoodie.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:149)
> >> > > >> > > > at com.uber.hoodie.DefaultSource.createRelation(DefaultSource.scala:91)
> >> > > >> > > > at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426)
> >> > > >> > > > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
> >> > > >> > > > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:198)
> >> > > >> > > > at HoodieJavaApp.run(HoodieJavaApp.java:143)
> >> > > >> > > > at HoodieJavaApp.main(HoodieJavaApp.java:67)
> >> > > >> > > > Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 27.0 failed 1 times, most recent failure: Lost task 0.0 in stage 27.0 (TID 49, localhost, executor driver): java.lang.RuntimeException: com.uber.hoodie.exception.HoodieIndexException: Error checking bloom filter index.
> >> > > >> > > > at com.uber.hoodie.func.LazyIterableIterator.next(LazyIterableIterator.java:121)
> >> > > >> > > > at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
> >> > > >> > > > at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> >> > > >> > > > at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> >> > > >> > > > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
> >> > > >> > > > at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
> >> > > >> > > > at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
> >> > > >> > > > at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
> >> > > >> > > > at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
> >> > > >> > > > at org.apache.spark.scheduler.Task.run(Task.scala:99)
> >> > > >> > > > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
> >> > > >> > > > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >> > > >> > > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >> > > >> > > > at java.lang.Thread.run(Thread.java:745)
> >> > > >> > > > Caused by: com.uber.hoodie.exception.HoodieIndexException: Error checking bloom filter index.
> >> > > >> > > > at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:196)
> >> > > >> > > > at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:90)
> >> > > >> > > > at com.uber.hoodie.func.LazyIterableIterator.next(LazyIterableIterator.java:119)
> >> > > >> > > > ... 13 more
> >> > > >> > > > Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.addResource(Lorg/apache/hadoop/conf/Configuration;)V
> >> > > >> > > > at com.uber.hoodie.common.util.ParquetUtils.filterParquetRowKeys(ParquetUtils.java:79)
> >> > > >> > > > at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction.checkCandidatesAgainstFile(HoodieBloomIndexCheckFunction.java:68)
> >> > > >> > > > at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:166)
> >> > > >> > > > ... 15 more
> >> > > >> > > >
> >> > > >> > > > Driver stacktrace:
> >> > > >> > > > at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
> >> > > >> > > > at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
> >> > > >> > > > at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
> >> > > >> > > > at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> >> > > >> > > > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> >> > > >> > > > at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
> >> > > >> > > > at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
> >> > > >> > > > at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
> >> > > >> > > > at scala.Option.foreach(Option.scala:257)
> >> > > >> > > > at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
> >> > > >> > > > at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
> >> > > >> > > > at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
> >> > > >> > > > at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
> >> > > >> > > > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> >> > > >> > > > at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
> >> > > >> > > > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1918)
> >> > > >> > > > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
> >> > > >> > > > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
> >> > > >> > > > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
> >> > > >> > > > at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:935)
> >> > > >> > > > at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> >> > > >> > > > at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> >> > > >> > > > at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
> >> > > >> > > > at org.apache.spark.rdd.RDD.collect(RDD.scala:934)
> >> > > >> > > > at org.apache.spark.rdd.PairRDDFunctions$$anonfun$countByKey$1.apply(PairRDDFunctions.scala:375)
> >> > > >> > > > at org.apache.spark.rdd.PairRDDFunctions$$anonfun$countByKey$1.apply(PairRDDFunctions.scala:375)
> >> > > >> > > > at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> >> > > >> > > > at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> >> > > >> > > > at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
> >> > > >> > > > at org.apache.spark.rdd.PairRDDFunctions.countByKey(PairRDDFunctions.scala:374)
> >> > > >> > > > at org.apache.spark.api.java.JavaPairRDD.countByKey(JavaPairRDD.scala:312)
> >> > > >> > > > at com.uber.hoodie.table.WorkloadProfile.buildProfile(WorkloadProfile.java:64)
> >> > > >> > > > at com.uber.hoodie.table.WorkloadProfile.<init>(WorkloadProfile.java:56)
> >> > > >> > > > at com.uber.hoodie.HoodieWriteClient.upsertRecordsInternal(HoodieWriteClient.java:428)
> >> > > >> > > > at com.uber.hoodie.HoodieWriteClient.upsert(HoodieWriteClient.java:170)
> >> > > >> > > > ... 8 more
> >> > > >> > > > Caused by: java.lang.RuntimeException: com.uber.hoodie.exception.HoodieIndexException: Error checking bloom filter index.
> >> > > >> > > > at com.uber.hoodie.func.LazyIterableIterator.next(LazyIterableIterator.java:121)
> >> > > >> > > > at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
> >> > > >> > > > at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> >> > > >> > > > at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> >> > > >> > > > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
> >> > > >> > > > at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
> >> > > >> > > > at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
> >> > > >> > > > at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
> >> > > >> > > > at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
> >> > > >> > > > at org.apache.spark.scheduler.Task.run(Task.scala:99)
> >> > > >> > > > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
> >> > > >> > > > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >> > > >> > > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >> > > >> > > > at java.lang.Thread.run(Thread.java:745)
> >> > > >> > > > Caused by: com.uber.hoodie.exception.HoodieIndexException: Error checking bloom filter index.
> >> > > >> > > > at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:196)
> >> > > >> > > > at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:90)
> >> > > >> > > > at com.uber.hoodie.func.LazyIterableIterator.next(LazyIterableIterator.java:119)
> >> > > >> > > > ... 13 more
> >> > > >> > > > Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.addResource(Lorg/apache/hadoop/conf/Configuration;)V
> >> > > >> > > > at com.uber.hoodie.common.util.ParquetUtils.filterParquetRowKeys(ParquetUtils.java:79)
> >> > > >> > > > at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction.checkCandidatesAgainstFile(HoodieBloomIndexCheckFunction.java:68)
> >> > > >> > > > at com.uber.hoodie.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:166)
> >> > > >> > > > ... 15 more
> >> > > >> > > >
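> >> > > >> > > > (Side note on the trace above: a NoSuchMethodError on
> >> > > >> > > > Configuration.addResource(Configuration) usually means the Hadoop jars on the
> >> > > >> > > > runtime classpath differ from the Hadoop version Hudi was compiled against.
> >> > > >> > > > A minimal, untested pom.xml sketch for pinning a single Hadoop client in the
> >> > > >> > > > consuming project is below; the 2.7.3 version is only an assumption, use
> >> > > >> > > > whatever your Spark installation actually ships with.)
> >> > > >> > > >
> >> > > >> > > > <!-- sketch only: pin hadoop-client so one Hadoop version wins dependency mediation -->
> >> > > >> > > > <dependency>
> >> > > >> > > >   <groupId>org.apache.hadoop</groupId>
> >> > > >> > > >   <artifactId>hadoop-client</artifactId>
> >> > > >> > > >   <version>2.7.3</version> <!-- assumed version, align with your Spark/Hadoop install -->
> >> > > >> > > > </dependency>
> >> > > >> > > >
> >> > > >> > > > Running mvn dependency:tree -Dincludes=org.apache.hadoop should show which
> >> > > >> > > > Hadoop artifacts actually end up on the project's classpath.
> >> > > >> > > >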
> >> > > >> > > > On Thu, Apr 18, 2019 at 7:53 PM Vinoth Chandar <vinoth@apache.org> wrote:
> >> > > >> > > >
> >> > > >> > > >> Hi Umesh,
> >> > > >> > > >>
> >> > > >> > > >> IIUC, your suggestion is that one should be able to run the sample app
> >> > > >> > > >> without needing to check out and build the source code? That does seem fair
> >> > > >> > > >> to me. We would have to move the test data generator out of tests to place
> >> > > >> > > >> this under source code.
> >> > > >> > > >>
> >> > > >> > > >> I am hoping something like hoodie-bench could be a more comprehensive
> >> > > >> > > >> replacement for this mid term.
> >> > > >> > > >> https://github.com/apache/incubator-hudi/pull/623/files Thoughts?
> >> > > >> > > >>
> >> > > >> > > >> But, in the short term, let us know if it becomes too cumbersome for you to
> >> > > >> > > >> try out HoodieJavaApp.
> >> > > >> > > >>
> >> > > >> > > >> Thanks
> >> > > >> > > >> Vinoth
> >> > > >> > > >>
> >> > > >> > > >> On Thu, Apr 18, 2019 at 6:00 AM Umesh Kacha <umesh.kacha@gmail.com> wrote:
> >> > > >> > > >>
> >> > > >> > > >> > I can see there is a TODO to do what I suggested:
> >> > > >> > > >> >
> >> > > >> > > >> > #TODO - Need to move TestDataGenerator and HoodieJavaApp out of tests
> >> > > >> > > >> >
> >> > > >> > > >> > On Thu, Apr 18, 2019 at 2:23 PM Umesh Kacha <umesh.kacha@gmail.com> wrote:
> >> > > >> > > >> >
> >> > > >> > > >> > > OK, this useful class should have been part of a utility module and should
> >> > > >> > > >> > > run out of the box, since IMHO a developer need not necessarily build the
> >> > > >> > > >> > > project. I tried to create a Maven project where I kept hoodie-spark-bundle
> >> > > >> > > >> > > as a dependency and copied the HoodieJavaApp and DataSourceTestUtils classes
> >> > > >> > > >> > > into it, but it does not compile. I have been told here that
> >> > > >> > > >> > > hoodie-spark-bundle is an uber jar, but I doubt it is.
> >> > > >> > > >> > >
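> >> > > >> > > >> > > (A rough, untested sketch of what such a standalone pom would need; the
> >> > > >> > > >> > > bundle is meant to run against an existing Spark install, so Spark has to
> >> > > >> > > >> > > be declared separately, and the versions below are guesses, not taken from
> >> > > >> > > >> > > this thread.)
> >> > > >> > > >> > >
> >> > > >> > > >> > > <dependencies>
> >> > > >> > > >> > >   <!-- Hudi Spark datasource bundle (pre-ASF coordinates for 0.4.5); note HoodieJavaApp itself is a test class and is not inside this jar -->
> >> > > >> > > >> > >   <dependency>
> >> > > >> > > >> > >     <groupId>com.uber.hoodie</groupId>
> >> > > >> > > >> > >     <artifactId>hoodie-spark-bundle</artifactId>
> >> > > >> > > >> > >     <version>0.4.5</version>
> >> > > >> > > >> > >   </dependency>
> >> > > >> > > >> > >   <!-- Spark must be added explicitly; version is a guess, match your cluster -->
> >> > > >> > > >> > >   <dependency>
> >> > > >> > > >> > >     <groupId>org.apache.spark</groupId>
> >> > > >> > > >> > >     <artifactId>spark-sql_2.11</artifactId>
> >> > > >> > > >> > >     <version>2.1.0</version>
> >> > > >> > > >> > >   </dependency>
> >> > > >> > > >> > > </dependencies>
> >> > > >> > > >> > >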
> >> > > >> > > >> > > On Thu, Apr 18, 2019 at 1:44 PM Jing Chen <milantracy@gmail.com> wrote:
> >> > > >> > > >> > >
> >> > > >> > > >> > >> Hi Umesh,
> >> > > >> > > >> > >> I believe *HoodieJavaApp* is a test class under *hoodie-spark*.
> >> > > >> > > >> > >> AFAIK, test classes are not supposed to be included in the artifact.
> >> > > >> > > >> > >> However, if you want to build an artifact where you have access to test
> >> > > >> > > >> > >> classes, you would build from source code.
> >> > > >> > > >> > >> Once you build the hoodie project, you are able to find a test jar that
> >> > > >> > > >> > >> includes *HoodieJavaApp* under
> >> > > >> > > >> > >> *hoodie-spark/target/hoodie-spark-0.4.5-SNAPSHOT-tests.jar*.
> >> > > >> > > >> > >>
> >> > > >> > > >> > >> Thanks
> >> > > >> > > >> > >> Jing
> >> > > >> > > >> > >>
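> >> > > >> > > >> > >> (One possible, untested way to launch the class from that tests jar,
> >> > > >> > > >> > >> assuming the spark bundle jar supplies the runtime classes it needs; the
> >> > > >> > > >> > >> bundle jar path is a placeholder, only the spark-submit flags are standard.)
> >> > > >> > > >> > >>
> >> > > >> > > >> > >> spark-submit --class HoodieJavaApp --master local[2] \
> >> > > >> > > >> > >>   --jars /path/to/hoodie-spark-bundle-0.4.5.jar \
> >> > > >> > > >> > >>   hoodie-spark/target/hoodie-spark-0.4.5-SNAPSHOT-tests.jar
> >> > > >> > > >> > >>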
> >> > > >> > > >> > >> On Wed, Apr 17, 2019 at 11:10 PM Umesh Kacha <umesh.kacha@gmail.com> wrote:
> >> > > >> > > >> > >>
> >> > > >> > > >> > >> > Hi, I am not able to import the class HoodieJavaApp using any of the
> >> > > >> > > >> > >> > Maven jars. I tried hoodie-spark-bundle and hoodie-spark both. It simply
> >> > > >> > > >> > >> > does not find this class. I am using 0.4.5. Please guide.
> >> > > >> > > >> > >> >
> >> > > >> > > >> > >> > Regards,
> >> > > >> > > >> > >> > Umesh
> >> > > >> > > >> > >> >
> >> > > >> > > >> > >>
> >> > > >> > > >> > >
> >> > > >> > > >> >
> >> > > >> > > >>
> >> > > >> > > >
> >> > > >> > >
> >> > > >> >
> >> > > >>
> >> > > >
> >> > >
> >> >
> >>
> >
>