Posted to dev@hudi.apache.org by lamberken <la...@163.com> on 2020/01/01 00:19:26 UTC

Re:Re: Re: Re: Facing issues when using HiveIncrementalPuller


Hi @Pratyaksh Sharma,


Thanks for your detailed stack trace and reproduction steps. Your suggestion is reasonable.


1. For the NPE issue, please track PR #1167 <https://github.com/apache/incubator-hudi/pull/1167>.
2. For the TTransportException issue, I have a question: can statements other than the create statement be executed successfully?
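
For context on why the leading slash in that fix matters, here is a minimal, illustrative sketch (not Hudi code; the class and method names are made up for demonstration) of how Class.getResourceAsStream resolves resource names: a relative name is resolved against the calling class's package, while a name with a leading "/" is resolved from the classpath root.

```java
// Illustrative sketch only (not Hudi code): how Class.getResourceAsStream
// resolves resource names before delegating to the ClassLoader.
public class ResourceNameDemo {

    // Mimics the name-resolution step: relative names get the class's
    // package path prepended; a leading "/" means "from the classpath root".
    static String resolve(Class<?> c, String name) {
        if (name.startsWith("/")) {
            return name.substring(1); // absolute: strip slash, look up from root
        }
        String cn = c.getName();
        int dot = cn.lastIndexOf('.');
        String pkgPath = (dot < 0) ? "" : cn.substring(0, dot).replace('.', '/');
        return pkgPath.isEmpty() ? name : pkgPath + "/" + name;
    }

    public static void main(String[] args) {
        // Relative: resolved inside the calling class's package
        System.out.println(resolve(String.class, "IncrementalPull.sqltemplate"));
        // -> java/lang/IncrementalPull.sqltemplate

        // Absolute: resolved from the classpath root
        System.out.println(resolve(String.class, "/IncrementalPull.sqltemplate"));
        // -> IncrementalPull.sqltemplate
    }
}
```

So if the template file sits at the root of the bundle jar rather than under the class's package directory, only the slash-prefixed lookup finds it, which is consistent with the fix above.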


best,
lamber-ken

At 2019-12-30 23:11:17, "Pratyaksh Sharma" <pr...@gmail.com> wrote:
>Thank you Lamberken, the above issue got resolved with what you suggested.
>However, HiveIncrementalPuller is still not working.
>Subsequently I found and fixed a bug raised here -
>https://issues.apache.org/jira/browse/HUDI-485.
>
>Currently I am facing the below exception when trying to run the create
>table statement on docker cluster. Any leads for solving this are welcome -
>
>6811 [main] ERROR org.apache.hudi.utilities.HiveIncrementalPuller  -
>Exception when executing SQL
>
>java.sql.SQLException: org.apache.thrift.transport.TTransportException
>
>at
>org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:399)
>
>at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
>
>at
>org.apache.hudi.utilities.HiveIncrementalPuller.executeStatement(HiveIncrementalPuller.java:233)
>
>at
>org.apache.hudi.utilities.HiveIncrementalPuller.executeIncrementalSQL(HiveIncrementalPuller.java:200)
>
>at
>org.apache.hudi.utilities.HiveIncrementalPuller.saveDelta(HiveIncrementalPuller.java:157)
>
>at
>org.apache.hudi.utilities.HiveIncrementalPuller.main(HiveIncrementalPuller.java:345)
>
>Caused by: org.apache.thrift.transport.TTransportException
>
>at
>org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
>
>at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>
>at
>org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:374)
>
>at
>org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:451)
>
>at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:433)
>
>at
>org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:38)
>
>at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>
>at
>org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:425)
>
>at
>org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:321)
>
>at
>org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:225)
>
>at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
>
>at
>org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_GetOperationStatus(TCLIService.java:467)
>
>at
>org.apache.hive.service.rpc.thrift.TCLIService$Client.GetOperationStatus(TCLIService.java:454)
>
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
>at
>sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
>at
>sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>
>at java.lang.reflect.Method.invoke(Method.java:498)
>
>at
>org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1524)
>
>at com.sun.proxy.$Proxy5.GetOperationStatus(Unknown Source)
>
>at
>org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:367)
>
>... 5 more
>
>6812 [main] ERROR org.apache.hudi.utilities.HiveIncrementalPuller  - Could
>not close the resultset opened
>
>java.sql.SQLException: org.apache.thrift.transport.TTransportException
>
>at
>org.apache.hive.jdbc.HiveStatement.closeClientOperation(HiveStatement.java:214)
>
>at org.apache.hive.jdbc.HiveStatement.close(HiveStatement.java:231)
>
>at
>org.apache.hudi.utilities.HiveIncrementalPuller.saveDelta(HiveIncrementalPuller.java:165)
>
>at
>org.apache.hudi.utilities.HiveIncrementalPuller.main(HiveIncrementalPuller.java:345)
>
>Caused by: org.apache.thrift.transport.TTransportException
>
>at
>org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
>
>at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>
>at
>org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:374)
>
>at
>org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:451)
>
>at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:433)
>
>at
>org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:38)
>
>at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>
>at
>org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:425)
>
>at
>org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:321)
>
>at
>org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:225)
>
>at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
>
>at
>org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_CloseOperation(TCLIService.java:513)
>
>at
>org.apache.hive.service.rpc.thrift.TCLIService$Client.CloseOperation(TCLIService.java:500)
>
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
>at
>sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
>at
>sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>
>at java.lang.reflect.Method.invoke(Method.java:498)
>
>at
>org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1524)
>
>at com.sun.proxy.$Proxy5.CloseOperation(Unknown Source)
>
>at
>org.apache.hive.jdbc.HiveStatement.closeClientOperation(HiveStatement.java:208)
>
>... 3 more
>
>Also, the documentation does not mention the jars which need to be passed
>externally on the classpath for executing the above tool. We should update the
>documentation to list these jars so that it becomes easier for a new
>user to use this tool. I spent a lot of time adding all the jars
>incrementally. This jira (https://issues.apache.org/jira/browse/HUDI-486)
>tracks this.
>
>On Mon, Dec 30, 2019 at 5:35 PM lamberken <la...@163.com> wrote:
>
>>
>>
>> Hi @Pratyaksh Sharma
>>
>>
>> Thanks for your steps to reproduce this issue. Try modifying the code as
>> below, and test again.
>>
>>
>> In org.apache.hudi.utilities.HiveIncrementalPuller#HiveIncrementalPuller:
>>
>> String templateContent =
>>     FileIOUtils.readAsUTFString(this.getClass().getResourceAsStream("IncrementalPull.sqltemplate"));
>>
>> changed to:
>>
>> String templateContent =
>>     FileIOUtils.readAsUTFString(this.getClass().getResourceAsStream("/IncrementalPull.sqltemplate"));
>> best,
>> lamber-ken
>>
>>
>>
>>
>>
>> At 2019-12-30 19:25:08, "Pratyaksh Sharma" <pr...@gmail.com> wrote:
>> >Hi Vinoth,
>> >
>> >I am able to reproduce this error on docker setup and have filed a jira -
>> >https://issues.apache.org/jira/browse/HUDI-484.
>> >
>> >Steps to reproduce are mentioned in the jira description itself.
>> >
>> >On Thu, Dec 26, 2019 at 12:42 PM Pratyaksh Sharma <pr...@gmail.com>
>> >wrote:
>> >
>> >> Hi Vinoth,
>> >>
>> >> I will try to reproduce the error on docker cluster and keep you
>> updated.
>> >>
>> >> On Tue, Dec 24, 2019 at 11:23 PM Vinoth Chandar <vi...@apache.org>
>> wrote:
>> >>
>> >>> Pratyaksh,
>> >>>
>> >>> If you are still having this issue, could you try reproducing this on
>> the
>> >>> docker setup
>> >>>
>> >>>
>> https://hudi.apache.org/docker_demo.html#step-7--incremental-query-for-copy-on-write-table
>> >>> similar to this and raise a JIRA.
>> >>> Happy to look into it and get it fixed if needed
>> >>>
>> >>> Thanks
>> >>> Vinoth
>> >>>
>> >>> On Tue, Dec 24, 2019 at 8:43 AM lamberken <la...@163.com> wrote:
>> >>>
>> >>> >
>> >>> >
>> >>> > Hi, @Pratyaksh Sharma
>> >>> >
>> >>> >
>> >>> > The log4j-1.2.17.jar lib also needs to be added to the classpath, for
>> >>> example:
>> >>> > java -cp
>> >>> >
>> >>>
>> /path/to/hive-jdbc-2.3.1.jar:/path/to/log4j-1.2.17.jar:packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar
>> >>> > org.apache.hudi.utilities.HiveIncrementalPuller --help
>> >>> >
>> >>> >
>> >>> > best,
>> >>> > lamber-ken
>> >>> >
>> >>> > At 2019-12-24 17:23:20, "Pratyaksh Sharma" <pr...@gmail.com>
>> >>> wrote:
>> >>> > >Hi Vinoth,
>> >>> > >
>> >>> > >Sorry my bad, I did not realise earlier that spark is not needed for
>> >>> this
>> >>> > >class. I tried running it with the below command to get the
>> mentioned
>> >>> > >exception -
>> >>> > >
>> >>> > >Command -
>> >>> > >
>> >>> > >java -cp
>> >>> >
>> >>> >
>> >>>
>> >/path/to/hive-jdbc-2.3.1.jar:packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar
>> >>> > >org.apache.hudi.utilities.HiveIncrementalPuller --help
>> >>> > >
>> >>> > >Exception -
>> >>> > >Exception in thread "main" java.lang.NoClassDefFoundError:
>> >>> > >org/apache/log4j/LogManager
>> >>> > >        at
>> >>> >
>> >>> >
>> >>>
>> >org.apache.hudi.utilities.HiveIncrementalPuller.<clinit>(HiveIncrementalPuller.java:64)
>> >>> > >Caused by: java.lang.ClassNotFoundException:
>> >>> org.apache.log4j.LogManager
>> >>> > >        at
>> java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>> >>> > >        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> >>> > >        at
>> >>> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
>> >>> > >        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> >>> > >        ... 1 more
>> >>> > >
>> >>> > >I was able to fix it by including the corresponding jar in the
>> bundle.
>> >>> > >
>> >>> > >After fixing the above, I am still getting the NPE even though the
>> >>> > template
>> >>> > >is bundled in the jar.
>> >>> > >
>> >>> > >On Mon, Dec 23, 2019 at 10:45 PM Vinoth Chandar <vi...@apache.org>
>> >>> > wrote:
>> >>> > >
>> >>> > >> Hi Pratyaksh,
>> >>> > >>
>> >>> > >> HveIncrementalPuller is just a java program. Does not need Spark,
>> >>> since
>> >>> > it
>> >>> > >> just runs a HiveQL remotely..
>> >>> > >>
>> >>> > >> On the error you specified, seems like it can't find the template?
>> >>> Can
>> >>> > you
>> >>> > >> see if the bundle does not have the template file.. May be this
>> got
>> >>> > broken
>> >>> > >> during the bundling changes.. (since its no longer part of the
>> >>> resources
>> >>> > >> folder of the bundle module).. We should also probably be
>> throwing a
>> >>> > better
>> >>> > >> error than NPE..
>> >>> > >>
>> >>> > >> We can raise a JIRA, once you confirm.
>> >>> > >>
>> >>> > >> String templateContent =
>> >>> > >>
>> >>> > >>
>> >>> >
>> >>>
>> FileIOUtils.readAsUTFString(this.getClass().getResourceAsStream("IncrementalPull.sqltemplate"));
>> >>> > >>
>> >>> > >>
>> >>> > >> On Mon, Dec 23, 2019 at 6:02 AM Pratyaksh Sharma <
>> >>> pratyaksh13@gmail.com
>> >>> > >
>> >>> > >> wrote:
>> >>> > >>
>> >>> > >> > Hi,
>> >>> > >> >
>> >>> > >> > Can someone guide me or share some documentation regarding how
>> to
>> >>> use
>> >>> > >> > HiveIncrementalPuller. I already went through the documentation
>> on
>> >>> > >> > https://hudi.apache.org/querying_data.html. I tried using this
>> >>> puller
>> >>> > >> > using
>> >>> > >> > the below command and facing the given exception.
>> >>> > >> >
>> >>> > >> > Any leads are appreciated.
>> >>> > >> >
>> >>> > >> > Command -
>> >>> > >> > spark-submit --name incremental-puller --queue etl --files
>> >>> > >> > incremental_sql.txt --master yarn --deploy-mode cluster
>> >>> > --driver-memory
>> >>> > >> 4g
>> >>> > >> > --executor-memory 4g --num-executors 2 --class
>> >>> > >> > org.apache.hudi.utilities.HiveIncrementalPuller
>> >>> > >> > hudi-utilities-bundle-0.5.1-SNAPSHOT.jar --hiveUrl
>> >>> > >> > jdbc:hive2://HOST:PORT/ --hiveUser <user> --hivePass <pass>
>> >>> > >> > --extractSQLFile incremental_sql.txt --sourceDb <source_db>
>> >>> > --sourceTable
>> >>> > >> > <src_table> --targetDb tmp --targetTable tempTable
>> >>> --fromCommitTime 0
>> >>> > >> > --maxCommits 1
>> >>> > >> >
>> >>> > >> > Error -
>> >>> > >> >
>> >>> > >> > java.lang.NullPointerException
>> >>> > >> > at
>> >>> org.apache.hudi.common.util.FileIOUtils.copy(FileIOUtils.java:73)
>> >>> > >> > at
>> >>> > >> >
>> >>> > >> >
>> >>> > >>
>> >>> >
>> >>>
>> org.apache.hudi.common.util.FileIOUtils.readAsUTFString(FileIOUtils.java:66)
>> >>> > >> > at
>> >>> > >> >
>> >>> > >> >
>> >>> > >>
>> >>> >
>> >>>
>> org.apache.hudi.common.util.FileIOUtils.readAsUTFString(FileIOUtils.java:61)
>> >>> > >> > at
>> >>> > >> >
>> >>> > >> >
>> >>> > >>
>> >>> >
>> >>>
>> org.apache.hudi.utilities.HiveIncrementalPuller.<init>(HiveIncrementalPuller.java:113)
>> >>> > >> > at
>> >>> > >> >
>> >>> > >> >
>> >>> > >>
>> >>> >
>> >>>
>> org.apache.hudi.utilities.HiveIncrementalPuller.main(HiveIncrementalPuller.java:343)
>> >>> > >> >
>> >>> > >>
>> >>> >
>> >>>
>> >>
>>

Re: Facing issues when using HiveIncrementalPuller

Posted by Pratyaksh Sharma <pr...@gmail.com>.
Hi Vinoth/Lamberken,

The issue seems to be due to the container's ulimit settings. I checked the
hiveserver container's logs while executing the HiveIncrementalPuller
script and got the below error -

#

# A fatal error has been detected by the Java Runtime Environment:

#

#  SIGBUS (0x7) at pc=0x00007f522a4ede6d, pid=438, tid=0x00007f5204ffe700

#

# JRE version: OpenJDK Runtime Environment (8.0_212-b01) (build
1.8.0_212-8u212-b01-1~deb9u1-b01)

# Java VM: OpenJDK 64-Bit Server VM (25.212-b01 mixed mode linux-amd64
compressed oops)

# Problematic frame:

# C  [libzip.so+0x4e6d]

#

# Failed to write core dump. Core dumps have been disabled. To enable core
dumping, try "ulimit -c unlimited" before starting Java again

#

# An error report file with more information is saved as:

# /opt/hive/bin/hs_err_pid438.log


I was able to fix this by running the command "ulimit -c unlimited" in the
hiveserver container. The remaining task is to set the ulimits for
hiveserver in the docker-compose file itself.
I will be doing that and will raise a PR.
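
As a sketch of that docker-compose change (the service name "hiveserver" and the limit values here are assumptions for illustration, not the actual Hudi demo file):

```yaml
# Hypothetical docker-compose fragment: raise ulimits for the hiveserver
# service so core dumps are enabled (equivalent to "ulimit -c unlimited").
services:
  hiveserver:
    ulimits:
      core: -1        # -1 means unlimited core dump size
      nofile:         # open-file limit, raised as a precaution
        soft: 65536
        hard: 65536
```

With this in place, the limits apply automatically on container start instead of having to be set by hand inside the running container.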

Will keep you guys posted if I face any further issues.


On Thu, Jan 2, 2020 at 11:27 AM lamberken <la...@163.com> wrote:

>
>
> Hi @Pratyaksh Sharma,
>
>
> Okay, all right. BTW, thanks for raising this issue.
>
>
> best,
> lamber-ken
>
>
> On 01/2/2020 13:47,Pratyaksh Sharma<pr...@gmail.com> wrote:
> Hi Lamberken,
>
> I am also trying to fix this issue. Please let us know if you come up with
> anything.
>
> On Thu, Jan 2, 2020 at 11:12 AM lamberken <la...@163.com> wrote:
>
>
>
> Hi @Vinoth,
>
>
> Got it, thank you for reminding me. I just made a mistake just now.
>
>
> best,
> lamber-ken
>
>
> On 01/2/2020 13:08,Vinoth Chandar<vi...@apache.org> wrote:
> Hi Lamber,
>
> utilities-bundle has always been a fat jar.. I was talking about
> hudi-utilities.
> Sure. take a swing at it. Happy to help as needed
>
> On Wed, Jan 1, 2020 at 8:57 PM lamberken <la...@163.com> wrote:
>
>
>
> Hi @Vinoth,
>
>
> I'm willing to solve this problem. I'm trying to find out from the git
> history when hudi-utilities-bundle stopped being a fat jar.
>
>
>
> Git History
> 2019-08-29 FAT-JAR ---> 5f9fa82f47e1cc14a22b869250fe23c8f9c033cd
> 2019-09-14 NOT-FATJAR ---> d2525c31b7dad7bae2d4899d8df2a353ca39af50
> best,
> lamber-ken
>
>
> At 2020-01-01 09:15:01, "Vinoth Chandar" <vi...@apache.org> wrote:
> This does sound like a fair bit of pain.
> I am wondering if it makes sense to change the integ-test setup/docker
> demo
> to use incremental  puller. Bunch of the packaging issues around jars,
> seem
> like regressions that the hudi-utilities is not a fat jar anymore?
>
> if there are nt any takers, I can also try my hand at fixing this, once I
> get done with few things on my end. left a comment on HUDI-485
>
>
>

Re: Facing issues when using HiveIncrementalPuller

Posted by lamberken <la...@163.com>.

Hi @Pratyaksh Sharma,


Okay, all right. BTW, thanks for raising this issue.


best,
lamber-ken


On 01/2/2020 13:47,Pratyaksh Sharma<pr...@gmail.com> wrote:
Hi Lamberken,

I am also trying to fix this issue. Please let us know if you come up with
anything.

On Thu, Jan 2, 2020 at 11:12 AM lamberken <la...@163.com> wrote:



Hi @Vinoth,


Got it, thank you for reminding me. I just made a mistake just now.


best,
lamber-ken


On 01/2/2020 13:08,Vinoth Chandar<vi...@apache.org> wrote:
Hi Lamber,

utilities-bundle has always been a fat jar.. I was talking about
hudi-utilities.
Sure. take a swing at it. Happy to help as needed

On Wed, Jan 1, 2020 at 8:57 PM lamberken <la...@163.com> wrote:



Hi @Vinoth,


I'm willing to solve this problem. I'm trying to find out from the history
when hudi-utilities-bundle becoming not a fatjar.



Git History
2019-08-29 FAT-JAR ---> 5f9fa82f47e1cc14a22b869250fe23c8f9c033cd
2019-09-14 NOT-FATJAR ---> d2525c31b7dad7bae2d4899d8df2a353ca39af50
best,
lamber-ken


At 2020-01-01 09:15:01, "Vinoth Chandar" <vi...@apache.org> wrote:
This does sound like a fair bit of pain.
I am wondering if it makes sense to change the integ-test setup/docker
demo
to use incremental  puller. Bunch of the packaging issues around jars,
seem
like regressions that the hudi-utilities is not a fat jar anymore?

if there are nt any takers, I can also try my hand at fixing this, once I
get done with few things on my end. left a comment on HUDI-485



On Tue, Dec 31, 2019 at 4:19 PM lamberken <la...@163.com> wrote:



Hi @Pratyaksh Sharma,


Thanks for your detail stackstrace and reproduce steps. And your
suggestion is reasonable.


1, For NPE issue, please tracking pr #1167 <
https://github.com/apache/incubator-hudi/pull/1167>
2, For TTransportException issue, I have a question that can other
statements be executed except create statement?


best,
lamber-ken

At 2019-12-30 23:11:17, "Pratyaksh Sharma" <pr...@gmail.com>
wrote:
Thank you Lamberken, the above issue gets resolved with what you
suggested.
However, still HiveIncrementalPuller is not working.
Subsequently I found and fixed a bug raised here -
https://issues.apache.org/jira/browse/HUDI-485.

Currently I am facing the below exception when trying to run the create
table statement on docker cluster. Any leads for solving this are
welcome
-

6811 [main] ERROR org.apache.hudi.utilities.HiveIncrementalPuller - Exception when executing SQL

java.sql.SQLException: org.apache.thrift.transport.TTransportException
    at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:399)
    at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
    at org.apache.hudi.utilities.HiveIncrementalPuller.executeStatement(HiveIncrementalPuller.java:233)
    at org.apache.hudi.utilities.HiveIncrementalPuller.executeIncrementalSQL(HiveIncrementalPuller.java:200)
    at org.apache.hudi.utilities.HiveIncrementalPuller.saveDelta(HiveIncrementalPuller.java:157)
    at org.apache.hudi.utilities.HiveIncrementalPuller.main(HiveIncrementalPuller.java:345)
Caused by: org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:374)
    at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:451)
    at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:433)
    at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:38)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:425)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:321)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:225)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
    at org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_GetOperationStatus(TCLIService.java:467)
    at org.apache.hive.service.rpc.thrift.TCLIService$Client.GetOperationStatus(TCLIService.java:454)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1524)
    at com.sun.proxy.$Proxy5.GetOperationStatus(Unknown Source)
    at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:367)
    ... 5 more

6812 [main] ERROR org.apache.hudi.utilities.HiveIncrementalPuller - Could not close the resultset opened

java.sql.SQLException: org.apache.thrift.transport.TTransportException
    at org.apache.hive.jdbc.HiveStatement.closeClientOperation(HiveStatement.java:214)
    at org.apache.hive.jdbc.HiveStatement.close(HiveStatement.java:231)
    at org.apache.hudi.utilities.HiveIncrementalPuller.saveDelta(HiveIncrementalPuller.java:165)
    at org.apache.hudi.utilities.HiveIncrementalPuller.main(HiveIncrementalPuller.java:345)
Caused by: org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:374)
    at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:451)
    at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:433)
    at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:38)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:425)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:321)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:225)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
    at org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_CloseOperation(TCLIService.java:513)
    at org.apache.hive.service.rpc.thrift.TCLIService$Client.CloseOperation(TCLIService.java:500)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1524)
    at com.sun.proxy.$Proxy5.CloseOperation(Unknown Source)
    at org.apache.hive.jdbc.HiveStatement.closeClientOperation(HiveStatement.java:208)
    ... 3 more

Also, the documentation does not mention the jars which need to be passed externally on the classpath when executing the above tool. We should update the documentation to list these jars so that it becomes easier for a new user to use the tool; I spent a lot of time adding all the jars incrementally. This jira (https://issues.apache.org/jira/browse/HUDI-486) tracks this.

On Mon, Dec 30, 2019 at 5:35 PM lamberken <la...@163.com> wrote:



Hi @Pratyaksh Sharma,


Thanks for the steps to reproduce this issue. Try modifying the code below and test again.

In org.apache.hudi.utilities.HiveIncrementalPuller#HiveIncrementalPuller:

String templateContent =
    FileIOUtils.readAsUTFString(this.getClass().getResourceAsStream("IncrementalPull.sqltemplate"));

changed to:

String templateContent =
    FileIOUtils.readAsUTFString(this.getClass().getResourceAsStream("/IncrementalPull.sqltemplate"));
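For context on why the leading slash matters: Class.getResourceAsStream resolves a name without a leading "/" relative to the class's own package, while a name starting with "/" is resolved from the classpath root. HiveIncrementalPuller lives in org.apache.hudi.utilities, so the relative lookup presumably fails because the template is packaged at the jar root rather than next to the class. A minimal sketch of the rule, using only JDK classes (not Hudi code; ResourcePathDemo is a made-up name for illustration):

```java
public class ResourcePathDemo {

    // True if the named resource is visible from Integer.class,
    // mirroring this.getClass().getResourceAsStream(...) in the puller.
    static boolean exists(String name) {
        return Integer.class.getResourceAsStream(name) != null;
    }

    public static void main(String[] args) {
        // Relative name: resolved against the class's package (java.lang),
        // so this finds java/lang/String.class.
        System.out.println(exists("String.class"));

        // Absolute name: "/String.class" is looked up at the classpath
        // root, where no such resource exists, so the stream is null.
        System.out.println(exists("/String.class"));

        // A fully qualified absolute name resolves again.
        System.out.println(exists("/java/lang/String.class"));
    }
}
```

The same distinction explains why adding the leading slash avoids the null stream, and with it the NPE raised later inside FileIOUtils.copy.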
best,
lamber-ken





At 2019-12-30 19:25:08, "Pratyaksh Sharma" <pr...@gmail.com>
wrote:
Hi Vinoth,

I am able to reproduce this error on docker setup and have filed a
jira -
https://issues.apache.org/jira/browse/HUDI-484.

Steps to reproduce are mentioned in the jira description itself.

On Thu, Dec 26, 2019 at 12:42 PM Pratyaksh Sharma <
pratyaksh13@gmail.com>
wrote:

Hi Vinoth,

I will try to reproduce the error on docker cluster and keep you
updated.

On Tue, Dec 24, 2019 at 11:23 PM Vinoth Chandar <
vinoth@apache.org>
wrote:

Pratyaksh,

If you are still having this issue, could you try reproducing
this
on
the
docker setup





https://hudi.apache.org/docker_demo.html#step-7--incremental-query-for-copy-on-write-table
similar to this and raise a JIRA.
Happy to look into it and get it fixed if needed

Thanks
Vinoth

On Tue, Dec 24, 2019 at 8:43 AM lamberken <la...@163.com>
wrote:



Hi, @Pratyaksh Sharma


The log4j-1.2.17.jar lib also needs to be added to the classpath, for example:
java -cp





/path/to/hive-jdbc-2.3.1.jar:/path/to/log4j-1.2.17.jar:packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar
org.apache.hudi.utilities.HiveIncrementalPuller --help


best,
lamber-ken

At 2019-12-24 17:23:20, "Pratyaksh Sharma" <
pratyaksh13@gmail.com

wrote:
Hi Vinoth,

Sorry, my bad, I did not realise earlier that Spark is not needed for this class. I tried running it with the below command and got the mentioned exception -

Command -

java -cp






/path/to/hive-jdbc-2.3.1.jar:packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar
org.apache.hudi.utilities.HiveIncrementalPuller --help

Exception -
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/log4j/LogManager
    at org.apache.hudi.utilities.HiveIncrementalPuller.<clinit>(HiveIncrementalPuller.java:64)
Caused by: java.lang.ClassNotFoundException: org.apache.log4j.LogManager
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 1 more

I was able to fix it by including the corresponding jar in the
bundle.

After fixing the above, still I am getting the NPE even though
the
template
is bundled in the jar.

On Mon, Dec 23, 2019 at 10:45 PM Vinoth Chandar <
vinoth@apache.org>
wrote:

Hi Pratyaksh,

HiveIncrementalPuller is just a java program. Does not need Spark, since it just runs a HiveQL remotely..

On the error you specified, seems like it can't find the template? Can you see if the bundle does not have the template file.. Maybe this got broken during the bundling changes.. (since it's no longer part of the resources folder of the bundle module).. We should also probably be throwing a better error than NPE..

We can raise a JIRA, once you confirm.

String templateContent =
    FileIOUtils.readAsUTFString(this.getClass().getResourceAsStream("IncrementalPull.sqltemplate"));


On Mon, Dec 23, 2019 at 6:02 AM Pratyaksh Sharma <
pratyaksh13@gmail.com

wrote:

Hi,

Can someone guide me or share some documentation regarding
how
to
use
HiveIncrementalPuller. I already went through the
documentation
on
https://hudi.apache.org/querying_data.html. I tried using
this
puller
using
the below command and facing the given exception.

Any leads are appreciated.

Command -
spark-submit --name incremental-puller --queue etl --files incremental_sql.txt --master yarn --deploy-mode cluster --driver-memory 4g --executor-memory 4g --num-executors 2 --class org.apache.hudi.utilities.HiveIncrementalPuller hudi-utilities-bundle-0.5.1-SNAPSHOT.jar --hiveUrl jdbc:hive2://HOST:PORT/ --hiveUser <user> --hivePass <pass> --extractSQLFile incremental_sql.txt --sourceDb <source_db> --sourceTable <src_table> --targetDb tmp --targetTable tempTable --fromCommitTime 0 --maxCommits 1

Error -

java.lang.NullPointerException
    at org.apache.hudi.common.util.FileIOUtils.copy(FileIOUtils.java:73)
    at org.apache.hudi.common.util.FileIOUtils.readAsUTFString(FileIOUtils.java:66)
    at org.apache.hudi.common.util.FileIOUtils.readAsUTFString(FileIOUtils.java:61)
    at org.apache.hudi.utilities.HiveIncrementalPuller.<init>(HiveIncrementalPuller.java:113)
    at org.apache.hudi.utilities.HiveIncrementalPuller.main(HiveIncrementalPuller.java:343)










Re: Facing issues when using HiveIncrementalPuller

Posted by Pratyaksh Sharma <pr...@gmail.com>.
Hi Lamberken,

I am also trying to fix this issue. Please let us know if you come up with
anything.

On Thu, Jan 2, 2020 at 11:12 AM lamberken <la...@163.com> wrote:

>
>
> Hi @Vinoth,
>
>
> Got it, thank you for reminding me. I just made a mistake just now.
>
>
> best,
> lamber-ken
>
>
> On 01/2/2020 13:08,Vinoth Chandar<vi...@apache.org> wrote:
> Hi Lamber,
>
> utilities-bundle has always been a fat jar.. I was talking about
> hudi-utilities.
> Sure. take a swing at it. Happy to help as needed
>
> On Wed, Jan 1, 2020 at 8:57 PM lamberken <la...@163.com> wrote:
>
>
>
> Hi @Vinoth,
>
>
> I'm willing to solve this problem. I'm trying to find out from the history
> when hudi-utilities-bundle becoming not a fatjar.
>
>
>
> Git History
> 2019-08-29 FAT-JAR ---> 5f9fa82f47e1cc14a22b869250fe23c8f9c033cd
> 2019-09-14 NOT-FATJAR ---> d2525c31b7dad7bae2d4899d8df2a353ca39af50
> best,
> lamber-ken
>
>
> At 2020-01-01 09:15:01, "Vinoth Chandar" <vi...@apache.org> wrote:
> This does sound like a fair bit of pain.
> I am wondering if it makes sense to change the integ-test setup/docker
> demo
> to use incremental  puller. Bunch of the packaging issues around jars,
> seem
> like regressions that the hudi-utilities is not a fat jar anymore?
>
> if there are nt any takers, I can also try my hand at fixing this, once I
> get done with few things on my end. left a comment on HUDI-485
>
>
>
> On Tue, Dec 31, 2019 at 4:19 PM lamberken <la...@163.com> wrote:
>
>
>
> Hi @Pratyaksh Sharma,
>
>
> Thanks for your detail stackstrace and reproduce steps. And your
> suggestion is reasonable.
>
>
> 1, For NPE issue, please tracking pr #1167 <
> https://github.com/apache/incubator-hudi/pull/1167>
> 2, For TTransportException issue, I have a question that can other
> statements be executed except create statement?
>
>
> best,
> lamber-ken
>
> At 2019-12-30 23:11:17, "Pratyaksh Sharma" <pr...@gmail.com>
> wrote:
> Thank you Lamberken, the above issue gets resolved with what you
> suggested.
> However, still HiveIncrementalPuller is not working.
> Subsequently I found and fixed a bug raised here -
> https://issues.apache.org/jira/browse/HUDI-485.
>
> Currently I am facing the below exception when trying to run the create
> table statement on docker cluster. Any leads for solving this are
> welcome
> -
>
> 6811 [main] ERROR org.apache.hudi.utilities.HiveIncrementalPuller  -
> Exception when executing SQL
>
> [stack traces snipped; identical to the traces quoted earlier in this thread]
>
> Also the documentation does not mention the jars which need to be
> passed
> externally in classPath for executing above tool. We should upgrade the
> documentation to list down the jars so that it becomes easier for a new
> user to use this tool. I spent a lot of time adding all the jars
> incrementally. This jira (
> https://issues.apache.org/jira/browse/HUDI-486)
> tracks this.
>
> On Mon, Dec 30, 2019 at 5:35 PM lamberken <la...@163.com> wrote:
>
>
>
> Hi @Pratyaksh Sharma
>
>
> Thanks for your steps to reproduce this issue. Try to modify bellow
> codes,
> and test again.
>
>
>
> org.apache.hudi.utilities.HiveIncrementalPuller#HiveIncrementalPuller /
> --------------------------------- / String templateContent =
>
>
>
> FileIOUtils.readAsUTFString(this.getClass().getResourceAsStream("IncrementalPull.sqltemplate"));
> Changed to
> / --------------------------------- / String templateContent =
>
>
>
> FileIOUtils.readAsUTFString(this.getClass().getResourceAsStream("/IncrementalPull.sqltemplate"));
> best,
> lamber-ken
>
>
>
>
>
> At 2019-12-30 19:25:08, "Pratyaksh Sharma" <pr...@gmail.com>
> wrote:
> Hi Vinoth,
>
> I am able to reproduce this error on docker setup and have filed a
> jira -
> https://issues.apache.org/jira/browse/HUDI-484.
>
> Steps to reproduce are mentioned in the jira description itself.
>
> On Thu, Dec 26, 2019 at 12:42 PM Pratyaksh Sharma <
> pratyaksh13@gmail.com>
> wrote:
>
> Hi Vinoth,
>
> I will try to reproduce the error on docker cluster and keep you
> updated.
>
> On Tue, Dec 24, 2019 at 11:23 PM Vinoth Chandar <
> vinoth@apache.org>
> wrote:
>
> Pratyaksh,
>
> If you are still having this issue, could you try reproducing
> this
> on
> the
> docker setup
>
>
>
>
>
> https://hudi.apache.org/docker_demo.html#step-7--incremental-query-for-copy-on-write-table
> similar to this and raise a JIRA.
> Happy to look into it and get it fixed if needed
>
> Thanks
> Vinoth
>
> On Tue, Dec 24, 2019 at 8:43 AM lamberken <la...@163.com>
> wrote:
>
>
>
> Hi, @Pratyaksh Sharma
>
>
> The log4j-1.2.17.jar lib also needs to added to the classpath,
> for
> example:
> java -cp
>
>
>
>
>
> /path/to/hive-jdbc-2.3.1.jar:/path/to/log4j-1.2.17.jar:packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar
> org.apache.hudi.utilities.HiveIncrementalPuller --help
>
>
> best,
> lamber-ken
>
> At 2019-12-24 17:23:20, "Pratyaksh Sharma" <
> pratyaksh13@gmail.com
>
> wrote:
> Hi Vinoth,
>
> Sorry my bad, I did not realise earlier that spark is not
> needed
> for
> this
> class. I tried running it with the below command to get the
> mentioned
> exception -
>
> Command -
>
> java -cp
>
>
>
>
>
>
> /path/to/hive-jdbc-2.3.1.jar:packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar
> org.apache.hudi.utilities.HiveIncrementalPuller --help
>
> Exception -
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/apache/log4j/LogManager
> at
>
>
>
>
>
>
> org.apache.hudi.utilities.HiveIncrementalPuller.<clinit>(HiveIncrementalPuller.java:64)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.log4j.LogManager
> at
> java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> at
> java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at
> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
> at
> java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 1 more
>
> I was able to fix it by including the corresponding jar in the
> bundle.
>
> After fixing the above, still I am getting the NPE even though
> the
> template
> is bundled in the jar.
>
> On Mon, Dec 23, 2019 at 10:45 PM Vinoth Chandar <
> vinoth@apache.org>
> wrote:
>
> Hi Pratyaksh,
>
> HveIncrementalPuller is just a java program. Does not need
> Spark,
> since
> it
> just runs a HiveQL remotely..
>
> On the error you specified, seems like it can't find the
> template?
> Can
> you
> see if the bundle does not have the template file.. May be
> this
> got
> broken
> during the bundling changes.. (since its no longer part of
> the
> resources
> folder of the bundle module).. We should also probably be
> throwing a
> better
> error than NPE..
>
> We can raise a JIRA, once you confirm.
>
> String templateContent =
>
>
>
>
>
>
>
> FileIOUtils.readAsUTFString(this.getClass().getResourceAsStream("IncrementalPull.sqltemplate"));
>
>
> On Mon, Dec 23, 2019 at 6:02 AM Pratyaksh Sharma <
> pratyaksh13@gmail.com
>
> wrote:
>
> Hi,
>
> Can someone guide me or share some documentation regarding
> how
> to
> use
> HiveIncrementalPuller. I already went through the
> documentation
> on
> https://hudi.apache.org/querying_data.html. I tried using
> this
> puller
> using
> the below command and facing the given exception.
>
> Any leads are appreciated.
>
> Command -
> spark-submit --name incremental-puller --queue etl --files
> incremental_sql.txt --master yarn --deploy-mode cluster
> --driver-memory
> 4g
> --executor-memory 4g --num-executors 2 --class
> org.apache.hudi.utilities.HiveIncrementalPuller
> hudi-utilities-bundle-0.5.1-SNAPSHOT.jar --hiveUrl
> jdbc:hive2://HOST:PORT/ --hiveUser <user> --hivePass
> <pass>
> --extractSQLFile incremental_sql.txt --sourceDb
> <source_db>
> --sourceTable
> <src_table> --targetDb tmp --targetTable tempTable
> --fromCommitTime 0
> --maxCommits 1
>
> Error -
>
> [stack trace snipped; identical to the trace quoted earlier in this thread]
>
>
>
>
>
>
>
>
>

Re: Facing issues when using HiveIncrementalPuller

Posted by lamberken <la...@163.com>.

Hi @Vinoth,


Got it, thank you for reminding me. I just made a mistake just now.


best,
lamber-ken


On 01/2/2020 13:08,Vinoth Chandar<vi...@apache.org> wrote:
Hi Lamber,

utilities-bundle has always been a fat jar.. I was talking about
hudi-utilities.
Sure, take a swing at it. Happy to help as needed

On Wed, Jan 1, 2020 at 8:57 PM lamberken <la...@163.com> wrote:



Hi @Vinoth,


I'm willing to solve this problem. I'm trying to find out from the history when hudi-utilities-bundle stopped being a fat jar.



Git History
2019-08-29 FAT-JAR ---> 5f9fa82f47e1cc14a22b869250fe23c8f9c033cd
2019-09-14 NOT-FATJAR ---> d2525c31b7dad7bae2d4899d8df2a353ca39af50
best,
lamber-ken


At 2020-01-01 09:15:01, "Vinoth Chandar" <vi...@apache.org> wrote:
This does sound like a fair bit of pain.
I am wondering if it makes sense to change the integ-test setup/docker demo to use the incremental puller. A bunch of the packaging issues around jars seem like regressions from hudi-utilities no longer being a fat jar?

If there aren't any takers, I can also try my hand at fixing this, once I get done with a few things on my end. Left a comment on HUDI-485.



On Tue, Dec 31, 2019 at 4:19 PM lamberken <la...@163.com> wrote:



Hi @Pratyaksh Sharma,


Thanks for your detailed stack trace and reproduction steps. And your suggestion is reasonable.


1. For the NPE issue, please track PR #1167 <https://github.com/apache/incubator-hudi/pull/1167>
2. For the TTransportException issue, I have a question: can statements other than the create statement be executed?


best,
lamber-ken

At 2019-12-30 23:11:17, "Pratyaksh Sharma" <pr...@gmail.com>
wrote:
Thank you Lamberken, the above issue gets resolved with what you
suggested.
However, still HiveIncrementalPuller is not working.
Subsequently I found and fixed a bug raised here -
https://issues.apache.org/jira/browse/HUDI-485.

Currently I am facing the below exception when trying to run the create
table statement on docker cluster. Any leads for solving this are
welcome
-

6811 [main] ERROR org.apache.hudi.utilities.HiveIncrementalPuller  -
Exception when executing SQL

[stack traces snipped; identical to the traces earlier in this thread]

Also, the documentation does not mention the jars which need to be passed externally on the classpath when executing the above tool. We should update the documentation to list these jars so that it becomes easier for a new user to use the tool; I spent a lot of time adding all the jars incrementally. This jira (https://issues.apache.org/jira/browse/HUDI-486) tracks this.

On Mon, Dec 30, 2019 at 5:35 PM lamberken <la...@163.com> wrote:



Hi @Pratyaksh Sharma


Thanks for the steps to reproduce this issue. Try modifying the code below and test again.

In org.apache.hudi.utilities.HiveIncrementalPuller#HiveIncrementalPuller:

String templateContent =
    FileIOUtils.readAsUTFString(this.getClass().getResourceAsStream("IncrementalPull.sqltemplate"));

changed to:

String templateContent =
    FileIOUtils.readAsUTFString(this.getClass().getResourceAsStream("/IncrementalPull.sqltemplate"));
best,
lamber-ken





At 2019-12-30 19:25:08, "Pratyaksh Sharma" <pr...@gmail.com>
wrote:
Hi Vinoth,

I am able to reproduce this error on docker setup and have filed a
jira -
https://issues.apache.org/jira/browse/HUDI-484.

Steps to reproduce are mentioned in the jira description itself.

On Thu, Dec 26, 2019 at 12:42 PM Pratyaksh Sharma <
pratyaksh13@gmail.com>
wrote:

Hi Vinoth,

I will try to reproduce the error on docker cluster and keep you
updated.

On Tue, Dec 24, 2019 at 11:23 PM Vinoth Chandar <
vinoth@apache.org>
wrote:

Pratyaksh,

If you are still having this issue, could you try reproducing
this
on
the
docker setup




https://hudi.apache.org/docker_demo.html#step-7--incremental-query-for-copy-on-write-table
similar to this and raise a JIRA.
Happy to look into it and get it fixed if needed

Thanks
Vinoth

On Tue, Dec 24, 2019 at 8:43 AM lamberken <la...@163.com>
wrote:



Hi, @Pratyaksh Sharma


The log4j-1.2.17.jar lib also needs to be added to the classpath, for example:
java -cp




/path/to/hive-jdbc-2.3.1.jar:/path/to/log4j-1.2.17.jar:packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar
org.apache.hudi.utilities.HiveIncrementalPuller --help


best,
lamber-ken

At 2019-12-24 17:23:20, "Pratyaksh Sharma" <
pratyaksh13@gmail.com

wrote:
Hi Vinoth,

Sorry, my bad, I did not realise earlier that Spark is not needed for this class. I tried running it with the below command and got the mentioned exception -

Command -

java -cp





/path/to/hive-jdbc-2.3.1.jar:packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar
org.apache.hudi.utilities.HiveIncrementalPuller --help

Exception -
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/log4j/LogManager
at





org.apache.hudi.utilities.HiveIncrementalPuller.<clinit>(HiveIncrementalPuller.java:64)
Caused by: java.lang.ClassNotFoundException:
org.apache.log4j.LogManager
at
java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at
java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at
java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 1 more

I was able to fix it by including the corresponding jar in the
bundle.

After fixing the above, still I am getting the NPE even though
the
template
is bundled in the jar.

On Mon, Dec 23, 2019 at 10:45 PM Vinoth Chandar <
vinoth@apache.org>
wrote:

Hi Pratyaksh,

HiveIncrementalPuller is just a java program. Does not need Spark, since it just runs a HiveQL remotely..

On the error you specified, seems like it can't find the template? Can you see if the bundle does not have the template file.. Maybe this got broken during the bundling changes.. (since it's no longer part of the resources folder of the bundle module).. We should also probably be throwing a better error than NPE..

We can raise a JIRA, once you confirm.

String templateContent =
    FileIOUtils.readAsUTFString(this.getClass().getResourceAsStream("IncrementalPull.sqltemplate"));


On Mon, Dec 23, 2019 at 6:02 AM Pratyaksh Sharma <
pratyaksh13@gmail.com

wrote:

Hi,

Can someone guide me or share some documentation regarding
how
to
use
HiveIncrementalPuller. I already went through the
documentation
on
https://hudi.apache.org/querying_data.html. I tried using
this
puller
using
the below command and facing the given exception.

Any leads are appreciated.

Command -
spark-submit --name incremental-puller --queue etl --files
incremental_sql.txt --master yarn --deploy-mode cluster
--driver-memory
4g
--executor-memory 4g --num-executors 2 --class
org.apache.hudi.utilities.HiveIncrementalPuller
hudi-utilities-bundle-0.5.1-SNAPSHOT.jar --hiveUrl
jdbc:hive2://HOST:PORT/ --hiveUser <user> --hivePass
<pass>
--extractSQLFile incremental_sql.txt --sourceDb
<source_db>
--sourceTable
<src_table> --targetDb tmp --targetTable tempTable
--fromCommitTime 0
--maxCommits 1

Error -

java.lang.NullPointerException
at
org.apache.hudi.common.util.FileIOUtils.copy(FileIOUtils.java:73)
at







org.apache.hudi.common.util.FileIOUtils.readAsUTFString(FileIOUtils.java:66)
at







org.apache.hudi.common.util.FileIOUtils.readAsUTFString(FileIOUtils.java:61)
at







org.apache.hudi.utilities.HiveIncrementalPuller.<init>(HiveIncrementalPuller.java:113)
at







org.apache.hudi.utilities.HiveIncrementalPuller.main(HiveIncrementalPuller.java:343)









Re: Re: Re: Re: Re: Facing issues when using HiveIncrementalPuller

Posted by Vinoth Chandar <vi...@apache.org>.
Hi Lamber,

utilities-bundle has always been a fat jar. I was talking about
hudi-utilities.
Sure, take a swing at it. Happy to help as needed.

On Wed, Jan 1, 2020 at 8:57 PM lamberken <la...@163.com> wrote:

>
>
> Hi @Vinoth,
>
>
> I'm willing to solve this problem. I'm trying to find out from the history
> when hudi-utilities-bundle becoming not a fatjar.
>
>
>
> Git History
> 2019-08-29 FAT-JAR ---> 5f9fa82f47e1cc14a22b869250fe23c8f9c033cd
> 2019-09-14 NOT-FATJAR ---> d2525c31b7dad7bae2d4899d8df2a353ca39af50
> best,
> lamber-ken
>
>
> At 2020-01-01 09:15:01, "Vinoth Chandar" <vi...@apache.org> wrote:
> >This does sound like a fair bit of pain.
> >I am wondering if it makes sense to change the integ-test setup/docker
> demo
> >to use incremental  puller. Bunch of the packaging issues around jars,
> seem
> >like regressions that the hudi-utilities is not a fat jar anymore?
> >
> >if there are nt any takers, I can also try my hand at fixing this, once I
> >get done with few things on my end. left a comment on HUDI-485
> >
> >
> >
> >On Tue, Dec 31, 2019 at 4:19 PM lamberken <la...@163.com> wrote:
> >
> >>
> >>
> >> Hi @Pratyaksh Sharma,
> >>
> >>
> >> Thanks for your detail stackstrace and reproduce steps. And your
> >> suggestion is reasonable.
> >>
> >>
> >> 1, For NPE issue, please tracking pr #1167 <
> >> https://github.com/apache/incubator-hudi/pull/1167>
> >> 2, For TTransportException issue, I have a question that can other
> >> statements be executed except create statement?
> >>
> >>
> >> best,
> >> lamber-ken
> >>
> >> At 2019-12-30 23:11:17, "Pratyaksh Sharma" <pr...@gmail.com>
> wrote:
> >> >Thank you Lamberken, the above issue gets resolved with what you
> >> suggested.
> >> >However, still HiveIncrementalPuller is not working.
> >> >Subsequently I found and fixed a bug raised here -
> >> >https://issues.apache.org/jira/browse/HUDI-485.
> >> >
> >> >Currently I am facing the below exception when trying to run the create
> >> >table statement on docker cluster. Any leads for solving this are
> welcome
> >> -
> >> >
> >> >6811 [main] ERROR org.apache.hudi.utilities.HiveIncrementalPuller  -
> >> >Exception when executing SQL
> >> >
> >> >java.sql.SQLException: org.apache.thrift.transport.TTransportException
> >> >
> >> >at
> >>
> >>
> >org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:399)
> >> >
> >> >at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
> >> >
> >> >at
> >>
> >>
> >org.apache.hudi.utilities.HiveIncrementalPuller.executeStatement(HiveIncrementalPuller.java:233)
> >> >
> >> >at
> >>
> >>
> >org.apache.hudi.utilities.HiveIncrementalPuller.executeIncrementalSQL(HiveIncrementalPuller.java:200)
> >> >
> >> >at
> >>
> >>
> >org.apache.hudi.utilities.HiveIncrementalPuller.saveDelta(HiveIncrementalPuller.java:157)
> >> >
> >> >at
> >>
> >>
> >org.apache.hudi.utilities.HiveIncrementalPuller.main(HiveIncrementalPuller.java:345)
> >> >
> >> >Caused by: org.apache.thrift.transport.TTransportException
> >> >
> >> >at
> >>
> >>
> >org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
> >> >
> >> >at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
> >> >
> >> >at
> >>
> >>
> >org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:374)
> >> >
> >> >at
> >>
> >>
> >org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:451)
> >> >
> >> >at
> >> org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:433)
> >> >
> >> >at
> >>
> >>
> >org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:38)
> >> >
> >> >at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
> >> >
> >> >at
> >>
> >>
> >org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:425)
> >> >
> >> >at
> >>
> >>
> >org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:321)
> >> >
> >> >at
> >>
> >>
> >org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:225)
> >> >
> >> >at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
> >> >
> >> >at
> >>
> >>
> >org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_GetOperationStatus(TCLIService.java:467)
> >> >
> >> >at
> >>
> >>
> >org.apache.hive.service.rpc.thrift.TCLIService$Client.GetOperationStatus(TCLIService.java:454)
> >> >
> >> >at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >> >
> >> >at
> >>
> >>
> >sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> >> >
> >> >at
> >>
> >>
> >sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >> >
> >> >at java.lang.reflect.Method.invoke(Method.java:498)
> >> >
> >> >at
> >>
> >>
> >org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1524)
> >> >
> >> >at com.sun.proxy.$Proxy5.GetOperationStatus(Unknown Source)
> >> >
> >> >at
> >>
> >>
> >org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:367)
> >> >
> >> >... 5 more
> >> >
> >> >6812 [main] ERROR org.apache.hudi.utilities.HiveIncrementalPuller  -
> Could
> >> >not close the resultset opened
> >> >
> >> >java.sql.SQLException: org.apache.thrift.transport.TTransportException
> >> >
> >> >at
> >>
> >>
> >org.apache.hive.jdbc.HiveStatement.closeClientOperation(HiveStatement.java:214)
> >> >
> >> >at org.apache.hive.jdbc.HiveStatement.close(HiveStatement.java:231)
> >> >
> >> >at
> >>
> >>
> >org.apache.hudi.utilities.HiveIncrementalPuller.saveDelta(HiveIncrementalPuller.java:165)
> >> >
> >> >at
> >>
> >>
> >org.apache.hudi.utilities.HiveIncrementalPuller.main(HiveIncrementalPuller.java:345)
> >> >
> >> >Caused by: org.apache.thrift.transport.TTransportException
> >> >
> >> >at
> >>
> >>
> >org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
> >> >
> >> >at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
> >> >
> >> >at
> >>
> >>
> >org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:374)
> >> >
> >> >at
> >>
> >>
> >org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:451)
> >> >
> >> >at
> >> org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:433)
> >> >
> >> >at
> >>
> >>
> >org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:38)
> >> >
> >> >at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
> >> >
> >> >at
> >>
> >>
> >org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:425)
> >> >
> >> >at
> >>
> >>
> >org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:321)
> >> >
> >> >at
> >>
> >>
> >org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:225)
> >> >
> >> >at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
> >> >
> >> >at
> >>
> >>
> >org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_CloseOperation(TCLIService.java:513)
> >> >
> >> >at
> >>
> >>
> >org.apache.hive.service.rpc.thrift.TCLIService$Client.CloseOperation(TCLIService.java:500)
> >> >
> >> >at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >> >
> >> >at
> >>
> >>
> >sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> >> >
> >> >at
> >>
> >>
> >sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >> >
> >> >at java.lang.reflect.Method.invoke(Method.java:498)
> >> >
> >> >at
> >>
> >>
> >org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1524)
> >> >
> >> >at com.sun.proxy.$Proxy5.CloseOperation(Unknown Source)
> >> >
> >> >at
> >>
> >>
> >org.apache.hive.jdbc.HiveStatement.closeClientOperation(HiveStatement.java:208)
> >> >
> >> >... 3 more
> >> >
> >> >Also the documentation does not mention the jars which need to be
> passed
> >> >externally in classPath for executing above tool. We should upgrade the
> >> >documentation to list down the jars so that it becomes easier for a new
> >> >user to use this tool. I spent a lot of time adding all the jars
> >> >incrementally. This jira (
> https://issues.apache.org/jira/browse/HUDI-486)
> >> >tracks this.
> >> >
> >> >On Mon, Dec 30, 2019 at 5:35 PM lamberken <la...@163.com> wrote:
> >> >
> >> >>
> >> >>
> >> >> Hi @Pratyaksh Sharma
> >> >>
> >> >>
> >> >> Thanks for your steps to reproduce this issue. Try to modify bellow
> >> codes,
> >> >> and test again.
> >> >>
> >> >>
> >> >>
> org.apache.hudi.utilities.HiveIncrementalPuller#HiveIncrementalPuller /
> >> >> --------------------------------- / String templateContent =
> >> >>
> >>
> FileIOUtils.readAsUTFString(this.getClass().getResourceAsStream("IncrementalPull.sqltemplate"));
> >> >> Changed to
> >> >> / --------------------------------- / String templateContent =
> >> >>
> >>
> FileIOUtils.readAsUTFString(this.getClass().getResourceAsStream("/IncrementalPull.sqltemplate"));
> >> >> best,
> >> >> lamber-ken
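
[Editor's note on lamber-ken's one-character fix above: `Class.getResourceAsStream` resolves a relative name against the class's own package, while a leading `/` resolves it from the classpath root. The sketch below illustrates that resolution rule with a hypothetical `resolve` helper (modeled on the behavior of `java.lang.Class#resolveName`; it is not Hudi code):]

```java
// Sketch of how Class.getResourceAsStream turns a resource name into a
// classpath lookup path. The resolve() helper is hypothetical and only
// mirrors the documented relative-vs-absolute behavior.
public class ResourcePathDemo {

    static String resolve(Class<?> c, String name) {
        if (name.startsWith("/")) {
            // Absolute name: looked up from the classpath root.
            return name.substring(1);
        }
        // Relative name: looked up under the class's package directory.
        String pkg = c.getPackageName();
        return pkg.isEmpty() ? name : pkg.replace('.', '/') + "/" + name;
    }

    public static void main(String[] args) {
        // Without the slash, HiveIncrementalPuller would look under its own
        // package directory (org/apache/hudi/utilities/), where the bundle
        // does not place the template, so the stream is null and
        // FileIOUtils.copy() throws the NPE seen in the thread.
        System.out.println(resolve(String.class, "IncrementalPull.sqltemplate"));
        System.out.println(resolve(String.class, "/IncrementalPull.sqltemplate"));
    }
}
```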
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> At 2019-12-30 19:25:08, "Pratyaksh Sharma" <pr...@gmail.com>
> >> wrote:
> >> >> >Hi Vinoth,
> >> >> >
> >> >> >I am able to reproduce this error on docker setup and have filed a
> >> jira -
> >> >> >https://issues.apache.org/jira/browse/HUDI-484.
> >> >> >
> >> >> >Steps to reproduce are mentioned in the jira description itself.
> >> >> >
> >> >> >On Thu, Dec 26, 2019 at 12:42 PM Pratyaksh Sharma <
> >> pratyaksh13@gmail.com>
> >> >> >wrote:
> >> >> >
> >> >> >> Hi Vinoth,
> >> >> >>
> >> >> >> I will try to reproduce the error on docker cluster and keep you
> >> >> updated.
> >> >> >>
> >> >> >> On Tue, Dec 24, 2019 at 11:23 PM Vinoth Chandar <
> vinoth@apache.org>
> >> >> wrote:
> >> >> >>
> >> >> >>> Pratyaksh,
> >> >> >>>
> >> >> >>> If you are still having this issue, could you try reproducing
> this
> >> on
> >> >> the
> >> >> >>> docker setup
> >> >> >>>
> >> >> >>>
> >> >>
> >>
> https://hudi.apache.org/docker_demo.html#step-7--incremental-query-for-copy-on-write-table
> >> >> >>> similar to this and raise a JIRA.
> >> >> >>> Happy to look into it and get it fixed if needed
> >> >> >>>
> >> >> >>> Thanks
> >> >> >>> Vinoth
> >> >> >>>
> >> >> >>> On Tue, Dec 24, 2019 at 8:43 AM lamberken <la...@163.com>
> >> wrote:
> >> >> >>>
> >> >> >>> >
> >> >> >>> >
> >> >> >>> > Hi, @Pratyaksh Sharma
> >> >> >>> >
> >> >> >>> >
> >> >> >>> > The log4j-1.2.17.jar lib also needs to added to the classpath,
> for
> >> >> >>> example:
> >> >> >>> > java -cp
> >> >> >>> >
> >> >> >>>
> >> >>
> >>
> /path/to/hive-jdbc-2.3.1.jar:/path/to/log4j-1.2.17.jar:packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar
> >> >> >>> > org.apache.hudi.utilities.HiveIncrementalPuller --help
> >> >> >>> >
> >> >> >>> >
> >> >> >>> > best,
> >> >> >>> > lamber-ken
> >> >> >>> >
> >> >> >>> > At 2019-12-24 17:23:20, "Pratyaksh Sharma" <
> pratyaksh13@gmail.com
> >> >
> >> >> >>> wrote:
> >> >> >>> > >Hi Vinoth,
> >> >> >>> > >
> >> >> >>> > >Sorry my bad, I did not realise earlier that spark is not
> needed
> >> for
> >> >> >>> this
> >> >> >>> > >class. I tried running it with the below command to get the
> >> >> mentioned
> >> >> >>> > >exception -
> >> >> >>> > >
> >> >> >>> > >Command -
> >> >> >>> > >
> >> >> >>> > >java -cp
> >> >> >>> >
> >> >> >>> >
> >> >> >>>
> >> >>
> >>
> >/path/to/hive-jdbc-2.3.1.jar:packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar
> >> >> >>> > >org.apache.hudi.utilities.HiveIncrementalPuller --help
> >> >> >>> > >
> >> >> >>> > >Exception -
> >> >> >>> > >Exception in thread "main" java.lang.NoClassDefFoundError:
> >> >> >>> > >org/apache/log4j/LogManager
> >> >> >>> > >        at
> >> >> >>> >
> >> >> >>> >
> >> >> >>>
> >> >>
> >>
> >org.apache.hudi.utilities.HiveIncrementalPuller.<clinit>(HiveIncrementalPuller.java:64)
> >> >> >>> > >Caused by: java.lang.ClassNotFoundException:
> >> >> >>> org.apache.log4j.LogManager
> >> >> >>> > >        at
> >> >> java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> >> >> >>> > >        at
> java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> >> >> >>> > >        at
> >> >> >>> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
> >> >> >>> > >        at
> java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> >> >> >>> > >        ... 1 more
> >> >> >>> > >
> >> >> >>> > >I was able to fix it by including the corresponding jar in the
> >> >> bundle.
> >> >> >>> > >
> >> >> >>> > >After fixing the above, still I am getting the NPE even though
> >> the
> >> >> >>> > template
> >> >> >>> > >is bundled in the jar.
> >> >> >>> > >
> >> >> >>> > >On Mon, Dec 23, 2019 at 10:45 PM Vinoth Chandar <
> >> vinoth@apache.org>
> >> >> >>> > wrote:
> >> >> >>> > >
> >> >> >>> > >> Hi Pratyaksh,
> >> >> >>> > >>
> >> >> >>> > >> HveIncrementalPuller is just a java program. Does not need
> >> Spark,
> >> >> >>> since
> >> >> >>> > it
> >> >> >>> > >> just runs a HiveQL remotely..
> >> >> >>> > >>
> >> >> >>> > >> On the error you specified, seems like it can't find the
> >> template?
> >> >> >>> Can
> >> >> >>> > you
> >> >> >>> > >> see if the bundle does not have the template file.. May be
> this
> >> >> got
> >> >> >>> > broken
> >> >> >>> > >> during the bundling changes.. (since its no longer part of
> the
> >> >> >>> resources
> >> >> >>> > >> folder of the bundle module).. We should also probably be
> >> >> throwing a
> >> >> >>> > better
> >> >> >>> > >> error than NPE..
> >> >> >>> > >>
> >> >> >>> > >> We can raise a JIRA, once you confirm.
> >> >> >>> > >>
> >> >> >>> > >> String templateContent =
> >> >> >>> > >>
> >> >> >>> > >>
> >> >> >>> >
> >> >> >>>
> >> >>
> >>
> FileIOUtils.readAsUTFString(this.getClass().getResourceAsStream("IncrementalPull.sqltemplate"));
> >> >> >>> > >>
> >> >> >>> > >>
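
[Editor's note: Vinoth's suggestion to throw a better error than an NPE could look like the following sketch. It is a hypothetical guard, not Hudi's actual code; the class and method names are illustrative:]

```java
import java.io.InputStream;

public class TemplateLoader {

    // Hypothetical guard: turn a missing classpath resource into a clear
    // error instead of a NullPointerException deep inside FileIOUtils.copy().
    static InputStream openTemplate(Class<?> anchor, String name) {
        InputStream in = anchor.getResourceAsStream(name);
        if (in == null) {
            throw new IllegalStateException("Template " + name
                + " not found on the classpath; check the utilities bundle packaging");
        }
        return in;
    }

    public static void main(String[] args) {
        try {
            // Fails fast in a standalone run, where the template is absent.
            openTemplate(TemplateLoader.class, "/IncrementalPull.sqltemplate");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```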
> >> >> >>> > >> On Mon, Dec 23, 2019 at 6:02 AM Pratyaksh Sharma <
> >> >> >>> pratyaksh13@gmail.com
> >> >> >>> > >
> >> >> >>> > >> wrote:
> >> >> >>> > >>
> >> >> >>> > >> > Hi,
> >> >> >>> > >> >
> >> >> >>> > >> > Can someone guide me or share some documentation regarding
> >> how
> >> >> to
> >> >> >>> use
> >> >> >>> > >> > HiveIncrementalPuller. I already went through the
> >> documentation
> >> >> on
> >> >> >>> > >> > https://hudi.apache.org/querying_data.html. I tried using
> >> this
> >> >> >>> puller
> >> >> >>> > >> > using
> >> >> >>> > >> > the below command and facing the given exception.
> >> >> >>> > >> >
> >> >> >>> > >> > Any leads are appreciated.
> >> >> >>> > >> >
> >> >> >>> > >> > Command -
> >> >> >>> > >> > spark-submit --name incremental-puller --queue etl --files
> >> >> >>> > >> > incremental_sql.txt --master yarn --deploy-mode cluster
> >> >> >>> > --driver-memory
> >> >> >>> > >> 4g
> >> >> >>> > >> > --executor-memory 4g --num-executors 2 --class
> >> >> >>> > >> > org.apache.hudi.utilities.HiveIncrementalPuller
> >> >> >>> > >> > hudi-utilities-bundle-0.5.1-SNAPSHOT.jar --hiveUrl
> >> >> >>> > >> > jdbc:hive2://HOST:PORT/ --hiveUser <user> --hivePass
> <pass>
> >> >> >>> > >> > --extractSQLFile incremental_sql.txt --sourceDb
> <source_db>
> >> >> >>> > --sourceTable
> >> >> >>> > >> > <src_table> --targetDb tmp --targetTable tempTable
> >> >> >>> --fromCommitTime 0
> >> >> >>> > >> > --maxCommits 1
> >> >> >>> > >> >
> >> >> >>> > >> > Error -
> >> >> >>> > >> >
> >> >> >>> > >> > java.lang.NullPointerException
> >> >> >>> > >> > at
> >> >> >>> org.apache.hudi.common.util.FileIOUtils.copy(FileIOUtils.java:73)
> >> >> >>> > >> > at
> >> >> >>> > >> >
> >> >> >>> > >> >
> >> >> >>> > >>
> >> >> >>> >
> >> >> >>>
> >> >>
> >>
> org.apache.hudi.common.util.FileIOUtils.readAsUTFString(FileIOUtils.java:66)
> >> >> >>> > >> > at
> >> >> >>> > >> >
> >> >> >>> > >> >
> >> >> >>> > >>
> >> >> >>> >
> >> >> >>>
> >> >>
> >>
> org.apache.hudi.common.util.FileIOUtils.readAsUTFString(FileIOUtils.java:61)
> >> >> >>> > >> > at
> >> >> >>> > >> >
> >> >> >>> > >> >
> >> >> >>> > >>
> >> >> >>> >
> >> >> >>>
> >> >>
> >>
> org.apache.hudi.utilities.HiveIncrementalPuller.<init>(HiveIncrementalPuller.java:113)
> >> >> >>> > >> > at
> >> >> >>> > >> >
> >> >> >>> > >> >
> >> >> >>> > >>
> >> >> >>> >
> >> >> >>>
> >> >>
> >>
> org.apache.hudi.utilities.HiveIncrementalPuller.main(HiveIncrementalPuller.java:343)
> >> >> >>> > >> >
> >> >> >>> > >>
> >> >> >>> >
> >> >> >>>
> >> >> >>
> >> >>
> >>
>

Re:Re: Re: Re: Re: Facing issues when using HiveIncrementalPuller

Posted by lamberken <la...@163.com>.

Hi @Vinoth,


> I'm willing to solve this problem. I'm trying to find out from the git
> history when hudi-utilities-bundle stopped being a fat jar.



Git History
2019-08-29 FAT-JAR ---> 5f9fa82f47e1cc14a22b869250fe23c8f9c033cd
> 2019-09-14 NOT-FATJAR ---> d2525c31b7dad7bae2d4899d8df2a353ca39af50
>
> best,
lamber-ken



Re: Re: Re: Re: Re: Facing issues when using HiveIncrementalPuller

Posted by Vinoth Chandar <vi...@apache.org>.
@pratyaksh
>>Now the task is only to set the ulimits in docker-compose file itself for
hiveserver.
Hmm, that many files get created? Please go ahead with the PR.

On the utility, I posted a braindump on one of the open issues. It's a
simple utility that just automates running a Hive query by hand. Let's just
redo it as we need it to be :)
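Since the utility essentially renders a SQL template and submits it over Hive JDBC, the core of what it automates can be sketched in a few lines. This is a rough, self-contained sketch; the template text and placeholder names below are illustrative stand-ins, not the actual contents of IncrementalPull.sqltemplate:

```java
import java.util.HashMap;
import java.util.Map;

public class IncrementalPullSketch {

    // Illustrative stand-in for the bundled IncrementalPull.sqltemplate;
    // the real template's text and placeholders may differ.
    static final String TEMPLATE =
        "INSERT OVERWRITE DIRECTORY '${tmpDir}' "
            + "SELECT * FROM ${srcDb}.${srcTable} "
            + "WHERE `_hoodie_commit_time` > '${fromCommitTime}'";

    // Substitute every ${name} placeholder with its configured value.
    static String render(String template, Map<String, String> vars) {
        String sql = template;
        for (Map.Entry<String, String> e : vars.entrySet()) {
            sql = sql.replace("${" + e.getKey() + "}", e.getValue());
        }
        return sql;
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        vars.put("tmpDir", "/tmp/hudi_incremental_pull");
        vars.put("srcDb", "stock_ticks");
        vars.put("srcTable", "stock_ticks_cow");
        vars.put("fromCommitTime", "0");
        // The rendered statement is what would then be executed over Hive JDBC.
        System.out.println(render(TEMPLATE, vars));
    }
}
```

The real puller layers options like --maxCommits on top, but the control flow is essentially this render-then-execute shape.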


On Fri, Jan 3, 2020 at 8:33 PM lamberken <la...@163.com> wrote:

>
>
> hi Vinoth Chandar / Pratyaksh Sharma,
>
>
> I reset many commits in git and checked whether HiveIncrementalPuller
> works normally. It seems that HiveIncrementalPuller has been working
> abnormally for a long time.
>
>
> For detailed reproduction steps, please see HUDI-486 <
> https://issues.apache.org/jira/browse/HUDI-486>
>
>
> best,
> lamber-ken
>
>
>
>
>
> At 2020-01-01 09:15:01, "Vinoth Chandar" <vi...@apache.org> wrote:
> >This does sound like a fair bit of pain.
> >I am wondering if it makes sense to change the integ-test setup/docker demo
> >to use incremental puller. A bunch of the packaging issues around jars seem
> >like regressions from hudi-utilities no longer being a fat jar?
> >
> >if there aren't any takers, I can also try my hand at fixing this, once I
> >get done with a few things on my end. I left a comment on HUDI-485
> >
> >
> >
> >On Tue, Dec 31, 2019 at 4:19 PM lamberken <la...@163.com> wrote:
> >
> >>
> >>
> >> Hi @Pratyaksh Sharma,
> >>
> >>
> >> Thanks for your detailed stack trace and reproduction steps. Your
> >> suggestion is reasonable.
> >>
> >>
> >> 1. For the NPE issue, please track PR #1167 <
> >> https://github.com/apache/incubator-hudi/pull/1167>
> >> 2. For the TTransportException issue, one question: can any statements
> >> other than the create statement be executed?
> >>
> >>
> >> best,
> >> lamber-ken
> >>
> >> At 2019-12-30 23:11:17, "Pratyaksh Sharma" <pr...@gmail.com>
> wrote:
> >> >Thank you Lamberken, the above issue gets resolved with what you
> >> suggested.
> >> >However, still HiveIncrementalPuller is not working.
> >> >Subsequently I found and fixed a bug raised here -
> >> >https://issues.apache.org/jira/browse/HUDI-485.
> >> >
> >> >Currently I am facing the below exception when trying to run the create
> >> >table statement on docker cluster. Any leads for solving this are
> welcome
> >> -
> >> >
> >> >6811 [main] ERROR org.apache.hudi.utilities.HiveIncrementalPuller  -
> >> >Exception when executing SQL
> >> >
> >> >java.sql.SQLException: org.apache.thrift.transport.TTransportException
> >> >
> >> >at
> >>
> >>
> >org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:399)
> >> >
> >> >at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
> >> >
> >> >at
> >>
> >>
> >org.apache.hudi.utilities.HiveIncrementalPuller.executeStatement(HiveIncrementalPuller.java:233)
> >> >
> >> >at
> >>
> >>
> >org.apache.hudi.utilities.HiveIncrementalPuller.executeIncrementalSQL(HiveIncrementalPuller.java:200)
> >> >
> >> >at
> >>
> >>
> >org.apache.hudi.utilities.HiveIncrementalPuller.saveDelta(HiveIncrementalPuller.java:157)
> >> >
> >> >at
> >>
> >>
> >org.apache.hudi.utilities.HiveIncrementalPuller.main(HiveIncrementalPuller.java:345)
> >> >
> >> >Caused by: org.apache.thrift.transport.TTransportException
> >> >
> >> >at
> >>
> >>
> >org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
> >> >
> >> >at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
> >> >
> >> >at
> >>
> >>
> >org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:374)
> >> >
> >> >at
> >>
> >>
> >org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:451)
> >> >
> >> >at
> >> org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:433)
> >> >
> >> >at
> >>
> >>
> >org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:38)
> >> >
> >> >at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
> >> >
> >> >at
> >>
> >>
> >org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:425)
> >> >
> >> >at
> >>
> >>
> >org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:321)
> >> >
> >> >at
> >>
> >>
> >org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:225)
> >> >
> >> >at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
> >> >
> >> >at
> >>
> >>
> >org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_GetOperationStatus(TCLIService.java:467)
> >> >
> >> >at
> >>
> >>
> >org.apache.hive.service.rpc.thrift.TCLIService$Client.GetOperationStatus(TCLIService.java:454)
> >> >
> >> >at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >> >
> >> >at
> >>
> >>
> >sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> >> >
> >> >at
> >>
> >>
> >sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >> >
> >> >at java.lang.reflect.Method.invoke(Method.java:498)
> >> >
> >> >at
> >>
> >>
> >org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1524)
> >> >
> >> >at com.sun.proxy.$Proxy5.GetOperationStatus(Unknown Source)
> >> >
> >> >at
> >>
> >>
> >org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:367)
> >> >
> >> >... 5 more
> >> >
> >> >6812 [main] ERROR org.apache.hudi.utilities.HiveIncrementalPuller  -
> Could
> >> >not close the resultset opened
> >> >
> >> >java.sql.SQLException: org.apache.thrift.transport.TTransportException
> >> >
> >> >at
> >>
> >>
> >org.apache.hive.jdbc.HiveStatement.closeClientOperation(HiveStatement.java:214)
> >> >
> >> >at org.apache.hive.jdbc.HiveStatement.close(HiveStatement.java:231)
> >> >
> >> >at
> >>
> >>
> >org.apache.hudi.utilities.HiveIncrementalPuller.saveDelta(HiveIncrementalPuller.java:165)
> >> >
> >> >at
> >>
> >>
> >org.apache.hudi.utilities.HiveIncrementalPuller.main(HiveIncrementalPuller.java:345)
> >> >
> >> >Caused by: org.apache.thrift.transport.TTransportException
> >> >
> >> >at
> >>
> >>
> >org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
> >> >
> >> >at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
> >> >
> >> >at
> >>
> >>
> >org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:374)
> >> >
> >> >at
> >>
> >>
> >org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:451)
> >> >
> >> >at
> >> org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:433)
> >> >
> >> >at
> >>
> >>
> >org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:38)
> >> >
> >> >at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
> >> >
> >> >at
> >>
> >>
> >org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:425)
> >> >
> >> >at
> >>
> >>
> >org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:321)
> >> >
> >> >at
> >>
> >>
> >org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:225)
> >> >
> >> >at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
> >> >
> >> >at
> >>
> >>
> >org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_CloseOperation(TCLIService.java:513)
> >> >
> >> >at
> >>
> >>
> >org.apache.hive.service.rpc.thrift.TCLIService$Client.CloseOperation(TCLIService.java:500)
> >> >
> >> >at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >> >
> >> >at
> >>
> >>
> >sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> >> >
> >> >at
> >>
> >>
> >sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >> >
> >> >at java.lang.reflect.Method.invoke(Method.java:498)
> >> >
> >> >at
> >>
> >>
> >org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1524)
> >> >
> >> >at com.sun.proxy.$Proxy5.CloseOperation(Unknown Source)
> >> >
> >> >at
> >>
> >>
> >org.apache.hive.jdbc.HiveStatement.closeClientOperation(HiveStatement.java:208)
> >> >
> >> >... 3 more
> >> >
> >> >Also the documentation does not mention the jars which need to be
> passed
> >> >externally in classPath for executing above tool. We should upgrade the
> >> >documentation to list down the jars so that it becomes easier for a new
> >> >user to use this tool. I spent a lot of time adding all the jars
> >> >incrementally. This jira (
> https://issues.apache.org/jira/browse/HUDI-486)
> >> >tracks this.
> >> >
> >> >On Mon, Dec 30, 2019 at 5:35 PM lamberken <la...@163.com> wrote:
> >> >
> >> >>
> >> >>
> >> >> Hi @Pratyaksh Sharma
> >> >>
> >> >>
> >> >> Thanks for your steps to reproduce this issue. Try modifying the
> >> >> code below, and test again.
> >> >>
> >> >>
> >> >>
> >> >> org.apache.hudi.utilities.HiveIncrementalPuller#HiveIncrementalPuller
> >> >> ---------------------------------
> >> >> String templateContent =
> >> >>     FileIOUtils.readAsUTFString(this.getClass().getResourceAsStream("IncrementalPull.sqltemplate"));
> >> >> Changed to:
> >> >> ---------------------------------
> >> >> String templateContent =
> >> >>     FileIOUtils.readAsUTFString(this.getClass().getResourceAsStream("/IncrementalPull.sqltemplate"));
> >> >> best,
> >> >> lamber-ken
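The one-character fix quoted above hinges on how Java resolves classpath resources: Class.getResourceAsStream treats a name without a leading '/' as relative to the class's own package, while a leading '/' resolves from the classpath root, and a name that resolves to nothing returns null rather than throwing, which is why the failure only surfaces later as an NPE inside FileIOUtils. A minimal, self-contained sketch of that behavior, using java.lang.String's own class file as a resource that is guaranteed to exist:

```java
public class ResourceLookupDemo {
    public static void main(String[] args) {
        // No leading '/': the name is resolved relative to the class's own
        // package (java.lang for String), so the bare name is found.
        System.out.println(String.class.getResourceAsStream("String.class") != null);

        // Leading '/': the name is resolved from the classpath root, so the
        // full package path is required.
        System.out.println(String.class.getResourceAsStream("/java/lang/String.class") != null);

        // A resource that cannot be resolved yields null, not an exception;
        // dereferencing that null downstream is what produces the NPE.
        System.out.println(String.class.getResourceAsStream("/no/such/template.sql") != null);
    }
}
```

Presumably the template sits at the root of the bundle jar rather than under the class's package, so only the absolute form locates it.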
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> At 2019-12-30 19:25:08, "Pratyaksh Sharma" <pr...@gmail.com>
> >> wrote:
> >> >> >Hi Vinoth,
> >> >> >
> >> >> >I am able to reproduce this error on docker setup and have filed a
> >> jira -
> >> >> >https://issues.apache.org/jira/browse/HUDI-484.
> >> >> >
> >> >> >Steps to reproduce are mentioned in the jira description itself.
> >> >> >
> >> >> >On Thu, Dec 26, 2019 at 12:42 PM Pratyaksh Sharma <
> >> pratyaksh13@gmail.com>
> >> >> >wrote:
> >> >> >
> >> >> >> Hi Vinoth,
> >> >> >>
> >> >> >> I will try to reproduce the error on docker cluster and keep you
> >> >> updated.
> >> >> >>
> >> >> >> On Tue, Dec 24, 2019 at 11:23 PM Vinoth Chandar <
> vinoth@apache.org>
> >> >> wrote:
> >> >> >>
> >> >> >>> Pratyaksh,
> >> >> >>>
> >> >> >>> If you are still having this issue, could you try reproducing
> this
> >> on
> >> >> the
> >> >> >>> docker setup
> >> >> >>>
> >> >> >>>
> >> >>
> >>
> https://hudi.apache.org/docker_demo.html#step-7--incremental-query-for-copy-on-write-table
> >> >> >>> similar to this and raise a JIRA.
> >> >> >>> Happy to look into it and get it fixed if needed
> >> >> >>>
> >> >> >>> Thanks
> >> >> >>> Vinoth
> >> >> >>>
> >> >> >>> On Tue, Dec 24, 2019 at 8:43 AM lamberken <la...@163.com>
> >> wrote:
> >> >> >>>
> >> >> >>> >
> >> >> >>> >
> >> >> >>> > Hi, @Pratyaksh Sharma
> >> >> >>> >
> >> >> >>> >
> >> >>> > The log4j-1.2.17.jar lib also needs to be added to the classpath,
> for
> >> >> >>> example:
> >> >> >>> > java -cp
> >> >> >>> >
> >> >> >>>
> >> >>
> >>
> /path/to/hive-jdbc-2.3.1.jar:/path/to/log4j-1.2.17.jar:packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar
> >> >> >>> > org.apache.hudi.utilities.HiveIncrementalPuller --help
> >> >> >>> >
> >> >> >>> >
> >> >> >>> > best,
> >> >> >>> > lamber-ken
> >> >> >>> >
> >> >> >>> > At 2019-12-24 17:23:20, "Pratyaksh Sharma" <
> pratyaksh13@gmail.com
> >> >
> >> >> >>> wrote:
> >> >> >>> > >Hi Vinoth,
> >> >> >>> > >
> >> >> >>> > >Sorry my bad, I did not realise earlier that spark is not
> needed
> >> for
> >> >> >>> this
> >> >> >>> > >class. I tried running it with the below command to get the
> >> >> mentioned
> >> >> >>> > >exception -
> >> >> >>> > >
> >> >> >>> > >Command -
> >> >> >>> > >
> >> >> >>> > >java -cp
> >> >> >>> >
> >> >> >>> >
> >> >> >>>
> >> >>
> >>
> >/path/to/hive-jdbc-2.3.1.jar:packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar
> >> >> >>> > >org.apache.hudi.utilities.HiveIncrementalPuller --help
> >> >> >>> > >
> >> >> >>> > >Exception -
> >> >> >>> > >Exception in thread "main" java.lang.NoClassDefFoundError:
> >> >> >>> > >org/apache/log4j/LogManager
> >> >> >>> > >        at
> >> >> >>> >
> >> >> >>> >
> >> >> >>>
> >> >>
> >>
> >org.apache.hudi.utilities.HiveIncrementalPuller.<clinit>(HiveIncrementalPuller.java:64)
> >> >> >>> > >Caused by: java.lang.ClassNotFoundException:
> >> >> >>> org.apache.log4j.LogManager
> >> >> >>> > >        at
> >> >> java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> >> >> >>> > >        at
> java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> >> >> >>> > >        at
> >> >> >>> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
> >> >> >>> > >        at
> java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> >> >> >>> > >        ... 1 more
> >> >> >>> > >
> >> >> >>> > >I was able to fix it by including the corresponding jar in the
> >> >> bundle.
> >> >> >>> > >
> >> >> >>> > >After fixing the above, still I am getting the NPE even though
> >> the
> >> >> >>> > template
> >> >> >>> > >is bundled in the jar.
> >> >> >>> > >
> >> >> >>> > >On Mon, Dec 23, 2019 at 10:45 PM Vinoth Chandar <
> >> vinoth@apache.org>
> >> >> >>> > wrote:
> >> >> >>> > >
> >> >> >>> > >> Hi Pratyaksh,
> >> >> >>> > >>
> >> >>> > >> HiveIncrementalPuller is just a java program. It does not need
> >> >>> > >> Spark, since it just runs a HiveQL remotely..
> >> >> >>> > >>
> >> >> >>> > >> On the error you specified, seems like it can't find the
> >> template?
> >> >> >>> Can
> >> >> >>> > you
> >> >> >>> > >> see if the bundle does not have the template file.. May be
> this
> >> >> got
> >> >> >>> > broken
> >> >> >>> > >> during the bundling changes.. (since its no longer part of
> the
> >> >> >>> resources
> >> >> >>> > >> folder of the bundle module).. We should also probably be
> >> >> throwing a
> >> >> >>> > better
> >> >> >>> > >> error than NPE..
> >> >> >>> > >>
> >> >> >>> > >> We can raise a JIRA, once you confirm.
> >> >> >>> > >>
> >> >> >>> > >> String templateContent =
> >> >> >>> > >>
> >> >> >>> > >>
> >> >> >>> >
> >> >> >>>
> >> >>
> >>
> FileIOUtils.readAsUTFString(this.getClass().getResourceAsStream("IncrementalPull.sqltemplate"));
> >> >> >>> > >>
> >> >> >>> > >>
> >> >> >>> > >> On Mon, Dec 23, 2019 at 6:02 AM Pratyaksh Sharma <
> >> >> >>> pratyaksh13@gmail.com
> >> >> >>> > >
> >> >> >>> > >> wrote:
> >> >> >>> > >>
> >> >> >>> > >> > Hi,
> >> >> >>> > >> >
> >> >> >>> > >> > Can someone guide me or share some documentation regarding
> >> how
> >> >> to
> >> >> >>> use
> >> >> >>> > >> > HiveIncrementalPuller. I already went through the
> >> documentation
> >> >> on
> >> >> >>> > >> > https://hudi.apache.org/querying_data.html. I tried using
> >> this
> >> >> >>> puller
> >> >> >>> > >> > using
> >> >> >>> > >> > the below command and facing the given exception.
> >> >> >>> > >> >
> >> >> >>> > >> > Any leads are appreciated.
> >> >> >>> > >> >
> >> >> >>> > >> > Command -
> >> >> >>> > >> > spark-submit --name incremental-puller --queue etl --files
> >> >> >>> > >> > incremental_sql.txt --master yarn --deploy-mode cluster
> >> >> >>> > --driver-memory
> >> >> >>> > >> 4g
> >> >> >>> > >> > --executor-memory 4g --num-executors 2 --class
> >> >> >>> > >> > org.apache.hudi.utilities.HiveIncrementalPuller
> >> >> >>> > >> > hudi-utilities-bundle-0.5.1-SNAPSHOT.jar --hiveUrl
> >> >> >>> > >> > jdbc:hive2://HOST:PORT/ --hiveUser <user> --hivePass
> <pass>
> >> >> >>> > >> > --extractSQLFile incremental_sql.txt --sourceDb
> <source_db>
> >> >> >>> > --sourceTable
> >> >> >>> > >> > <src_table> --targetDb tmp --targetTable tempTable
> >> >> >>> --fromCommitTime 0
> >> >> >>> > >> > --maxCommits 1
> >> >> >>> > >> >
> >> >> >>> > >> > Error -
> >> >> >>> > >> >
> >> >> >>> > >> > java.lang.NullPointerException
> >> >> >>> > >> > at
> >> >> >>> org.apache.hudi.common.util.FileIOUtils.copy(FileIOUtils.java:73)
> >> >> >>> > >> > at
> >> >> >>> > >> >
> >> >> >>> > >> >
> >> >> >>> > >>
> >> >> >>> >
> >> >> >>>
> >> >>
> >>
> org.apache.hudi.common.util.FileIOUtils.readAsUTFString(FileIOUtils.java:66)
> >> >> >>> > >> > at
> >> >> >>> > >> >
> >> >> >>> > >> >
> >> >> >>> > >>
> >> >> >>> >
> >> >> >>>
> >> >>
> >>
> org.apache.hudi.common.util.FileIOUtils.readAsUTFString(FileIOUtils.java:61)
> >> >> >>> > >> > at
> >> >> >>> > >> >
> >> >> >>> > >> >
> >> >> >>> > >>
> >> >> >>> >
> >> >> >>>
> >> >>
> >>
> org.apache.hudi.utilities.HiveIncrementalPuller.<init>(HiveIncrementalPuller.java:113)
> >> >> >>> > >> > at
> >> >> >>> > >> >
> >> >> >>> > >> >
> >> >> >>> > >>
> >> >> >>> >
> >> >> >>>
> >> >>
> >>
> org.apache.hudi.utilities.HiveIncrementalPuller.main(HiveIncrementalPuller.java:343)
> >> >> >>> > >> >
> >> >> >>> > >>
> >> >> >>> >
> >> >> >>>
> >> >> >>
> >> >>
> >>
>

Re:Re: Re: Re: Re: Facing issues when using HiveIncrementalPuller

Posted by lamberken <la...@163.com>.

hi Vinoth Chandar / Pratyaksh Sharma,


I reset many commits in git and checked whether HiveIncrementalPuller works normally. It seems that HiveIncrementalPuller has been working abnormally for a long time.


For detailed reproduction steps, please see HUDI-486 <https://issues.apache.org/jira/browse/HUDI-486>


best,
lamber-ken






Re: Re: Re: Re: Facing issues when using HiveIncrementalPuller

Posted by Vinoth Chandar <vi...@apache.org>.
This does sound like a fair bit of pain.
I am wondering if it makes sense to change the integ-test setup/docker demo
to use the incremental puller. A bunch of the packaging issues around jars
seem like regressions from hudi-utilities no longer being a fat jar?

If there aren't any takers, I can also try my hand at fixing this, once I
get done with a few things on my end. I left a comment on HUDI-485.



On Tue, Dec 31, 2019 at 4:19 PM lamberken <la...@163.com> wrote:

>
>
> Hi @Pratyaksh Sharma,
>
>
> Thanks for your detailed stack trace and reproduction steps. Your
> suggestion is reasonable.
>
>
> 1. For the NPE issue, please track PR #1167 <
> https://github.com/apache/incubator-hudi/pull/1167>
> 2. For the TTransportException issue, one question: can statements other
> than the create statement be executed?
>
>
> best,
> lamber-ken
>
> At 2019-12-30 23:11:17, "Pratyaksh Sharma" <pr...@gmail.com> wrote:
> >Thank you Lamberken, the above issue gets resolved with what you
> suggested.
> >However, still HiveIncrementalPuller is not working.
> >Subsequently I found and fixed a bug raised here -
> >https://issues.apache.org/jira/browse/HUDI-485.
> >
> >Currently I am facing the below exception when trying to run the create
> >table statement on docker cluster. Any leads for solving this are welcome
> -
> >
> >6811 [main] ERROR org.apache.hudi.utilities.HiveIncrementalPuller  -
> >Exception when executing SQL
> >
> >java.sql.SQLException: org.apache.thrift.transport.TTransportException
> >
> >at
>
> >org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:399)
> >
> >at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
> >
> >at
>
> >org.apache.hudi.utilities.HiveIncrementalPuller.executeStatement(HiveIncrementalPuller.java:233)
> >
> >at
>
> >org.apache.hudi.utilities.HiveIncrementalPuller.executeIncrementalSQL(HiveIncrementalPuller.java:200)
> >
> >at
>
> >org.apache.hudi.utilities.HiveIncrementalPuller.saveDelta(HiveIncrementalPuller.java:157)
> >
> >at
>
> >org.apache.hudi.utilities.HiveIncrementalPuller.main(HiveIncrementalPuller.java:345)
> >
> >Caused by: org.apache.thrift.transport.TTransportException
> >
> >at
>
> >org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
> >
> >at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
> >
> >at
>
> >org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:374)
> >
> >at
>
> >org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:451)
> >
> >at
> org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:433)
> >
> >at
>
> >org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:38)
> >
> >at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
> >
> >at
>
> >org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:425)
> >
> >at
>
> >org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:321)
> >
> >at
>
> >org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:225)
> >
> >at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
> >
> >at
>
> >org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_GetOperationStatus(TCLIService.java:467)
> >
> >at
>
> >org.apache.hive.service.rpc.thrift.TCLIService$Client.GetOperationStatus(TCLIService.java:454)
> >
> >at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >
> >at
>
> >sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> >
> >at
>
> >sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >
> >at java.lang.reflect.Method.invoke(Method.java:498)
> >
> >at
>
> >org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1524)
> >
> >at com.sun.proxy.$Proxy5.GetOperationStatus(Unknown Source)
> >
> >at
>
> >org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:367)
> >
> >... 5 more
> >
> >6812 [main] ERROR org.apache.hudi.utilities.HiveIncrementalPuller  - Could
> >not close the resultset opened
> >
> >java.sql.SQLException: org.apache.thrift.transport.TTransportException
> >
> >at
>
> >org.apache.hive.jdbc.HiveStatement.closeClientOperation(HiveStatement.java:214)
> >
> >at org.apache.hive.jdbc.HiveStatement.close(HiveStatement.java:231)
> >
> >at
>
> >org.apache.hudi.utilities.HiveIncrementalPuller.saveDelta(HiveIncrementalPuller.java:165)
> >
> >at
>
> >org.apache.hudi.utilities.HiveIncrementalPuller.main(HiveIncrementalPuller.java:345)
> >
> >Caused by: org.apache.thrift.transport.TTransportException
> >
> >at
>
> >org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
> >
> >at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
> >
> >at
>
> >org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:374)
> >
> >at
>
> >org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:451)
> >
> >at
> org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:433)
> >
> >at
>
> >org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:38)
> >
> >at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
> >
> >at
>
> >org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:425)
> >
> >at
>
> >org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:321)
> >
> >at
>
> >org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:225)
> >
> >at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
> >
> >at
>
> >org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_CloseOperation(TCLIService.java:513)
> >
> >at
>
> >org.apache.hive.service.rpc.thrift.TCLIService$Client.CloseOperation(TCLIService.java:500)
> >
> >at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >
> >at
>
> >sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> >
> >at
>
> >sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >
> >at java.lang.reflect.Method.invoke(Method.java:498)
> >
> >at
>
> >org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1524)
> >
> >at com.sun.proxy.$Proxy5.CloseOperation(Unknown Source)
> >
> >at
>
> >org.apache.hive.jdbc.HiveStatement.closeClientOperation(HiveStatement.java:208)
> >
> >... 3 more
> >
> >Also, the documentation does not mention the jars which need to be added
> >to the classpath for executing the above tool. We should update the
> >documentation to list the jars so that it becomes easier for a new
> >user to use this tool. I spent a lot of time adding the jars
> >incrementally. This jira (https://issues.apache.org/jira/browse/HUDI-486)
> >tracks this.
> >
> >On Mon, Dec 30, 2019 at 5:35 PM lamberken <la...@163.com> wrote:
> >
> >>
> >>
> >> Hi @Pratyaksh Sharma
> >>
> >>
> >> Thanks for your steps to reproduce this issue. Try modifying the code
> >> below, and test again.
> >>
> >> In org.apache.hudi.utilities.HiveIncrementalPuller#HiveIncrementalPuller:
> >>
> >> String templateContent =
> FileIOUtils.readAsUTFString(this.getClass().getResourceAsStream("IncrementalPull.sqltemplate"));
> >>
> >> changed to:
> >>
> >> String templateContent =
> FileIOUtils.readAsUTFString(this.getClass().getResourceAsStream("/IncrementalPull.sqltemplate"));
> >>
> >> best,
> >> lamber-ken
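For context on why the leading slash matters here: Class#getResourceAsStream resolves a relative name against the class's package, while a name starting with "/" is resolved from the classpath root. A minimal sketch of that resolution rule (an illustrative helper, not Hudi code):

```java
// Sketch of how Class#getResourceAsStream resolves a resource name,
// per the documented JDK behavior. Class and method names are illustrative.
public class ResourceNameResolution {

    // A relative name gets the class's package path prepended;
    // a name starting with '/' is taken from the classpath root.
    static String resolve(String packageName, String resourceName) {
        if (resourceName.startsWith("/")) {
            return resourceName.substring(1);
        }
        String pkgPath = packageName.replace('.', '/');
        return pkgPath.isEmpty() ? resourceName : pkgPath + "/" + resourceName;
    }

    public static void main(String[] args) {
        // Relative name: resolved inside the utilities package.
        // -> org/apache/hudi/utilities/IncrementalPull.sqltemplate
        System.out.println(resolve("org.apache.hudi.utilities", "IncrementalPull.sqltemplate"));
        // Absolute name: resolved at the classpath root.
        // -> IncrementalPull.sqltemplate
        System.out.println(resolve("org.apache.hudi.utilities", "/IncrementalPull.sqltemplate"));
    }
}
```

So with the relative name the puller looked for the template under org/apache/hudi/utilities/, and the stream came back null, triggering the NPE; the leading slash looks it up at the classpath root instead, which is consistent with the fix above resolving the issue.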
> >>
> >>
> >>
> >>
> >>
> >> At 2019-12-30 19:25:08, "Pratyaksh Sharma" <pr...@gmail.com>
> wrote:
> >> >Hi Vinoth,
> >> >
> >> >I am able to reproduce this error on docker setup and have filed a
> jira -
> >> >https://issues.apache.org/jira/browse/HUDI-484.
> >> >
> >> >Steps to reproduce are mentioned in the jira description itself.
> >> >
> >> >On Thu, Dec 26, 2019 at 12:42 PM Pratyaksh Sharma <
> pratyaksh13@gmail.com>
> >> >wrote:
> >> >
> >> >> Hi Vinoth,
> >> >>
> >> >> I will try to reproduce the error on docker cluster and keep you
> >> updated.
> >> >>
> >> >> On Tue, Dec 24, 2019 at 11:23 PM Vinoth Chandar <vi...@apache.org>
> >> wrote:
> >> >>
> >> >>> Pratyaksh,
> >> >>>
> >> >>> If you are still having this issue, could you try reproducing this
> on
> >> the
> >> >>> docker setup
> >> >>>
> >> >>>
> >>
> https://hudi.apache.org/docker_demo.html#step-7--incremental-query-for-copy-on-write-table
> >> >>> similar to this and raise a JIRA.
> >> >>> Happy to look into it and get it fixed if needed
> >> >>>
> >> >>> Thanks
> >> >>> Vinoth
> >> >>>
> >> >>> On Tue, Dec 24, 2019 at 8:43 AM lamberken <la...@163.com>
> wrote:
> >> >>>
> >> >>> >
> >> >>> >
> >> >>> > Hi, @Pratyaksh Sharma
> >> >>> >
> >> >>> >
> >> >>> > The log4j-1.2.17.jar lib also needs to be added to the classpath, for
> >> >>> example:
> >> >>> > java -cp
> >> >>> >
> >> >>>
> >>
> /path/to/hive-jdbc-2.3.1.jar:/path/to/log4j-1.2.17.jar:packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar
> >> >>> > org.apache.hudi.utilities.HiveIncrementalPuller --help
> >> >>> >
> >> >>> >
> >> >>> > best,
> >> >>> > lamber-ken
> >> >>> >
> >> >>> > At 2019-12-24 17:23:20, "Pratyaksh Sharma" <pratyaksh13@gmail.com
> >
> >> >>> wrote:
> >> >>> > >Hi Vinoth,
> >> >>> > >
> >> >>> > >Sorry my bad, I did not realise earlier that spark is not needed
> for
> >> >>> this
> >> >>> > >class. I tried running it with the below command to get the
> >> mentioned
> >> >>> > >exception -
> >> >>> > >
> >> >>> > >Command -
> >> >>> > >
> >> >>> > >java -cp
> >> >>> >
> >> >>> >
> >> >>>
> >>
> >/path/to/hive-jdbc-2.3.1.jar:packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar
> >> >>> > >org.apache.hudi.utilities.HiveIncrementalPuller --help
> >> >>> > >
> >> >>> > >Exception -
> >> >>> > >Exception in thread "main" java.lang.NoClassDefFoundError:
> >> >>> > >org/apache/log4j/LogManager
> >> >>> > >        at
> >> >>> >
> >> >>> >
> >> >>>
> >>
> >org.apache.hudi.utilities.HiveIncrementalPuller.<clinit>(HiveIncrementalPuller.java:64)
> >> >>> > >Caused by: java.lang.ClassNotFoundException:
> >> >>> org.apache.log4j.LogManager
> >> >>> > >        at
> >> java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> >> >>> > >        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> >> >>> > >        at
> >> >>> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
> >> >>> > >        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> >> >>> > >        ... 1 more
> >> >>> > >
> >> >>> > >I was able to fix it by including the corresponding jar in the
> >> bundle.
> >> >>> > >
> >> >>> > >After fixing the above, still I am getting the NPE even though
> the
> >> >>> > template
> >> >>> > >is bundled in the jar.
> >> >>> > >
> >> >>> > >On Mon, Dec 23, 2019 at 10:45 PM Vinoth Chandar <
> vinoth@apache.org>
> >> >>> > wrote:
> >> >>> > >
> >> >>> > >> Hi Pratyaksh,
> >> >>> > >>
> >> >>> > >> HiveIncrementalPuller is just a java program. Does not need
> Spark,
> >> >>> since
> >> >>> > it
> >> >>> > >> just runs a HiveQL remotely..
> >> >>> > >>
> >> >>> > >> On the error you specified, seems like it can't find the
> template?
> >> >>> Can
> >> >>> > you
> >> >>> > >> see if the bundle does not have the template file.. May be this
> >> got
> >> >>> > broken
> >> >>> > >> during the bundling changes.. (since its no longer part of the
> >> >>> resources
> >> >>> > >> folder of the bundle module).. We should also probably be
> >> throwing a
> >> >>> > better
> >> >>> > >> error than NPE..
> >> >>> > >>
> >> >>> > >> We can raise a JIRA, once you confirm.
> >> >>> > >>
> >> >>> > >> String templateContent =
> >> >>> > >>
> >> >>> > >>
> >> >>> >
> >> >>>
> >>
> FileIOUtils.readAsUTFString(this.getClass().getResourceAsStream("IncrementalPull.sqltemplate"));
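Vinoth's point about throwing a better error than an NPE could look something like the sketch below (a hypothetical variant; the class name, method, and message are illustrative, not Hudi's actual code):

```java
import java.io.InputStream;

// Sketch of a defensive loader: fail with a descriptive exception instead of
// letting a null InputStream surface later as an NPE inside FileIOUtils.
public class TemplateLoader {

    static InputStream openTemplate(Class<?> clazz, String name) {
        InputStream in = clazz.getResourceAsStream(name);
        if (in == null) {
            throw new IllegalStateException(
                "Template resource '" + name + "' not found on the classpath for "
                    + clazz.getName() + "; check that the bundle packages it.");
        }
        return in;
    }

    public static void main(String[] args) {
        try {
            openTemplate(TemplateLoader.class, "/no-such-template.sqltemplate");
        } catch (IllegalStateException e) {
            // The failure now names the missing resource instead of an opaque NPE.
            System.out.println(e.getMessage());
        }
    }
}
```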
> >> >>> > >>
> >> >>> > >>
> >> >>> > >> On Mon, Dec 23, 2019 at 6:02 AM Pratyaksh Sharma <
> >> >>> pratyaksh13@gmail.com
> >> >>> > >
> >> >>> > >> wrote:
> >> >>> > >>
> >> >>> > >> > Hi,
> >> >>> > >> >
> >> >>> > >> > Can someone guide me or share some documentation regarding
> how
> >> to
> >> >>> use
> >> >>> > >> > HiveIncrementalPuller. I already went through the
> documentation
> >> on
> >> >>> > >> > https://hudi.apache.org/querying_data.html. I tried using
> this
> >> >>> puller
> >> >>> > >> > using
> >> >>> > >> > the below command and facing the given exception.
> >> >>> > >> >
> >> >>> > >> > Any leads are appreciated.
> >> >>> > >> >
> >> >>> > >> > Command -
> >> >>> > >> > spark-submit --name incremental-puller --queue etl --files
> >> >>> > >> > incremental_sql.txt --master yarn --deploy-mode cluster
> >> >>> > --driver-memory
> >> >>> > >> 4g
> >> >>> > >> > --executor-memory 4g --num-executors 2 --class
> >> >>> > >> > org.apache.hudi.utilities.HiveIncrementalPuller
> >> >>> > >> > hudi-utilities-bundle-0.5.1-SNAPSHOT.jar --hiveUrl
> >> >>> > >> > jdbc:hive2://HOST:PORT/ --hiveUser <user> --hivePass <pass>
> >> >>> > >> > --extractSQLFile incremental_sql.txt --sourceDb <source_db>
> >> >>> > --sourceTable
> >> >>> > >> > <src_table> --targetDb tmp --targetTable tempTable
> >> >>> --fromCommitTime 0
> >> >>> > >> > --maxCommits 1
> >> >>> > >> >
> >> >>> > >> > Error -
> >> >>> > >> >
> >> >>> > >> > java.lang.NullPointerException
> >> >>> > >> > at
> >> >>> org.apache.hudi.common.util.FileIOUtils.copy(FileIOUtils.java:73)
> >> >>> > >> > at
> >> >>> > >> >
> >> >>> > >> >
> >> >>> > >>
> >> >>> >
> >> >>>
> >>
> org.apache.hudi.common.util.FileIOUtils.readAsUTFString(FileIOUtils.java:66)
> >> >>> > >> > at
> >> >>> > >> >
> >> >>> > >> >
> >> >>> > >>
> >> >>> >
> >> >>>
> >>
> org.apache.hudi.common.util.FileIOUtils.readAsUTFString(FileIOUtils.java:61)
> >> >>> > >> > at
> >> >>> > >> >
> >> >>> > >> >
> >> >>> > >>
> >> >>> >
> >> >>>
> >>
> org.apache.hudi.utilities.HiveIncrementalPuller.<init>(HiveIncrementalPuller.java:113)
> >> >>> > >> > at
> >> >>> > >> >
> >> >>> > >> >
> >> >>> > >>
> >> >>> >
> >> >>>
> >>
> org.apache.hudi.utilities.HiveIncrementalPuller.main(HiveIncrementalPuller.java:343)
> >> >>> > >> >
> >> >>> > >>
> >> >>> >
> >> >>>
> >> >>
> >>
>