Posted to commits@hudi.apache.org by "Vinoth Chandar (Jira)" <ji...@apache.org> on 2020/10/01 00:39:00 UTC

[jira] [Commented] (HUDI-1289) Using hbase index in spark hangs in Hudi 0.6.0

    [ https://issues.apache.org/jira/browse/HUDI-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205173#comment-17205173 ] 

Vinoth Chandar commented on HUDI-1289:
--------------------------------------

[~vbalaji] do you remember why we had to shade HBase? Was it done proactively? 

> Using hbase index in spark hangs in Hudi 0.6.0
> ----------------------------------------------
>
>                 Key: HUDI-1289
>                 URL: https://issues.apache.org/jira/browse/HUDI-1289
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Ryan Pifer
>            Priority: Major
>             Fix For: 0.6.1
>
>
> In Hudi 0.6.0 there was a change to shade (relocate) the HBase dependencies in the hudi-spark-bundle jar. When using the HBASE index with only the hudi-spark-bundle jar specified in the Spark session, there are several issues:
>  
>  1. Dependencies are not resolved correctly:
> The default status listener class value in hbase-client still names the class as it was before relocation, so the shaded client rejects it against the relocated listener interface
> {code:java}
> Caused by: java.lang.RuntimeException: java.lang.RuntimeException: class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener not org.apache.hudi.org.apache.hadoop.hbase.client.ClusterStatusListener$Listener
>  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2427)
>  at org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:656)
>  ... 39 more
> Caused by: java.lang.RuntimeException: class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener not org.apache.hudi.org.apache.hadoop.hbase.client.ClusterStatusListener$Listener
>  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2421)
>  ... 40 more{code}
>  
> [https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClusterStatusListener.java#L72-L73]
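> The mismatch can be reproduced in isolation. The sketch below uses stand-in types (an assumption for illustration, not the real Hadoop/HBase classes); {{getClassChecked}} is a simplified stand-in for the interface check that Hadoop's {{Configuration.getClass}} performs. It shows why a config value naming the pre-relocation class is rejected once the expected interface has been relocated:

```java
// Stand-in sketch (hypothetical types): after shading, the interface compiled
// into the bundle is the relocated one, but the default config value still
// names the unrelocated listener class, so the assignability check fails.
public class RelocationMismatch {

    // Stand-in for the relocated listener interface inside the bundle.
    interface RelocatedListener {}

    // Stand-in for the original, unrelocated MulticastListener class.
    static class OriginalMulticastListener {}

    // Simplified stand-in for Hadoop's Configuration.getClass interface check:
    // load the class named by the config value and verify it implements xface.
    static Class<?> getClassChecked(String className, Class<?> xface)
            throws ClassNotFoundException {
        Class<?> clazz = Class.forName(className);
        if (!xface.isAssignableFrom(clazz)) {
            // Mirrors the "class X not Y" RuntimeException in the stack trace above.
            throw new RuntimeException("class " + className + " not " + xface.getName());
        }
        return clazz;
    }

    public static void main(String[] args) throws Exception {
        try {
            getClassChecked(OriginalMulticastListener.class.getName(),
                            RelocatedListener.class);
        } catch (RuntimeException e) {
            // Prints a "class ... not ..." message analogous to the reported error.
            System.out.println(e.getMessage());
        }
    }
}
```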
>  
> This can be fixed by overriding the status listener class in the HBase configuration used by Hudi:
> {code:java}
> hbaseConfig.set("hbase.status.listener.class", "org.apache.hudi.org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener");{code}
> [https://github.com/apache/hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/index/hbase/HBaseIndex.java#L134]
>  
> 2. After modifying the above, executors hang when trying to connect to hbase and fail after about 45 minutes
> {code:java}
> Caused by: org.apache.hudi.org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
> Thu Sep 17 23:59:42 UTC 2020, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68536: row 'hudiindex,12345678,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=ip-10-81-236-56.ec2.internal,16020,1600130997457, seqNum=0
>  at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:276)
>  at org.apache.hudi.org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:210)
>  at org.apache.hudi.org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
>  at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:210)
>  at org.apache.hudi.org.apache.hadoop.hbase.client.ClientSmallReversedScanner.loadCache(ClientSmallReversedScanner.java:212)
>  at org.apache.hudi.org.apache.hadoop.hbase.client.ClientSmallReversedScanner.next(ClientSmallReversedScanner.java:186)
>  at org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1275)
>  at org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1181)
>  at org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1165)
>  at org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1122)
>  at org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:957)
>  at org.apache.hudi.org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:83)
>  at org.apache.hudi.org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:75)
>  at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:134)
>  ... 35 more{code}
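> The reported ~45 minutes to failure is consistent with the HBase client's retry policy. A back-of-the-envelope sketch, assuming HBase 1.x client defaults ({{HConstants.RETRY_BACKOFF}} pause multipliers with a 100 ms base pause) and that each of the 36 attempts runs to its 60 s callTimeout:

```java
// Worst-case time budget for an HBase client retry loop (assumption: 1.x
// defaults). Each attempt is bounded by callTimeout; between attempts the
// client sleeps pause * RETRY_BACKOFF[min(attempt, last index)].
public class RetryBudget {

    // HBase's capped exponential backoff table (HConstants.RETRY_BACKOFF).
    static final int[] RETRY_BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

    static long totalMillis(int attempts, long callTimeoutMs, long pauseMs) {
        long total = 0;
        for (int i = 0; i < attempts; i++) {
            total += callTimeoutMs; // worst case: each attempt runs to its timeout
            if (i < attempts - 1) {
                int idx = Math.min(i, RETRY_BACKOFF.length - 1);
                total += pauseMs * RETRY_BACKOFF[idx]; // capped exponential pause
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long ms = totalMillis(36, 60_000L, 100L);
        // 36 * 60 s of call timeouts plus ~8.8 min of backoff pauses: ~45 minutes
        System.out.println("Worst-case time to RetriesExhausted: ~"
                + Math.round(ms / 60_000.0) + " minutes");
    }
}
```

> That matches "Failed after attempts=36" with {{callTimeout=60000}} surfacing only after roughly 45 minutes of hanging.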
>  
> When investigating the executor logs, I found the following:
> {code:java}
> 20/09/18 21:35:48 TRACE TransportClient: Sending RPC to ip-10-31-253-39.ec2.internal/10.31.253.39:46825
> 20/09/18 21:35:48 TRACE TransportClient: Sending request RPC 7802669247197305083 to ip-10-31-253-39.ec2.internal/10.31.253.39:46825 took 0 ms
> 20/09/18 21:35:48 TRACE MessageDecoder: Received message RpcResponse: RpcResponse{requestId=7802669247197305083, body=NettyManagedBuffer{buf=SimpleLeakAwareByteBuf(PooledUnsafeDirectByteBuf(ridx: 21, widx: 102, cap: 128))}}
> 20/09/18 21:35:53 TRACE ZooKeeperRegistry: Looking up meta region location in ZK, connection=org.apache.hudi.org.apache.hadoop.hbase.client.ZooKeeperRegistry@268d3bae
> 20/09/18 21:35:53 TRACE ZKUtil: hconnection-0x4f596c31-0x10000036821007a, quorum=ip-10-31-253-39.ec2.internal:2181, baseZNode=/hbase Retrieved 51 byte(s) of data from znode /hbase/meta-region-server; data=PBUF\x0A)\x0A\x1Dip-10-16-254...
> 20/09/18 21:35:53 TRACE ZooKeeperRegistry: Looked up meta region location, connection=org.apache.hudi.org.apache.hadoop.hbase.client.ZooKeeperRegistry@268d3bae; servers = ip-10-16-254-233.ec2.internal,16020,1600298383776 
> 20/09/18 21:35:53 TRACE MetaCache: Merged cached locations: [region=hbase:meta,,1.1588230740, hostname=ip-10-16-254-233.ec2.internal,16020,1600298383776, seqNum=0]
> 20/09/18 21:35:53 DEBUG RpcClientImpl: Use SIMPLE authentication for service ClientService, sasl=false
> 20/09/18 21:35:53 DEBUG RpcClientImpl: Connecting to ip-10-16-254-233.ec2.internal/10.16.254.233:16020
> 20/09/18 21:35:53 TRACE RpcClientImpl: IPC Client (669392975) connection to ip-10-16-254-233.ec2.internal/10.16.254.233:16020 from hadoop: starting, connections 1
> 20/09/18 21:35:53 TRACE RpcClientImpl: IPC Client (669392975) connection to ip-10-16-254-233.ec2.internal/10.16.254.233:16020 from hadoop: marking at should close, reason: null
> 20/09/18 21:35:53 TRACE RpcClientImpl: IPC Client (669392975) connection to ip-10-16-254-233.ec2.internal/10.16.254.233:16020 from hadoop: closing ipc connection to ip-10-16-254-233.ec2.internal/10.16.254.233:16020
> 20/09/18 21:35:53 TRACE RpcClientImpl: IPC Client (669392975) connection to ip-10-16-254-233.ec2.internal/10.16.254.233:16020 from hadoop: ipc connection to ip-10-16-254-233.ec2.internal/10.16.254.233:16020 closed
> 20/09/18 21:35:53 TRACE RpcClientImpl: IPC Client (669392975) connection to ip-10-16-254-233.ec2.internal/10.16.254.233:16020 from hadoop: stopped, connections 0
> 20/09/18 21:35:53 INFO RpcRetryingCaller: MESSAGE: Call to ip-10-16-254-233.ec2.internal/10.16.254.233:16020 failed on local exception: org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Connection to ip-10-16-254-233.ec2.internal/10.16.254.233:16020 is closing. Call id=418, waitTime=2
> 20/09/18 21:35:53 INFO RpcRetryingCaller: STACKTRACE[Ljava.lang.StackTraceElement;@20efcd07
> 20/09/18 21:35:53 INFO RpcRetryingCaller: CAUSE
> org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Connection to ip-10-16-254-233.ec2.internal/10.16.254.233:16020 is closing. Call id=418, waitTime=2
>  at org.apache.hudi.org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.cleanupCalls(RpcClientImpl.java:1089)
>  at org.apache.hudi.org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.close(RpcClientImpl.java:865)
>  at org.apache.hudi.org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.run(RpcClientImpl.java:582)
> 20/09/18 21:35:53 INFO RpcRetryingCaller: Call exception, tries=10, retries=35, started=38363 ms ago, cancelled=false, msg=row 'huditest,12345678,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=ip-10-16-254-233.ec2.internal,16020,1600298383776, seqNum=0
> 20/09/18 21:35:53 TRACE MetaCache: Removed region=hbase:meta,,1.1588230740, hostname=ip-10-16-254-233.ec2.internal,16020,1600298383776, seqNum=0 from cache
> {code}
>  
> Even after adding the HBase jars to the session, it continues to hang. I was able to resolve the hanging issue by building the hudi-spark-bundle jar without shading the HBase-related dependencies and adding them explicitly when launching my Spark shell, so it appears to be a problem with the relocation.
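> As a sketch of that workaround (hypothetical fragment; the actual relocation list in the bundle pom may differ), the HBase relocation can be dropped from the maven-shade-plugin configuration so HBase classes keep their original names:
> {code:xml}
> <plugin>
>   <groupId>org.apache.maven.plugins</groupId>
>   <artifactId>maven-shade-plugin</artifactId>
>   <configuration>
>     <relocations>
>       <!-- Remove (or comment out) this entry to stop shading hbase: -->
>       <!--
>       <relocation>
>         <pattern>org.apache.hadoop.hbase.</pattern>
>         <shadedPattern>org.apache.hudi.org.apache.hadoop.hbase.</shadedPattern>
>       </relocation>
>       -->
>     </relocations>
>   </configuration>
> </plugin>
> {code}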
>  
> Example of using the HBase index successfully:
> {code:java}
> spark-shell --jars /usr/lib/hudi/cli/lib/hbase-client-1.2.3.jar,/usr/lib/hudi/cli/lib/hbase-common-1.2.3.jar,/usr/lib/hudi/cli/lib/hbase-protocol-1.2.3.jar,/usr/lib/hudi/cli/lib/htrace-core-3.1.0-incubating.jar,/usr/lib/hudi/cli/lib/metrics-core-2.2.0.jar,hudi-spark-bundle_2.11-0.6.0-amzn-0.jar,/usr/lib/spark/external/lib/spark-avro.jar --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" --conf "spark.sql.hive.convertMetastoreParquet=false"
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)