Posted to user@kylin.apache.org by 蒋慧明 <hm...@samsung.com> on 2019/04/22 06:43:05 UTC

NotServingRegionException----RE: Re: KYLIN timeout problem

Hi,

Adjusting "kylin.metadata.hbase-rpc-timeout" does work.

Another problem: when I tried to run a Kylin job, I ran into the error below.

I checked the consistency of kylin_metadata with "hbase hbck -details kylin_metadata". The result was "0 inconsistency".

org.apache.kylin.engine.mr.exception.HadoopShellException: java.lang.RuntimeException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=1, exceptions:
Mon Apr 22 14:30:23 GMT+08:00 2019, RpcRetryingCaller{globalStartTime=1555914623296, pause=100, retries=1}, org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region kylin_metadata,/dict/FACT/DIM_TB/8b5cdf3e-8aa3-5c70-44bd-fffdc6ed4d1a.dict,1555653527986.9fbc862f521b93968a3299c9d853992e. is not online on ip-109-105-1-504.compute.internal,16020,1555901919971
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3008)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1144)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.newRegionScanner(RSRpcServices.java:2476)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2757)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:34950)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2339)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)

Does anyone know how to solve this problem?

Thanks a lot!





--------- **Original Message** ---------

**Sender** : JiaTao Tao  <ta...@gmail.com>

**Date** : 2019-04-19 10:54 (GMT+9)

**Title** : Re: KYLIN timeout problem



Hi

  

You can adjust "kylin.metadata.hbase-rpc-timeout" to a larger value, and then
run metadata/StorageCleanup; it will reduce the data in HBase.

  

--  

  

Regards!

Aron Tao

  

On Thu, Apr 18, 2019 at 2:35 PM, 蒋慧明 <[hm.jiang@samsung.com](mailto:hm.jiang@samsung.com)> wrote:

> Hi,
>
> I got the error below when building a cube in Kylin. It happened in "#4 Step Name: Build Dimension Dictionary".
>
> Tue Apr 16 14:18:06 GMT+08:00 2019, RpcRetryingCaller{globalStartTime=1555395481041, pause=100, retries=1}, java.io.IOException: Call to [HBASE URL] failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=15589, waitTime=5001, operationTimeout=5000 expired.
>     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159)
>     at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:65)
>     ... 3 more
> Caused by: java.io.IOException: Call to ip-10-10-110-102.cn-north-1.compute.internal/10.10.110.102:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=15589, waitTime=5001, operationTimeout=5000 expired.
>     at org.apache.hadoop.hbase.ipc.AbstractRpcClient.wrapException(AbstractRpcClient.java:292)
>     at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1274)
>     at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
>     at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336)
>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:35396)
>     at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:224)
>     at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:65)
>     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:212)
>
> This cube has about 25 normal dimensions.
>
> Then I created a new cube with 3 normal dimensions and ran it. That job succeeded.
>
> When I tried to back up and clean the metadata with "metastore.sh", I got the same error: "org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=15589, waitTime=5001, operationTimeout=5000 expired".
>
> Does anyone know the root cause of this problem, and how to fix it?
> Thanks a lot!



Re: NotServingRegionException----RE: Re: KYLIN timeout problem

Posted by Iñigo Martínez <im...@telecoming.com>.
Try increasing the number of client retries from 1 to something higher in the
HBase client configuration, hbase-site.xml (in my case, at
$KYLIN_HOME/hadoop-conf/hbase-site.xml):

    <property>
      <name>hbase.client.retries.number</name>
      <value>35</value>
    </property>
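
To confirm that the edited file really carries the new value, it can help to read the property back. The following is a minimal Python sketch, assuming the usual Hadoop-style <configuration>/<property> layout. The function name and the inline sample are made up for illustration; in practice you would pass in the text of your own hbase-site.xml. It only checks the file, not whether Kylin overrides the value with a setting of its own.

```python
import xml.etree.ElementTree as ET

# Inline stand-in for the hbase-site.xml that Kylin loads (illustrative contents).
SAMPLE_HBASE_SITE = """
<configuration>
  <property>
    <name>hbase.client.retries.number</name>
    <value>35</value>
  </property>
</configuration>
"""

def read_hadoop_property(xml_text, name):
    """Return the <value> of the first <property> whose <name> matches, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None

print(read_hadoop_property(SAMPLE_HBASE_SITE, "hbase.client.retries.number"))
```

If the stack trace still reports retries=1 after this check passes, the client is probably taking its retry count from somewhere other than this file.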



-- 




Iñigo Martínez
Systems Manager
imartinez@telecoming.com







Re: Error: Region is not online

Posted by ShaoFeng Shi <sh...@apache.org>.
Hi Huiming,

Have you recovered from this bad situation? When an RS goes down, HBase needs
a while to identify the bad RS and then move its data to another RS. How long
this takes depends on several factors. The following post discusses it:

https://stackoverflow.com/questions/36579219/how-long-hbase-need-to-take-for-recovering-one-crashed-regionserver

The port may be different if the RS found that the default port was not
available, so you may need to double-check HBase's configuration.
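
On the port question: as far as I know, stock HBase 1.x gives each RegionServer two ports, an RPC port (hbase.regionserver.port, default 16020) that clients such as Kylin connect to, and an info port for the web UI (hbase.regionserver.info.port, default 16030). If your cluster uses those defaults, seeing 16020 in the exception and 16030 in the web UI is not by itself a mismatch; please verify this against your own hbase-site.xml. The name after "is not online on" follows HBase's host,port,startcode form, where the start code is the server's start time in epoch milliseconds. Here is a small Python sketch for pulling those fields out of a message (the function name and the regular expression are mine, for illustration only):

```python
import re
from datetime import datetime, timezone

# "host,port" optionally followed by ",startcode" (server start time, epoch ms).
SERVER_NAME = re.compile(
    r"is not online on (?P<host>[^,\s]+),(?P<port>\d+)(?:,(?P<startcode>\d+))?"
)

def parse_not_online(message):
    """Extract host, RPC port and start time (if present) from the exception text."""
    m = SERVER_NAME.search(message)
    if m is None:
        return None
    start = m.group("startcode")
    started = (
        datetime.fromtimestamp(int(start) / 1000, tz=timezone.utc) if start else None
    )
    return {"host": m.group("host"), "port": int(m.group("port")), "started_utc": started}

print(parse_not_online("Region R1 is not online on RS1,16020"))
```

Comparing the parsed host and start time with what the HBase web UI shows for the region tells you whether the client is talking to a server instance that no longer hosts it.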

Please share with us if you have further information.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofengshi@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org




On Thu, May 23, 2019 at 2:29 PM, 蒋慧明 <hm...@samsung.com> wrote:


Error: Region is not online

Posted by 蒋慧明 <hm...@samsung.com>.
Dear all,

When I tried to run a job, it reported an error like this:

org.apache.kylin.engine.mr.exception.HadoopShellException: java.lang.RuntimeException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=1, exceptions:
Thu May 23 14:21:30 GMT+08:00 2019, RpcRetryingCaller{globalStartTime=1558592490643, pause=100, retries=1}, org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region R1 is not online on RS1,16020
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3008)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1144)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.newRegionScanner(RSRpcServices.java:2476)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2757)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:34950)

According to the HBase web UI, region R1 is actually located on RS2 (16030).

Does anyone know:

1. Why does Kylin send the request to the wrong region server, and how can I fix it?

2. Why is the requested port 16020 different from the HBase port 16030? Is that normal?

Thanks a lot!





RE: Re: KYLIN build cube problem

Posted by 蒋慧明 <hm...@samsung.com>.
Hello JiaTao,

Thanks a lot for the suggestion.

Before running this job, "kylin.metadata.hbase-rpc-timeout" had already been
set to 50000, but the reported operationTimeout is 9998, not 50000. That is strange.
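
One way to make this comparison repeatable is to pull the timeout out of the log line and check it against the configured value. Below is a minimal Python sketch; the function names are made up for illustration. As an unverified guess, 9998 is just under 10000, which, if I recall the Kylin configuration reference correctly, is the default of a separate setting, kylin.metadata.hbase-client-scanner-timeout-period. Please treat that property name and default as an assumption to check against your Kylin version.

```python
import re

CALL_TIMEOUT = re.compile(r"waitTime=(?P<wait>\d+), operationTimeout=(?P<op>\d+)")

def effective_timeout_ms(log_line):
    """Return the operationTimeout (ms) reported in a CallTimeoutException line, or None."""
    m = CALL_TIMEOUT.search(log_line)
    return int(m.group("op")) if m else None

def compare(log_line, configured_ms):
    """Describe how the timeout that fired relates to the configured RPC timeout."""
    observed = effective_timeout_ms(log_line)
    if observed is None:
        return "no timeout found in line"
    if observed < configured_ms:
        return (f"observed {observed} ms < configured {configured_ms} ms: "
                "a smaller limit applied, or the setting was not loaded")
    return f"observed {observed} ms is not below configured {configured_ms} ms"

line = ("org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=3436827, "
        "waitTime=9999, operationTimeout=9998 expired.")
print(compare(line, 50000))
```

With the line from this thread and a configured value of 50000, the sketch reports that a smaller limit applied, which matches what was observed here.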





--------- **Original Message** ---------

**Sender** : JiaTao Tao  <ta...@gmail.com>

**Date** : 2019-05-17 21:02 (GMT+9)

**Title** : Re: KYLIN build cube problem



Hi

  

You can try to adjust "kylin.metadata.hbase-rpc-timeout" to a larger value.
And then run metadata/StorageCleanup.

  

--

  

Regards!

Aron Tao

  



Re: KYLIN build cube problem

Posted by JiaTao Tao <ta...@gmail.com>.
Hi

You can try to adjust "kylin.metadata.hbase-rpc-timeout" to a larger value.
And then run metadata/StorageCleanup.

-- 


Regards!

Aron Tao

On Fri, May 17, 2019 at 7:19 AM, 蒋慧明 <hm...@samsung.com> wrote:


KYLIN build cube problem

Posted by 蒋慧明 <hm...@samsung.com>.
Dear all,

When I tried to build a cube for one day, the reported error was not always the same.

Sometimes the error is:

RpcRetryingCaller{globalStartTime=1558076503569, pause=100, retries=1}, java.io.IOException: Call to [HBASE IPXX.XX.XX.XXX:16020] failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=3436827, waitTime=9999, operationTimeout=9998 expired.

Sometimes the following error occurs:

org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region kylin_metadata is not online on ip-XX.XX.XX.XX,16020,1555901919971

Does anyone know about this problem? Where is the 9998 timeout configured? Why
is the region not online?

Many thanks!
