Posted to user@kylin.apache.org by Alberto Ramón <a....@gmail.com> on 2016/12/02 00:03:32 UTC

corrupt metastore

I had some problems with corrupt data on HDFS and Meta HDFS.
Now all services start OK.

*No query can be executed on any cube:*
*Error while executing SQL "select part_dt, sum(price) as total_selled,
count(distinct seller_id) as sellers from kylin_sales group by part_dt
order by part_dt LIMIT 50000":
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
attempts=5, exceptions: Fri Dec 02 07:31:07 GMT+08:00 2016,
org.apache.hadoop.hbase.client.RpcRetryingCaller@6cb60fb6,
com.google.protobuf.InvalidProtocolBufferException:
com.google.protobuf.InvalidProtocolBufferException: Protocol message tag
had invalid wire type. at
com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:99)
at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom*


*I tried to rebuild the cube, but:*




*Could not read JSON: Can not construct instance of long from String value
'2000-12-07 06:30:00': not a valid Long value at [Source:
org.apache.catalina.connector.CoyoteInputStream@6fcdf2de; line: 1, column:
21] (through reference chain:
org.apache.kylin.rest.request.JobBuildRequest["startTime"]); nested
exception is com.fasterxml.jackson.databind.exc.InvalidFormatException: Can
not construct instance of long from String value '2000-12-07 06:30:00': not
a valid Long value at [Source:
org.apache.catalina.connector.CoyoteInputStream@6fcdf2de; line: 1, column:
21] (through reference chain:
org.apache.kylin.rest.request.JobBuildRequest["startTime"]*
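
The error above indicates that the startTime field of JobBuildRequest is deserialized as a long, so the string '2000-12-07 06:30:00' is rejected. A minimal sketch of the conversion follows. It assumes the field expects epoch milliseconds and that the timestamp is meant as UTC; neither is confirmed in this thread. to_epoch_millis is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(text: str) -> int:
    """Convert 'YYYY-MM-DD HH:MM:SS' (assumed to be UTC) to epoch milliseconds."""
    moment = datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    # Integer division by a one-millisecond timedelta avoids float rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


if __name__ == "__main__":
    print(to_epoch_millis("2000-12-07 06:30:00"))
```

Sending the resulting number instead of the date string should at least get past the JSON parsing step; whether the build then succeeds depends on the state of HBase.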

*Any ideas? I'm trying metastore.sh; is there a check tool?*
2016-12-01 16:21:34,162 ERROR [pool-7-thread-1] dao.ExecutableDao:148 : error get all Jobs:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=6, exceptions:
Fri Dec 02 05:21:34 GMT+08:00 2016, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=122823: row '/execute/' on table 'kylin_metadata' at region=kylin_metadata,,1477759808710.faab4c988f06f17d9e903068db5b3b81., hostname=amb0.mycorp.kom,60020,1480614855596, seqNum=1664

        at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:262)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:199)

Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=122823: row '/execute/' on table 'kylin_metadata' at region=kylin_metadata,,1477759808710.faab4c988f06f17d9e903068db5b3b81.

*(Redeploying everything isn't a problem; I'm asking only for the knowledge.)*

Re: corrupt metastore

Posted by Alberto Ramón <a....@gmail.com>.

Haha,

I think I lost some HBase data (hbck found some errors).
The Kylin log is OK and the start-up is clean, but afterwards the cube data
isn't accessible.

I tried to use metastore.sh to delete all Kylin data (the clean and reset
options; I don't know what the difference is).

OK, I'll redeploy a new system; that isn't a problem.
And perhaps I need to schedule a "metastore.sh backup".
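
A scheduled backup could be as small as the sketch below, meant to be called from cron or a similar scheduler. It assumes the Kylin installation directory is known and that bin/metastore.sh accepts the backup argument mentioned above; the function names are hypothetical.

```python
import subprocess
from typing import List


def backup_command(kylin_home: str) -> List[str]:
    """Build the command line for a metadata backup (path layout is an assumption)."""
    return [f"{kylin_home.rstrip('/')}/bin/metastore.sh", "backup"]


def run_backup(kylin_home: str) -> int:
    """Run the backup and return the script's exit code."""
    return subprocess.run(backup_command(kylin_home), check=False).returncode
```

A scheduler entry would then call run_backup with the value of KYLIN_HOME and alert on a non-zero return value.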

Thanks!!

2016-12-02 10:37 GMT+01:00 ShaoFeng Shi <sh...@apache.org>:

> There is no such check tool/command today. The availability of Kylin's
> metadata relies on HBase and HDFS replication: if HBase and HDFS have no
> data loss, then Kylin has no data loss. You can watch kylin.log during
> startup; Kylin will report an error when it finds inconsistent metadata.

Re: corrupt metastore

Posted by ShaoFeng Shi <sh...@apache.org>.

There is no such check tool/command today. The availability of Kylin's
metadata relies on HBase and HDFS replication: if HBase and HDFS have no
data loss, then Kylin has no data loss. You can watch kylin.log during
startup; Kylin will report an error when it finds inconsistent metadata.
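
Watching the log can be partly automated. A minimal sketch follows, assuming kylin.log lines start with a timestamp followed by the level, as in the ERROR line quoted in this thread. The function name and the sample INFO line are hypothetical.

```python
import re
from typing import Iterable, List

# Matches lines such as:
# 2016-12-01 16:21:34,162 ERROR [pool-7-thread-1] dao.ExecutableDao:148 : error get all Jobs:
ERROR_LINE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ERROR ")


def error_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines whose level is ERROR, with trailing newlines removed."""
    return [line.rstrip("\n") for line in lines if ERROR_LINE.match(line)]


if __name__ == "__main__":
    sample = [
        "2016-12-01 16:21:30,001 INFO  [main] example.Startup:1 : hypothetical info line\n",
        "2016-12-01 16:21:34,162 ERROR [pool-7-thread-1] dao.ExecutableDao:148 : error get all Jobs:\n",
    ]
    for line in error_lines(sample):
        print(line)
```

Passing an open file handle for kylin.log to error_lines lists the errors reported during startup. Continuation lines of a stack trace do not start with a timestamp, so they are skipped.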

2016-12-02 15:23 GMT+08:00 Alberto Ramón <a....@gmail.com>:

> Yes, yes,
> I had this type of problem; I needed to use
>   hdfs fsck
>   hbase hbck
> and solved all the problems. --> perhaps some data has been lost
>
> The next steps will be:
> -  check Kylin's metadata
> -  check consistency between the metadata and Kylin's tables
>
>
> But I don't know if there are any tools/commands to do this.
> I saw the metastore.sh script, but I can't find this functionality.


-- 
Best regards,

Shaofeng Shi 史少锋

Re: corrupt metastore

Posted by Alberto Ramón <a....@gmail.com>.

Yes, yes,
I had this type of problem; I needed to use
  hdfs fsck
  hbase hbck
and solved all the problems. --> perhaps some data has been lost
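
The two checks can be scripted so they run the same way each time. A minimal sketch follows, assuming the hdfs and hbase commands are on the PATH and that the whole filesystem (path /) should be checked. The function names are hypothetical, and no repair options are passed.

```python
import subprocess
from typing import Dict, List


def check_commands(hdfs_path: str = "/") -> List[List[str]]:
    """Build the health-check commands named in this message."""
    return [["hdfs", "fsck", hdfs_path], ["hbase", "hbck"]]


def run_checks(hdfs_path: str = "/") -> Dict[str, int]:
    """Run each check and map its command line to the exit code."""
    results = {}
    for command in check_commands(hdfs_path):
        results[" ".join(command)] = subprocess.run(command, check=False).returncode
    return results
```

Calling run_checks() on a cluster node prints each tool's report to the terminal and returns the exit codes for later inspection.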

The next steps will be:
-  check Kylin's metadata
-  check consistency between the metadata and Kylin's tables


But I don't know if there are any tools/commands to do this.
I saw the metastore.sh script, but I can't find this functionality.



2016-12-02 2:46 GMT+01:00 ShaoFeng Shi <sh...@apache.org>:

> Hi Alberto, it looks like the HBase service is in trouble; please check it
> first.

Re: corrupt metastore

Posted by ShaoFeng Shi <sh...@apache.org>.

Hi Alberto, it looks like the HBase service is in trouble; please check it
first.


-- 
Best regards,

Shaofeng Shi 史少锋