Posted to java-user@lucene.apache.org by 喜之郎 <25...@qq.com> on 2018/06/05 10:54:08 UTC

Re: can anybody give some suggest about this elasticsearch shard failed problem? thanks

The Lucene version is 6.3.0.
The filesystem is XFS.
This always happens at 00:02, 06:02, 12:02 and 18:02,
which is very strange.




------------------ Original message ------------------
From: "251922566"<25...@qq.com>;
Sent: Tuesday, June 5, 2018, 6:50 PM
To: "java-user"<ja...@lucene.apache.org>;

Subject: can anybody give some suggest about this elasticsearch shard failed problem? thanks




Elasticsearch version (bin/elasticsearch --version): 5.1.1

Plugins installed: [] no

JVM version (java -version): 1.8.0_77

OS version (uname -a if on a Unix-like system): CentOS Linux release 7.2.1511 (Core)

Description of the problem including expected versus actual behavior:
When using the update API under high concurrency, the primary shard and the replica shards all failed.
This has happened many times on 2 machines, so I think this is a bug.

Provide logs (if relevant):

[2018-04-27T12:02:22,797][WARN ][o.e.c.a.s.ShardStateAction] [172.20.3.2] [analytics_profile_12014][7] received shard failed for shard id [[analytics_profile_12014][7]], allocation id [xIEoF3JaTLWQz6X2KxMWRA], primary term [0], message [shard failure, reason [refresh failed]], failure [EOFException[read past EOF: MMapIndexInput(path="/mnt/esdata2/nodes/0/indices/fzJHAFdQQQO5zPL70D2b6g/7/index/_fqb1.cfe")]]
java.io.EOFException: read past EOF: MMapIndexInput(path="/mnt/esdata2/nodes/0/indices/fzJHAFdQQQO5zPL70D2b6g/7/index/_fqb1.cfe")
Suppressed: org.apache.lucene.index.CorruptIndexException: checksum status indeterminate: remaining=0, please run checkindex for more details (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/mnt/esdata2/nodes/0/indices/fzJHAFdQQQO5zPL70D2b6g/7/index/_fqb1.cfe")))
org.elasticsearch.action.FailedNodeException: Failed node [BbfFMNRpRvW5p8LDs3rquQ]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.handleException(TransportNodesAction.java:219) ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:984) ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.transport.TcpTransport.lambda$handleException$17(TcpTransport.java:1314) ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.transport.TcpTransport.handleException(TcpTransport.java:1312) [elasticsearch-5.1.1.jar:5.1.1]
Caused by: org.elasticsearch.transport.RemoteTransportException: [172.20.3.2_1][172.20.3.2:9301][internal:cluster/nodes/indices/shard/store[n]]
Caused by: org.elasticsearch.ElasticsearchException: Failed to list store metadata for shard [[analytics_action_12014_201804][15]]
Caused by: org.apache.lucene.index.CorruptIndexException: failed engine (reason: [corrupt file (source: [index])]) (resource=preexisting_corruption)
Caused by: java.io.IOException: failed engine (reason: [corrupt file (source: [index])])
Caused by: org.apache.lucene.index.CorruptIndexException: compound sub-files must have a valid codec header and footer: file is too small (0 bytes) (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/mnt/esdata1/nodes/0/indices/cBECbko7SMKP3oXsTGi_kg/15/index/_2kqi.fdx")))
[2018-04-27T12:02:22,800][WARN ][o.e.c.a.s.ShardStateAction] [172.20.3.2] [analytics_profile_12014][18] received shard failed for shard id [[analytics_profile_12014][18]], allocation id [7TieFxLRRZ-28uOsPFr1yQ], primary term [0], message [shard failure, reason [refresh failed]], failure [EOFException[read past EOF: MMapIndexInput(path="/mnt/esdata2/nodes/0/indices/fzJHAFdQQQO5zPL70D2b6g/18/index/_fhe2.cfe")]]
java.io.EOFException: read past EOF: MMapIndexInput(path="/mnt/esdata2/nodes/0/indices/fzJHAFdQQQO5zPL70D2b6g/18/index/_fhe2.cfe")
Suppressed: org.apache.lucene.index.CorruptIndexException: checksum status indeterminate: remaining=0, please run checkindex for more details (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/mnt/esdata2/nodes/0/indices/fzJHAFdQQQO5zPL70D2b6g/18/index/_fhe2.cfe")))
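
One of the traces above reports a compound sub-file that is "too small (0 bytes)". As a rough illustration only, the sketch below lists zero-length files under a directory such as a shard's index directory. It is not an Elasticsearch or Lucene tool, the function name find_empty_files is made up for this example, and it does not replace Lucene's CheckIndex, which the exception message recommends. The sketch assumes that write.lock is legitimately empty and skips it.

```python
import os
import tempfile


def find_empty_files(index_dir, ignore=("write.lock",)):
    """Return the sorted paths of zero-length files under index_dir.

    Hypothetical helper for illustration; names listed in `ignore`
    are skipped because they are expected to be empty.
    """
    empty = []
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            if name in ignore:
                continue
            path = os.path.join(root, name)
            if os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)


if __name__ == "__main__":
    # Demo on a throwaway directory with made-up file names,
    # not on a real Elasticsearch data path.
    with tempfile.TemporaryDirectory() as demo_dir:
        with open(os.path.join(demo_dir, "_a.fdx"), "wb"):
            pass  # left empty on purpose
        with open(os.path.join(demo_dir, "_a.fdt"), "wb") as f:
            f.write(b"data")
        with open(os.path.join(demo_dir, "write.lock"), "wb"):
            pass  # empty, but ignored
        for path in find_empty_files(demo_dir):
            print(os.path.basename(path))
```

On a real node the argument would be the index directory named in the exception. Running the scan shortly after one of the failure times might show whether files are being truncated from outside the Elasticsearch process.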

Re: can anybody give some suggest about this elasticsearch shard failed problem? thanks

Posted by Adrien Grand <jp...@gmail.com>.
The fact that it happens when writing compound files is very suspicious
since there should be little time between when the original files are
written and when they are merged into a compound file. Is it a remote
filesystem? Do you have cron jobs that run every 6 hours?
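
Before comparing the failure times with cron schedules, it may help to confirm how regular the pattern really is. The following is a minimal sketch, assuming log lines start with a timestamp in the same format as the ones posted in this thread. The function name failures_by_time_of_day and the marker string default are choices made for this example, and the sample lines are abbreviated copies of the posted log.

```python
import re
from collections import Counter

# Matches a leading timestamp such as "[2018-04-27T12:02:22,797]".
TIMESTAMP = re.compile(r"^\[\d{4}-\d{2}-\d{2}T(\d{2}):(\d{2}):\d{2},\d{3}\]")


def failures_by_time_of_day(lines, marker="received shard failed"):
    """Count lines containing `marker`, grouped by HH:MM of their timestamp."""
    counts = Counter()
    for line in lines:
        if marker not in line:
            continue
        match = TIMESTAMP.match(line)
        if match:
            counts[f"{match.group(1)}:{match.group(2)}"] += 1
    return counts


if __name__ == "__main__":
    # Abbreviated versions of the two shard-failure lines from the report.
    sample = [
        "[2018-04-27T12:02:22,797][WARN ][o.e.c.a.s.ShardStateAction] "
        "[172.20.3.2] [analytics_profile_12014][7] received shard failed",
        "[2018-04-27T12:02:22,800][WARN ][o.e.c.a.s.ShardStateAction] "
        "[172.20.3.2] [analytics_profile_12014][18] received shard failed",
    ]
    for time_of_day, count in sorted(failures_by_time_of_day(sample).items()):
        print(time_of_day, count)  # prints "12:02 2"
```

Passing an open log file instead of the sample list works the same way, since the function only iterates over lines. If the counts cluster on a few minutes of the day, those times can be matched against crontab entries or other scheduled jobs on the host.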
