Posted to solr-user@lucene.apache.org by zhenyuan wei <ti...@gmail.com> on 2018/08/27 01:47:09 UTC

An exception when running Solr on HDFS: why can a Solr server not recognize a write.lock file it created itself earlier?

Hi all,
    I hit an exception when running Solr on HDFS. The details are:
Solr was running on HDFS and document updates were running continuously;
then the Solr JVM was killed with kill -9 (or the Linux OS was rebooted or shut down), and everything was restarted.
The exception appears like this:

2018-08-26 22:23:12.529 ERROR
(coreContainerWorkExecutor-2-thread-1-processing-n:cluster-node001:8983_solr)
[   ] o.a.s.c.CoreContainer Error waiting for SolrCore to be loaded on
startup
org.apache.solr.common.SolrException: Unable to create core
[collection002_shard56_replica_n110]
        at
org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1061)
        at
org.apache.solr.core.CoreContainer.lambda$load$13(CoreContainer.java:640)
        at
com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:197)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:188)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
        at java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.solr.common.SolrException: Index dir
'hdfs://hdfs-cluster/solr/collection002/core_node113/data/index/' of core
'collection002_shard56_replica_n110' is already locked. The most likely
cause is another Solr server (or another solr core in this server) also
configured to use this directory; other possible causes may be specific to
lockType: hdfs
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1009)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:864)
        at
org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1040)
        ... 7 more
Caused by: org.apache.lucene.store.LockObtainFailedException: Index dir
'hdfs://hdfs-cluster/solr/collection002/core_node113/data/index/' of core
'collection002_shard56_replica_n110' is already locked. The most likely
cause is another Solr server (or another solr core in this server) also
configured to use this directory; other possible causes may be specific to
lockType: hdfs
        at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:746)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:955)
        ... 9 more


In fact, if I print out the HDFS API-level exception stack, it reports:

Caused by: org.apache.hadoop.fs.FileAlreadyExistsException:
/solr/collection002/core_node17/data/index/write.lock for client
192.168.0.12 already exists
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2563)
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2450)
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2334)
        at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:623)
        at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:397)
        at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1727)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)

        at sun.reflect.GeneratedConstructorAccessor140.newInstance(Unknown
Source)
        at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
        at
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
        at
org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1839)
        at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1689)
        at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1624)
        at
org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:448)
        at
org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:444)
        at
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:459)
        at
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:387)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:789)
        at
org.apache.solr.store.hdfs.HdfsLockFactory.obtainLock(HdfsLockFactory.java:68)

So my question is: why can a Solr server not recognize that the write.lock file
was created by itself before the restart?
Is there any solution or workaround for this kind of failure?



Thanks~
TinsWzy

Re: An exception when running Solr on HDFS: why can a Solr server not recognize a write.lock file it created itself earlier?

Posted by Walter Underwood <wu...@wunderwood.org>.
I accidentally put my Solr indexes on NFS once about ten years ago.
It was 100X slower. I would not recommend that.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Aug 27, 2018, at 1:39 AM, zhenyuan wei <ti...@gmail.com> wrote:
> 
> Thanks for your answer! @Erick Erickson <er...@gmail.com>
> So, it's not recommended to run Solr on a network filesystem (NFS or HDFS) now? Maybe
> because of crash errors or performance problems.
> I had a look at SOLR-8335 and SOLR-8169; there is no good solution for this
> yet, so maybe manually removing the lock file is the best option?
> 
> 
> Erick Erickson <er...@gmail.com> wrote on Mon, Aug 27, 2018 at 11:41 AM:
> 
>> Because HDFS doesn't follow the file semantics that Solr expects.
>> 
>> There's quite a bit of background here:
>> https://issues.apache.org/jira/browse/SOLR-8335
>> 
>> Best,
>> Erick


Re: An exception when running Solr on HDFS: why can a Solr server not recognize a write.lock file it created itself earlier?

Posted by zhenyuan wei <ti...@gmail.com>.
Thanks for your answer! @Erick Erickson <er...@gmail.com>
So, it's not recommended to run Solr on a network filesystem (NFS or HDFS) now? Maybe
because of crash errors or performance problems.
I had a look at SOLR-8335 and SOLR-8169; there is no good solution for this
yet, so maybe manually removing the lock file is the best option?


Erick Erickson <er...@gmail.com> wrote on Mon, Aug 27, 2018 at 11:41 AM:

> Because HDFS doesn't follow the file semantics that Solr expects.
>
> There's quite a bit of background here:
> https://issues.apache.org/jira/browse/SOLR-8335
>
> Best,
> Erick

Re: An exception when running Solr on HDFS: why can a Solr server not recognize a write.lock file it created itself earlier?

Posted by Erick Erickson <er...@gmail.com>.
Because HDFS doesn't follow the file semantics that Solr expects.

There's quite a bit of background here:
https://issues.apache.org/jira/browse/SOLR-8335
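
To make the lock part of the trace concrete: the HDFS lock is just a file
that is created with overwrite disabled, so a write.lock left behind after a
kill -9 makes that create call fail. A rough sketch of the failing step
(not the actual HdfsLockFactory source; the cluster name and path are the
ones from your stack trace):

// Rough sketch only, not Solr code. Shows why a stale write.lock on
// HDFS blocks core loading: the lock file is created with
// overwrite=false, so a leftover file triggers FileAlreadyExistsException.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsLockSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(
        URI.create("hdfs://hdfs-cluster"), new Configuration());
    Path lock = new Path(
        "/solr/collection002/core_node113/data/index/write.lock");
    // overwrite=false: if the file survived the previous JVM, the
    // NameNode rejects the create and the core never loads.
    fs.create(lock, false).close();
  }
}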

Best,
Erick

Re: An exception when running Solr on HDFS: why can a Solr server not recognize a write.lock file it created itself earlier?

Posted by zhenyuan wei <ti...@gmail.com>.
@Shawn Heisey  Yes, deleting the "write.lock" files manually worked in the end (rough sketch below).
@Walter Underwood  Do you have any recent performance numbers comparing Solr on HDFS with
a local filesystem?
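
For anyone else who hits this, a rough sketch of clearing the stale locks
while every Solr node is stopped (the /solr root, the core_node layout and
the write.lock name are simply the ones from my cluster, and the class is
only an illustration, not an official tool):

// Rough sketch only: remove leftover write.lock files on HDFS after a
// hard kill. Run it only while all Solr nodes are shut down, and adjust
// the glob pattern to your own data directory layout.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ClearStaleLocks {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(
        URI.create("hdfs://hdfs-cluster"), new Configuration());
    // one write.lock per core index directory
    FileStatus[] locks = fs.globStatus(
        new Path("/solr/*/core_node*/data/index/write.lock"));
    if (locks != null) {
      for (FileStatus lock : locks) {
        fs.delete(lock.getPath(), false); // single file, non-recursive
        System.out.println("removed " + lock.getPath());
      }
    }
  }
}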

Shawn Heisey <ap...@elyograg.org> wrote on Tue, Aug 28, 2018 at 4:10 AM:

> On 8/26/2018 7:47 PM, zhenyuan wei wrote:
> >      I hit an exception when running Solr on HDFS. The details are:
> > Solr was running on HDFS and document updates were running continuously;
> > then the Solr JVM was killed with kill -9 (or the OS was rebooted or shut
> > down), and everything was restarted.
>
> If you use "kill -9" to stop a Solr instance, the lockfile will get left
> behind and you may have difficulty starting Solr back up on ANY kind of
> filesystem until you delete the file in each core's data directory.  The
> filename defaults to "write.lock" if you don't change it.
>
> Thanks,
> Shawn
>
>

Re: An exception when running Solr on HDFS: why can a Solr server not recognize a write.lock file it created itself earlier?

Posted by Shawn Heisey <ap...@elyograg.org>.
On 8/26/2018 7:47 PM, zhenyuan wei wrote:
>      I hit an exception when running Solr on HDFS. The details are:
> Solr was running on HDFS and document updates were running continuously;
> then the Solr JVM was killed with kill -9 (or the OS was rebooted or shut down), and everything was restarted.

If you use "kill -9" to stop a Solr instance, the lockfile will get left 
behind and you may have difficulty starting Solr back up on ANY kind of 
filesystem until you delete the file in each core's data directory.  The 
filename defaults to "write.lock" if you don't change it.

Thanks,
Shawn