Posted to issues@ignite.apache.org by "RangerZhou (Jira)" <ji...@apache.org> on 2020/06/05 06:26:00 UTC

[jira] [Commented] (IGNITE-11783) Open file limit for deb distribution

    [ https://issues.apache.org/jira/browse/IGNITE-11783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126439#comment-17126439 ] 

RangerZhou commented on IGNITE-11783:
-------------------------------------

Hi,

I tried your method, but the issue still occurs:
{code:java}
sudo cat /etc/systemd/system/apache-ignite.service
[Unit]
Description=Ignite Limit Service
Wants=network.target network-online.target autofs.service
After=network.target network-online.target autofs.service

[Service]
Type=simple
User=farmer
ExecStart=/bin/touch /home/xxx/1.txt
LimitNOFILE=500000
LimitNPROC=500000

[Install]
WantedBy=multi-user.target


Added "fs.file-max = 2097152" to "/etc/sysctl.conf"

cat /etc/security/limits.conf
*         hard    nofile      500000
*         soft    nofile      500000
root      hard    nofile      500000
root      soft    nofile      500000

{code}
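
One way to narrow this down is to check which limit the running Ignite JVM actually received, rather than what the configuration files say. As far as I know, /etc/security/limits.conf is applied through PAM to login sessions and fs.file-max is a system-wide ceiling, so only LimitNOFILE in the unit file that really starts the process sets the per-service limit. The sketch below is a hypothetical helper (ProcLimits is not part of Ignite) that reads the soft "Max open files" value from /proc/<pid>/limits on Linux:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.OptionalLong;

/** Hypothetical helper (not part of Ignite): reads the soft "Max open files" limit of a process. */
class ProcLimits {
    /** Parses the lines of a /proc/<pid>/limits file; empty if the row is missing or "unlimited". */
    static OptionalLong softOpenFiles(List<String> lines) {
        for (String line : lines) {
            if (line.startsWith("Max open files")) {
                // Row layout: "Max open files  <soft>  <hard>  files"
                String[] cols = line.substring("Max open files".length()).trim().split("\\s+");
                if (cols.length > 0 && cols[0].matches("\\d+"))
                    return OptionalLong.of(Long.parseLong(cols[0]));
            }
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) throws IOException {
        // Pass the PID of the Ignite JVM as the first argument; defaults to this JVM itself.
        String pid = args.length > 0 ? args[0] : "self";
        Path limits = Paths.get("/proc", pid, "limits");
        if (!Files.isReadable(limits)) {
            System.out.println("Cannot read " + limits + " (Linux only; may need the same user or root)");
            return;
        }
        System.out.println("Soft open-file limit: " + softOpenFiles(Files.readAllLines(limits)));
    }
}
```

If the value printed for the Ignite PID is not 500000, the process was most likely started by a different unit than the one edited above (the issue description refers to apache-ignite@.service), or systemd has not reloaded the unit since the change.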

> Open file limit for deb distribution
> ------------------------------------
>
>                 Key: IGNITE-11783
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11783
>             Project: Ignite
>          Issue Type: Bug
>          Components: persistence
>    Affects Versions: 2.7
>         Environment: ubuntu-16.04
>            Reporter: Alexander Belyak
>            Priority: Major
>              Labels: documentation, newbie
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Steps to reproduce:
> 1) Install Ignite from the deb package on Ubuntu 16.04
> 2) Start it with persistence enabled
> 3) Create 5 caches (or one cache with 4000+ partitions)
> Error text:
> {noformat}
> [18:29:44,369][INFO][exchange-worker-#43][GridCacheDatabaseSharedManager] Restoring partition state for local groups [cntPartStateWal=0, lastCheckpointId=bd24ff23-da6f-46e5-bafd-b643db3870d4]
> [18:29:51,864][SEVERE][exchange-worker-#43][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.processors.cache.persistence.StorageException: Failed to initialize partition file: /usr/share/apache-ignite/work/db/node00-f49af718-48da-4186-b664-62aca736bdc9/cache-SQL_PUBLIC_VERTEX_TBL/part-913.bin]]
> class org.apache.ignite.internal.processors.cache.persistence.StorageException: Failed to initialize partition file: /usr/share/apache-ignite/work/db/node00-f49af718-48da-4186-b664-62aca736bdc9/cache-SQL_PUBLIC_VERTEX_TBL/part-913.bin
>         at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore.init(FilePageStore.java:444)
>         at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore.ensure(FilePageStore.java:650)
>         at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.ensure(FilePageStoreManager.java:712)
>         at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restorePartitionStates(GridCacheDatabaseSharedManager.java:2472)
>         at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyLastUpdates(GridCacheDatabaseSharedManager.java:2419)
>         at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreState(GridCacheDatabaseSharedManager.java:1628)
>         at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.beforeExchange(GridCacheDatabaseSharedManager.java:1302)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1453)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:806)
>         at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2667)
>         at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2539)
>         at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.nio.file.FileSystemException: /usr/share/apache-ignite/work/db/node00-f49af718-48da-4186-b664-62aca736bdc9/cache-SQL_PUBLIC_VERTEX_TBL/part-913.bin: Too many open files
>         at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
>         at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>         at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>         at sun.nio.fs.UnixFileSystemProvider.newAsynchronousFileChannel(UnixFileSystemProvider.java:196)
>         at java.nio.channels.AsynchronousFileChannel.open(AsynchronousFileChannel.java:248)
>         at java.nio.channels.AsynchronousFileChannel.open(AsynchronousFileChannel.java:301)
>         at org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIO.<init>(AsyncFileIO.java:57)
>         at org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIOFactory.create(AsyncFileIOFactory.java:53)
>         at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore.init(FilePageStore.java:416)
>         ... 12 more
> {noformat}
> This happens because the systemd service description (/etc/systemd/system/apache-ignite@.service) does not contain:
> {noformat}
> LimitNOFILE=500000
> (possibly with) LimitNPROC=500000
> {noformat}
> see: https://fredrikaverpil.github.io/2016/04/27/systemd-and-resource-limits/
> Possibly, the installation script should also add:
> *  "fs.file-max = 2097152" to "/etc/sysctl.conf"
> *  the following to /etc/security/limits.conf:
> {noformat}
> *         hard    nofile      500000
> *         soft    nofile      500000
> root      hard    nofile      500000
> root      soft    nofile      500000
> {noformat}
> see: https://easyengine.io/tutorials/linux/increase-open-files-limit
> It would also be great if the Ignite startup process checked the file limits and printed a link to the documentation page if:
> 1) persistence is enabled
> 2) the limit is below some value (<=4096)
> 3) the limit is below the total number of partitions on the current node
> One more thing: if Ignite gets a "Too many open files" exception in the middle of rebalancing, the consequences are severe, because the whole cluster simply stops working. This can happen if each node is close to its limit and:
> * someone creates an additional cache
> * the topology changes (a node is removed) and each remaining node gets more local partitions.
> Can we remember the limit on startup and check it each time we are about to create a local partition?
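
The two checks proposed in the description could look roughly like the sketch below. It is not Ignite code: FileLimitCheck, shouldWarn and hasHeadroom are hypothetical names, and the way the three conditions are combined (persistence enabled, and the limit either at or below 4096 or below the local partition count) is my reading of the list above. The descriptor counts come from the JDK's com.sun.management.UnixOperatingSystemMXBean, which is available on Unix-like HotSpot/OpenJDK JVMs. The partition count is only a lower bound on the files a node needs, since other files are open as well.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Hypothetical sketch (not Ignite code) of the file-limit checks proposed in the issue. */
class FileLimitCheck {
    /** Threshold taken from the issue text: limits at or below this are treated as too low. */
    static final long MIN_RECOMMENDED_LIMIT = 4096;

    /** Startup check: warn only when persistence is on and the limit is low or below the partition count. */
    static boolean shouldWarn(boolean persistenceEnabled, long maxFds, long localPartitions) {
        if (!persistenceEnabled)
            return false;
        return maxFds <= MIN_RECOMMENDED_LIMIT || maxFds < localPartitions;
    }

    /** Runtime check: true if opening extraFiles more files stays within the remembered limit. */
    static boolean hasHeadroom(long maxFds, long openFds, long extraFiles) {
        return openFds + extraFiles <= maxFds;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // Descriptor counts are exposed only through the Unix-specific extension of the bean.
        if (!(os instanceof com.sun.management.UnixOperatingSystemMXBean)) {
            System.out.println("File descriptor counts are not available on this JVM");
            return;
        }
        com.sun.management.UnixOperatingSystemMXBean unix =
            (com.sun.management.UnixOperatingSystemMXBean) os;
        long max = unix.getMaxFileDescriptorCount();   // limit remembered at startup
        long open = unix.getOpenFileDescriptorCount(); // descriptors currently in use
        long partitions = 4000; // assumed example value, matching the "4000+ partitions" case
        if (shouldWarn(true, max, partitions))
            System.out.println("WARN: open file limit " + max + " may be too low for "
                + partitions + " partitions");
        System.out.println("Room for one more partition file: " + hasHeadroom(max, open, 1));
    }
}
```

Calling hasHeadroom before each partition file is opened would let a node refuse the new partition with a clear message instead of failing with "Too many open files" in the middle of rebalancing.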


