You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jena.apache.org by "Anders Jensen-Urstad (Jira)" <ji...@apache.org> on 2021/06/15 09:31:00 UTC

[jira] [Commented] (JENA-2044) tdb2.tdbloader crashses loading wikidata

    [ https://issues.apache.org/jira/browse/JENA-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363507#comment-17363507 ] 

Anders Jensen-Urstad commented on JENA-2044:
--------------------------------------------

Just in case someone else is looking here: while loading a dataset with about 2,2B quads, I got an error similar to the one from [~lixi]:
{noformat}
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f1dadb51000, 65536, 1) failed; error='Not enough space' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 65536 bytes for committing reserved memory.{noformat}
Increased RAM (32 -> 48 GB). Same error at the exact same point. Turns out the default vm.max_map_count was too low. Raised it from 65530 to 262144 (sudo sysctl -w vm.max_map_count=262144, and/or add it to /etc/sysctl.conf and do sudo sysctl -p). Then the tdb2.tdbloader import successfully finished.

> tdb2.tdbloader crashses loading wikidata
> ----------------------------------------
>
>                 Key: JENA-2044
>                 URL: https://issues.apache.org/jira/browse/JENA-2044
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: Cmd line tools, Jena, TDB2
>    Affects Versions: Jena 3.14.0, Jena 3.17.0
>         Environment: {code:bash}
> $ java --version
> openjdk 11.0.9.1 2020-11-04
> OpenJDK Runtime Environment (build 11.0.9.1+1-post-Debian-1deb10u2)
> OpenJDK 64-Bit Server VM (build 11.0.9.1+1-post-Debian-1deb10u2, mixed mode, sharing)
> $ uname -r
> 4.19.0-14-amd64
> $ lsb_release -da
> No LSB modules are available.
> Distributor ID:	Debian
> Description:	Debian GNU/Linux 10 (buster)
> Release:	10
> Codename:	buster
> {code}
>            Reporter: Alexander Bigerl
>            Priority: Major
>         Attachments: FusekiWikidataLoaderDockerfile, fuseki-wikidata-2020-11-11_(docker_8c_J16g_R256g_K256g_phased).log, hs_err_pid28709.log
>
>
> Apache jena crashes when loading wikidata truthy 2020-11-11 (it is not available any more via wikidata, but a backup can be found here: [https://hobbitdata.informatik.uni-leipzig.de/wikidata-20201111-truthy-BETA.nt.bz2]
> command run was:
> {code:bash}
> cgmemtime /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/triplestores/jena/apache-jena-3.17.0/bin/tdb2.tdbloader --loader=sequential --loc /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/databases/fuseki/wikidata-2020-11-11/ /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/datasets/wikidata-2020-11-11/wikidata-20201111-truthy-BETA.nt 2>&1 | tee /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/logs/load/fuseki-wikidata-2020-11-11.log
> {code}
>  
> The end of the logfile is:
> {code:bash}
> 06:56:52 INFO  loader          :: Add: 729,000,000 SPO->OSP (Batch: 237,642 / Avg: 300,050)
> 06:56:57 INFO  loader          :: Add: 730,000,000 SPO->OSP (Batch: 234,962 / Avg: 299,936)
> 06:56:57 INFO  loader          ::   Elapsed: 2,433.85 seconds [2021/02/12 06:56:57 CET]
> 06:57:01 INFO  loader          :: Add: 731,000,000 SPO->OSP (Batch: 233,863 / Avg: 299,820)
> 06:57:05 INFO  loader          :: Add: 732,000,000 SPO->OSP (Batch: 269,978 / Avg: 299,775)
> 06:57:08 INFO  loader          :: Add: 733,000,000 SPO->OSP (Batch: 281,373 / Avg: 299,748)
> 06:57:12 INFO  loader          :: Add: 734,000,000 SPO->OSP (Batch: 285,143 / Avg: 299,727)
> 06:57:15 INFO  loader          :: Add: 735,000,000 SPO->OSP (Batch: 290,023 / Avg: 299,714)
> 06:57:19 INFO  loader          :: Add: 736,000,000 SPO->OSP (Batch: 290,951 / Avg: 299,701)
> #
> # There is insufficient memory for the Java Runtime Environment to continue.
> # Native memory allocation (malloc) failed to allocate 2097152 bytes for AllocateHeap
> # An error report file with more information is saved as:
> # /home/d/dice-gr/profiles/unix/cs/triplestore-benchmark/hs_err_pid28709.log
> Child user: 70961.063 s
> Child sys : 5817.787 s
> Child wall: 76243.666 s
> Child high-water RSS                    :  534037652 KiB
> Recursive and acc. high-water RSS+CACHE :  585081976 KiB
> {code}
>  The machine has a AMD EPYC 7742 64-Core Processor, 1TB RAM and 2 TB free ssd storage on /home. So there should still be plenty of RAM have been available.
>  The loc folder has at the time of crash 543GB.
> I also tried with -loader=parallel and -loader=phased . Same result.
> h3. update 23/Feb/2021:
> ulimit -a
> {code:bash}
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 4110480
> max locked memory       (kbytes, -l) 65536
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 1024
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 8192
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 4110480
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
> {code}
> output from prlimit --pid=<PID of tdb.loader2>:
> {code:bash}
> RESOURCE   DESCRIPTION                             SOFT      HARD UNITS
> AS         address space limit                unlimited unlimited bytes
> CORE       max core file size                         0 unlimited bytes
> CPU        CPU time                           unlimited unlimited seconds
> DATA       max data size                      unlimited unlimited bytes
> FSIZE      max file size                      unlimited unlimited bytes
> LOCKS      max number of file locks held      unlimited unlimited locks
> MEMLOCK    max locked-in-memory address space  67108864  67108864 bytes
> MSGQUEUE   max bytes in POSIX mqueues            819200    819200 bytes
> NICE       max nice prio allowed to raise             0         0 
> NOFILE     max number of open files             1048576   1048576 files
> NPROC      max number of processes              4110480   4110480 processes
> RSS        max resident set size              unlimited unlimited bytes
> RTPRIO     max real-time priority                     0         0 
> RTTIME     timeout for real-time tasks        unlimited unlimited microsecs
> SIGPENDING max number of pending signals        4110480   4110480 signals
> STACK      max stack size                       8388608 unlimited bytes
> {code}
> I tried several things. Nothing worked so far. Here is what I tried:
> * Setting several JVM_ARGS to avoid resource over-consumption: 
> {code:bash}
> -Xmx16g -XX:MaxDirectMemorySize=16g -Djdk.nio.maxCachedBufferSize=17179869184 -XX:ActiveProcessorCount=4
> {code}
> * Setting limits for Metaspace and CodeCache: 
> {code:bash}
> -XX:MaxMetaspaceSize=4g -XX:ReservedCodeCacheSize=4g
> {code}
> * limiting kernel shared memory to (set in /etc/sysctl.conf + reboot)
> {code:bash}
> kernel.shmmax=17179869185 # 16 GB
> kernel.shmall=4194304 # *4096 pagesize = 16 GB
> {code}
> * putting jena into a dockerfile and run it with the commands below. The idea was to limit the resources and have a default environment. I first tried with 16g memory but it within a couple 100M triples dropping below 10k triples/s loading speed. So I increased the reseources because I didn't have that much time on the server. Dockerfile is provided as attachment as well as the output. Here the output might be interesting. Because tdb2.loader did not crash completely but simply logged an Exception and idled ever since.
> {code:bash}
> docker build -f FusekiWikidataLoaderDockerfile -t fuseki-wikidata_01 empty_folder/
> dockerun --user $(id -u):$(id -g) --kernel-memory=256g --memory=256g --cpuset-cpus=0-7 --name fuseki_wikidata_8c_J16g_R256g_K256g_phased -v "$(pwd)"/databases:/databases -v "$(pwd)"/datasets:/datasets -v "$(pwd)"/logs:/logs fuseki-wikidata_01:latest
> {code}
> error:
> {code:bash}
> 05:27:31 INFO  loader          :: Add: 3,156,000,000 Index (Batch: 57,074 / Avg: 159,787)
> 05:27:35 ERROR Indexer         :: Interrupted
> org.apache.jena.dboe.base.file.FileException: BlockMgrMapped.segmentAllocate: Segment = 16313 : Offset = 136843362304
> 	at org.apache.jena.dboe.base.file.BlockAccessMapped.allocSegment(BlockAccessMapped.java:218) ~[jena-dboe-base-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.base.file.BlockAccessMapped.getByteBuffer(BlockAccessMapped.java:150) ~[jena-dboe-base-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.base.file.BlockAccessMapped.allocate(BlockAccessMapped.java:80) ~[jena-dboe-base-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.base.block.BlockMgrFileAccess.allocate(BlockMgrFileAccess.java:51) ~[jena-dboe-base-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.base.block.BlockMgrBase.allocate(BlockMgrBase.java:39) ~[jena-dboe-base-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.base.block.BlockMgrWrapper.allocate(BlockMgrWrapper.java:41) ~[jena-dboe-base-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.base.block.BlockMgrFreeChain.allocate(BlockMgrFreeChain.java:44) ~[jena-dboe-base-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.base.page.PageBlockMgr.create(PageBlockMgr.java:53) ~[jena-dboe-base-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.trans.bplustree.BPTreeRecordsMgr.create(BPTreeRecordsMgr.java:77) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.trans.bplustree.BPTreeRecords.create(BPTreeRecords.java:236) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.trans.bplustree.BPTreeRecords.insertNewPage(BPTreeRecords.java:230) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.trans.bplustree.BPTreeRecords.split(BPTreeRecords.java:206) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.trans.bplustree.BPTreeNode.split(BPTreeNode.java:568) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:509) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:522) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:522) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:522) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.trans.bplustree.BPTreeNode.insert(BPTreeNode.java:203) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.trans.bplustree.BPlusTree.insertAndReturnOld(BPlusTree.java:278) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.trans.bplustree.BPlusTree.insert(BPlusTree.java:271) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> 	at org.apache.jena.tdb2.store.tupletable.TupleIndexRecord.performAdd(TupleIndexRecord.java:94) ~[jena-tdb2-3.17.0.jar:3.17.0]
> 	at org.apache.jena.tdb2.store.tupletable.TupleIndexBase.add(TupleIndexBase.java:66) ~[jena-tdb2-3.17.0.jar:3.17.0]
> 	at org.apache.jena.tdb2.loader.main.Indexer.lambda$loadTuples$1(Indexer.java:134) ~[jena-tdb2-3.17.0.jar:3.17.0]
> 	at org.apache.jena.tdb2.loader.main.Indexer.stageIndex(Indexer.java:116) ~[jena-tdb2-3.17.0.jar:3.17.0]
> 	at org.apache.jena.tdb2.loader.main.Indexer.lambda$startBulk$0(Indexer.java:93) ~[jena-tdb2-3.17.0.jar:3.17.0]
> 	at java.lang.Thread.run(Thread.java:834) [?:?]
> [53484.193s][warning][os,thread] Attempt to deallocate stack guard pages failed (0x00007f33244ff000-0x00007f3324503000).
> 05:27:36 ERROR Indexer         :: Interrupted
> org.apache.jena.dboe.base.file.FileException: BlockMgrMapped.segmentAllocate: Segment = 16459 : Offset = 138068099072
> 	at org.apache.jena.dboe.base.file.BlockAccessMapped.allocSegment(BlockAccessMapped.java:218) ~[jena-dboe-base-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.base.file.BlockAccessMapped.getByteBuffer(BlockAccessMapped.java:150) ~[jena-dboe-base-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.base.file.BlockAccessMapped.allocate(BlockAccessMapped.java:80) ~[jena-dboe-base-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.base.block.BlockMgrFileAccess.allocate(BlockMgrFileAccess.java:51) ~[jena-dboe-base-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.base.block.BlockMgrBase.allocate(BlockMgrBase.java:39) ~[jena-dboe-base-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.base.block.BlockMgrWrapper.allocate(BlockMgrWrapper.java:41) ~[jena-dboe-base-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.base.block.BlockMgrFreeChain.allocate(BlockMgrFreeChain.java:44) ~[jena-dboe-base-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.base.page.PageBlockMgr.create(PageBlockMgr.java:53) ~[jena-dboe-base-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.trans.bplustree.BPTreeRecordsMgr.create(BPTreeRecordsMgr.java:77) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.trans.bplustree.BPTreeRecords.create(BPTreeRecords.java:236) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.trans.bplustree.BPTreeRecords.insertNewPage(BPTreeRecords.java:230) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.trans.bplustree.BPTreeRecords.split(BPTreeRecords.java:206) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.trans.bplustree.BPTreeNode.split(BPTreeNode.java:568) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:509) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:522) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:522) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:522) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.trans.bplustree.BPTreeNode.insert(BPTreeNode.java:203) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.trans.bplustree.BPlusTree.insertAndReturnOld(BPlusTree.java:278) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> 	at org.apache.jena.dboe.trans.bplustree.BPlusTree.insert(BPlusTree.java:271) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> 	at org.apache.jena.tdb2.store.tupletable.TupleIndexRecord.performAdd(TupleIndexRecord.java:94) ~[jena-tdb2-3.17.0.jar:3.17.0]
> 	at org.apache.jena.tdb2.store.tupletable.TupleIndexBase.add(TupleIndexBase.java:66) ~[jena-tdb2-3.17.0.jar:3.17.0]
> 	at org.apache.jena.tdb2.loader.main.Indexer.lambda$loadTuples$1(Indexer.java:134) ~[jena-tdb2-3.17.0.jar:3.17.0]
> 	at org.apache.jena.tdb2.loader.main.Indexer.stageIndex(Indexer.java:116) ~[jena-tdb2-3.17.0.jar:3.17.0]
> 	at org.apache.jena.tdb2.loader.main.Indexer.lambda$startBulk$0(Indexer.java:93) ~[jena-tdb2-3.17.0.jar:3.17.0]
> 	at java.lang.Thread.run(Thread.java:834) [?:?]
> [53485.632s][warning][os,thread] Attempt to deallocate stack guard pages failed (0x00007f33242fd000-0x00007f3324301000).
> {code}
> Side observation:
> * Fusekis loading performance seems to be influenced by linux' cache significantly. If there is 1TB available it had had a read rate of ~2.5GB/s (measured with htop). With only 16GB it dropped to at max 250MB/s. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)