You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jena.apache.org by "Andy Seaborne (Jira)" <ji...@apache.org> on 2021/04/09 09:59:00 UTC
[jira] [Comment Edited] (JENA-2044) tdb2.tdbloader crashses loading
wikidata
[ https://issues.apache.org/jira/browse/JENA-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17290879#comment-17290879 ]
Andy Seaborne edited comment on JENA-2044 at 4/9/21, 9:58 AM:
--------------------------------------------------------------
First pass looking at the new material:
The load got further (~3B vs 750M triples) so changing settings had some effect.
The error is different:
{noformat}
Attempt to deallocate stack guard pages failed
{noformat}
which seems (from a web search) to be related to creating thread stackframes. I guess there wasn't a {{hs_err}} file this time - the JVM itself didn't crash.
{{tdb2.tdbloader}} creates all its threads at the start of the phases and while it is multithreaded, it uses a small fixed number per phase (less than 10).
I can't connect the new errors to what was happening before. The last time the {{hs_err}} file showed a fair number of GC threads. I don't know what counts as "reasonable"
was (Author: andy.seaborne):
First pass looking at the new material:
The load got further (~3B vs 750M triples) so changing settings had some effect.
The error is different:
{noformat}
Attempt to deallocate stack guard pages failed
{noformat}
which seems (from a web search) to be related to creating thread stackframes. I guess there wasn't a\{{hs_err}} file this time - the JVM itself didn't crash.
{{tdb2.tdbloader}} creates all its threads at the start of the phases and while it is multithreaded, it uses a small fixed number per phase (less than 10).
I can't connect the new errors to what was happening before. The last time the {{hs_err}} file showed a fair number of GC threads. I don't know what counts as "reasonable"
> tdb2.tdbloader crashses loading wikidata
> ----------------------------------------
>
> Key: JENA-2044
> URL: https://issues.apache.org/jira/browse/JENA-2044
> Project: Apache Jena
> Issue Type: Bug
> Components: Cmd line tools, Jena, TDB2
> Affects Versions: Jena 3.14.0, Jena 3.17.0
> Environment: {code:bash}
> $ java --version
> openjdk 11.0.9.1 2020-11-04
> OpenJDK Runtime Environment (build 11.0.9.1+1-post-Debian-1deb10u2)
> OpenJDK 64-Bit Server VM (build 11.0.9.1+1-post-Debian-1deb10u2, mixed mode, sharing)
> $ uname -r
> 4.19.0-14-amd64
> $ lsb_release -da
> No LSB modules are available.
> Distributor ID: Debian
> Description: Debian GNU/Linux 10 (buster)
> Release: 10
> Codename: buster
> {code}
> Reporter: Alexander Bigerl
> Priority: Major
> Attachments: FusekiWikidataLoaderDockerfile, fuseki-wikidata-2020-11-11_(docker_8c_J16g_R256g_K256g_phased).log, hs_err_pid28709.log
>
>
> Apache jena crashes when loading wikidata truthy 2020-11-11 (it is not available any more via wikidata, but a backup can be found here: [https://hobbitdata.informatik.uni-leipzig.de/wikidata-20201111-truthy-BETA.nt.bz2]
> command run was:
> {code:bash}
> cgmemtime /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/triplestores/jena/apache-jena-3.17.0/bin/tdb2.tdbloader --loader=sequential --loc /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/databases/fuseki/wikidata-2020-11-11/ /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/datasets/wikidata-2020-11-11/wikidata-20201111-truthy-BETA.nt 2>&1 | tee /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/logs/load/fuseki-wikidata-2020-11-11.log
> {code}
>
> The end of the logfile is:
> {code:bash}
> 06:56:52 INFO loader :: Add: 729,000,000 SPO->OSP (Batch: 237,642 / Avg: 300,050)
> 06:56:57 INFO loader :: Add: 730,000,000 SPO->OSP (Batch: 234,962 / Avg: 299,936)
> 06:56:57 INFO loader :: Elapsed: 2,433.85 seconds [2021/02/12 06:56:57 CET]
> 06:57:01 INFO loader :: Add: 731,000,000 SPO->OSP (Batch: 233,863 / Avg: 299,820)
> 06:57:05 INFO loader :: Add: 732,000,000 SPO->OSP (Batch: 269,978 / Avg: 299,775)
> 06:57:08 INFO loader :: Add: 733,000,000 SPO->OSP (Batch: 281,373 / Avg: 299,748)
> 06:57:12 INFO loader :: Add: 734,000,000 SPO->OSP (Batch: 285,143 / Avg: 299,727)
> 06:57:15 INFO loader :: Add: 735,000,000 SPO->OSP (Batch: 290,023 / Avg: 299,714)
> 06:57:19 INFO loader :: Add: 736,000,000 SPO->OSP (Batch: 290,951 / Avg: 299,701)
> #
> # There is insufficient memory for the Java Runtime Environment to continue.
> # Native memory allocation (malloc) failed to allocate 2097152 bytes for AllocateHeap
> # An error report file with more information is saved as:
> # /home/d/dice-gr/profiles/unix/cs/triplestore-benchmark/hs_err_pid28709.log
> Child user: 70961.063 s
> Child sys : 5817.787 s
> Child wall: 76243.666 s
> Child high-water RSS : 534037652 KiB
> Recursive and acc. high-water RSS+CACHE : 585081976 KiB
> {code}
> The machine has a AMD EPYC 7742 64-Core Processor, 1TB RAM and 2 TB free ssd storage on /home. So there should still be plenty of RAM have been available.
> The loc folder has at the time of crash 543GB.
> I also tried with -loader=parallel and -loader=phased . Same result.
> h3. update 23/Feb/2021:
> ulimit -a
> {code:bash}
> core file size (blocks, -c) 0
> data seg size (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size (blocks, -f) unlimited
> pending signals (-i) 4110480
> max locked memory (kbytes, -l) 65536
> max memory size (kbytes, -m) unlimited
> open files (-n) 1024
> pipe size (512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority (-r) 0
> stack size (kbytes, -s) 8192
> cpu time (seconds, -t) unlimited
> max user processes (-u) 4110480
> virtual memory (kbytes, -v) unlimited
> file locks (-x) unlimited
> {code}
> output from prlimit --pid=<PID of tdb.loader2>:
> {code:bash}
> RESOURCE DESCRIPTION SOFT HARD UNITS
> AS address space limit unlimited unlimited bytes
> CORE max core file size 0 unlimited bytes
> CPU CPU time unlimited unlimited seconds
> DATA max data size unlimited unlimited bytes
> FSIZE max file size unlimited unlimited bytes
> LOCKS max number of file locks held unlimited unlimited locks
> MEMLOCK max locked-in-memory address space 67108864 67108864 bytes
> MSGQUEUE max bytes in POSIX mqueues 819200 819200 bytes
> NICE max nice prio allowed to raise 0 0
> NOFILE max number of open files 1048576 1048576 files
> NPROC max number of processes 4110480 4110480 processes
> RSS max resident set size unlimited unlimited bytes
> RTPRIO max real-time priority 0 0
> RTTIME timeout for real-time tasks unlimited unlimited microsecs
> SIGPENDING max number of pending signals 4110480 4110480 signals
> STACK max stack size 8388608 unlimited bytes
> {code}
> I tried several things. Nothing worked so far. Here is what I tried:
> * Setting several JVM_ARGS to avoid resource over-consumption:
> {code:bash}
> -Xmx16g -XX:MaxDirectMemorySize=16g -Djdk.nio.maxCachedBufferSize=17179869184 -XX:ActiveProcessorCount=4
> {code}
> * Setting limits for Metaspace and CodeCache:
> {code:bash}
> -XX:MaxMetaspaceSize=4g -XX:ReservedCodeCacheSize=4g
> {code}
> * limiting kernel shared memory to (set in /etc/sysctl.conf + reboot)
> {code:bash}
> kernel.shmmax=17179869185 # 16 GB
> kernel.shmall=4194304 # *4096 pagesize = 16 GB
> {code}
> * putting jena into a dockerfile and run it with the commands below. The idea was to limit the resources and have a default environment. I first tried with 16g memory but it within a couple 100M triples dropping below 10k triples/s loading speed. So I increased the reseources because I didn't have that much time on the server. Dockerfile is provided as attachment as well as the output. Here the output might be interesting. Because tdb2.loader did not crash completely but simply logged an Exception and idled ever since.
> {code:bash}
> docker build -f FusekiWikidataLoaderDockerfile -t fuseki-wikidata_01 empty_folder/
> dockerun --user $(id -u):$(id -g) --kernel-memory=256g --memory=256g --cpuset-cpus=0-7 --name fuseki_wikidata_8c_J16g_R256g_K256g_phased -v "$(pwd)"/databases:/databases -v "$(pwd)"/datasets:/datasets -v "$(pwd)"/logs:/logs fuseki-wikidata_01:latest
> {code}
> error:
> {code:bash}
> 05:27:31 INFO loader :: Add: 3,156,000,000 Index (Batch: 57,074 / Avg: 159,787)
> 05:27:35 ERROR Indexer :: Interrupted
> org.apache.jena.dboe.base.file.FileException: BlockMgrMapped.segmentAllocate: Segment = 16313 : Offset = 136843362304
> at org.apache.jena.dboe.base.file.BlockAccessMapped.allocSegment(BlockAccessMapped.java:218) ~[jena-dboe-base-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.base.file.BlockAccessMapped.getByteBuffer(BlockAccessMapped.java:150) ~[jena-dboe-base-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.base.file.BlockAccessMapped.allocate(BlockAccessMapped.java:80) ~[jena-dboe-base-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.base.block.BlockMgrFileAccess.allocate(BlockMgrFileAccess.java:51) ~[jena-dboe-base-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.base.block.BlockMgrBase.allocate(BlockMgrBase.java:39) ~[jena-dboe-base-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.base.block.BlockMgrWrapper.allocate(BlockMgrWrapper.java:41) ~[jena-dboe-base-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.base.block.BlockMgrFreeChain.allocate(BlockMgrFreeChain.java:44) ~[jena-dboe-base-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.base.page.PageBlockMgr.create(PageBlockMgr.java:53) ~[jena-dboe-base-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.trans.bplustree.BPTreeRecordsMgr.create(BPTreeRecordsMgr.java:77) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.trans.bplustree.BPTreeRecords.create(BPTreeRecords.java:236) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.trans.bplustree.BPTreeRecords.insertNewPage(BPTreeRecords.java:230) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.trans.bplustree.BPTreeRecords.split(BPTreeRecords.java:206) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.trans.bplustree.BPTreeNode.split(BPTreeNode.java:568) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:509) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:522) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:522) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:522) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.trans.bplustree.BPTreeNode.insert(BPTreeNode.java:203) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.trans.bplustree.BPlusTree.insertAndReturnOld(BPlusTree.java:278) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.trans.bplustree.BPlusTree.insert(BPlusTree.java:271) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at org.apache.jena.tdb2.store.tupletable.TupleIndexRecord.performAdd(TupleIndexRecord.java:94) ~[jena-tdb2-3.17.0.jar:3.17.0]
> at org.apache.jena.tdb2.store.tupletable.TupleIndexBase.add(TupleIndexBase.java:66) ~[jena-tdb2-3.17.0.jar:3.17.0]
> at org.apache.jena.tdb2.loader.main.Indexer.lambda$loadTuples$1(Indexer.java:134) ~[jena-tdb2-3.17.0.jar:3.17.0]
> at org.apache.jena.tdb2.loader.main.Indexer.stageIndex(Indexer.java:116) ~[jena-tdb2-3.17.0.jar:3.17.0]
> at org.apache.jena.tdb2.loader.main.Indexer.lambda$startBulk$0(Indexer.java:93) ~[jena-tdb2-3.17.0.jar:3.17.0]
> at java.lang.Thread.run(Thread.java:834) [?:?]
> [53484.193s][warning][os,thread] Attempt to deallocate stack guard pages failed (0x00007f33244ff000-0x00007f3324503000).
> 05:27:36 ERROR Indexer :: Interrupted
> org.apache.jena.dboe.base.file.FileException: BlockMgrMapped.segmentAllocate: Segment = 16459 : Offset = 138068099072
> at org.apache.jena.dboe.base.file.BlockAccessMapped.allocSegment(BlockAccessMapped.java:218) ~[jena-dboe-base-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.base.file.BlockAccessMapped.getByteBuffer(BlockAccessMapped.java:150) ~[jena-dboe-base-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.base.file.BlockAccessMapped.allocate(BlockAccessMapped.java:80) ~[jena-dboe-base-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.base.block.BlockMgrFileAccess.allocate(BlockMgrFileAccess.java:51) ~[jena-dboe-base-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.base.block.BlockMgrBase.allocate(BlockMgrBase.java:39) ~[jena-dboe-base-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.base.block.BlockMgrWrapper.allocate(BlockMgrWrapper.java:41) ~[jena-dboe-base-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.base.block.BlockMgrFreeChain.allocate(BlockMgrFreeChain.java:44) ~[jena-dboe-base-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.base.page.PageBlockMgr.create(PageBlockMgr.java:53) ~[jena-dboe-base-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.trans.bplustree.BPTreeRecordsMgr.create(BPTreeRecordsMgr.java:77) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.trans.bplustree.BPTreeRecords.create(BPTreeRecords.java:236) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.trans.bplustree.BPTreeRecords.insertNewPage(BPTreeRecords.java:230) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.trans.bplustree.BPTreeRecords.split(BPTreeRecords.java:206) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.trans.bplustree.BPTreeNode.split(BPTreeNode.java:568) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:509) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:522) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:522) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:522) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.trans.bplustree.BPTreeNode.insert(BPTreeNode.java:203) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.trans.bplustree.BPlusTree.insertAndReturnOld(BPlusTree.java:278) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at org.apache.jena.dboe.trans.bplustree.BPlusTree.insert(BPlusTree.java:271) ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at org.apache.jena.tdb2.store.tupletable.TupleIndexRecord.performAdd(TupleIndexRecord.java:94) ~[jena-tdb2-3.17.0.jar:3.17.0]
> at org.apache.jena.tdb2.store.tupletable.TupleIndexBase.add(TupleIndexBase.java:66) ~[jena-tdb2-3.17.0.jar:3.17.0]
> at org.apache.jena.tdb2.loader.main.Indexer.lambda$loadTuples$1(Indexer.java:134) ~[jena-tdb2-3.17.0.jar:3.17.0]
> at org.apache.jena.tdb2.loader.main.Indexer.stageIndex(Indexer.java:116) ~[jena-tdb2-3.17.0.jar:3.17.0]
> at org.apache.jena.tdb2.loader.main.Indexer.lambda$startBulk$0(Indexer.java:93) ~[jena-tdb2-3.17.0.jar:3.17.0]
> at java.lang.Thread.run(Thread.java:834) [?:?]
> [53485.632s][warning][os,thread] Attempt to deallocate stack guard pages failed (0x00007f33242fd000-0x00007f3324301000).
> {code}
> Side observation:
> * Fusekis loading performance seems to be influenced by linux' cache significantly. If there is 1TB available it had had a read rate of ~2.5GB/s (measured with htop). With only 16GB it dropped to at max 250MB/s.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)