Posted to users@jena.apache.org by "Neubert, Joachim" <J....@zbw.eu> on 2022/02/11 17:53:12 UTC
xloader "Can't find gzip program"
I've just started testing xloader. It aborts with:
17:21:56 INFO Data :: Triples = 10,000,000 ; Quads = 0
17:21:57 INFO =-=-=-=-=-=-=-=
17:21:57 INFO
17:21:57 INFO Build SPO
17:21:57 INFO (Very long pause likely at this point)
17:21:58 INFO Index :: Build index SPO
java.lang.RuntimeException: org.apache.jena.tdb2.TDBException: Can't find gzip program
at org.apache.jena.tdb2.xloader.ProcBuildIndexX.sort_build_index(ProcBuildIndexX.java:207)
at org.apache.jena.tdb2.xloader.ProcBuildIndexX.buildIndex(ProcBuildIndexX.java:121)
at org.apache.jena.tdb2.xloader.ProcBuildIndexX.exec2(ProcBuildIndexX.java:106)
at org.apache.jena.tdb2.xloader.ProcBuildIndexX.exec(ProcBuildIndexX.java:94)
at tdb2.xloader.CmdxBuildIndex.exec(CmdxBuildIndex.java:80)
at org.apache.jena.cmd.CmdMain.mainMethod(CmdMain.java:92)
at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:58)
at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:45)
at tdb2.xloader.CmdxBuildIndex.main(CmdxBuildIndex.java:28)
Caused by: org.apache.jena.tdb2.TDBException: Can't find gzip program
at org.apache.jena.tdb2.xloader.BulkLoaderX.gzipProgram(BulkLoaderX.java:67)
at org.apache.jena.tdb2.xloader.ProcBuildIndexX.sort_build_index(ProcBuildIndexX.java:183)
... 8 more
Of course, /usr/bin/gzip is in the path. My configuration is below; tdb2.xloader was called with --threads=12.
Any idea what could be wrong?
Cheers, Joachim
Configuration:
openjdk version "11.0.13" 2021-10-19 LTS
OpenJDK Runtime Environment 18.9 (build 11.0.13+8-LTS)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.13+8-LTS, mixed mode, sharing)
JAVA_OPTS: -d64 -Xmx12G
Loader: tdb2.xloader
Jena: VERSION: 4.4.0
Jena: BUILD_DATE: 2022-01-30T15:09:41Z
ARQ: VERSION: 4.4.0
ARQ: BUILD_DATE: 2022-01-30T15:09:41Z
TDB: VERSION: 4.4.0
TDB: BUILD_DATE: 2022-01-30T15:09:41Z
Use fuseki tdb2.xloader on file /zbw/var/wikidata/2022-02-03/rdf/test.nt.gz
17:20:13 INFO Setup:
17:20:13 INFO Database: /zbw/var/lib/fuseki/databases/temp
17:20:13 INFO Data: /zbw/var/wikidata/2022-02-03/rdf/test.nt.gz
17:20:13 INFO TMPDIR: /zbw/var/lib/fuseki/databases/temp
17:20:13 INFO
17:20:13 INFO Load node table
--
Joachim Neubert
ZBW - Leibniz Information Centre for Economics
Neuer Jungfernstieg 21
20354 Hamburg
Phone +49-40-42834-462
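The thread does not show how xloader searches for gzip, so the following is only a diagnostic sketch and not a reproduction of its lookup logic. It reports where a program resolves on PATH and which of a set of fixed locations are executable. The list of fixed locations and the function name are assumptions made for illustration. Running it under the same account and environment as the loader may help, since PATH may differ from that of an interactive shell.

```python
import os
import shutil

# Hypothetical list of fixed locations to inspect; adjust for your system.
FIXED_CANDIDATES = ["/usr/bin/gzip", "/bin/gzip", "/usr/local/bin/gzip"]


def locate_program(name, fixed_candidates=()):
    """Report where `name` resolves on PATH and which fixed paths are executable."""
    on_path = shutil.which(name)  # None if the PATH lookup finds nothing
    usable = [
        path
        for path in fixed_candidates
        if os.path.isfile(path) and os.access(path, os.X_OK)
    ]
    return {"on_path": on_path, "fixed": usable}


if __name__ == "__main__":
    print(locate_program("gzip", FIXED_CANDIDATES))
```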
AW: xloader "Can't find gzip program"
Posted by Andy Seaborne <an...@apache.org>.
Thanks for the details. Good to add to the collective experience.
This is one reason to parse the file to /dev/null before trying to load it.
It doesn't look like there is much you can do. Reading the man page for
bzip2recover, it's going to lose some data, and if the lost section does
not align with whole N-Triples lines, it will break the parser. Only by
finding and fixing up the damaged (in the N-Triples sense) block file will
you recover most of the data.
Andy
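A check of the compression layer alone can be sketched in a few lines before committing to a long load. The function below streams a .bz2 or .gz file to the end and reports how many lines were read before any decompression error. The function name and return shape are assumptions for illustration, and this does not validate N-Triples syntax; that still needs a parser run over the data.

```python
import bz2
import gzip
import zlib


def check_compressed(path):
    """Stream-decompress `path`; return (ok, lines_read, error_message)."""
    # Choose the decompressor from the file extension (assumption: .bz2 or .gz).
    opener = bz2.open if path.endswith(".bz2") else gzip.open
    lines = 0
    try:
        with opener(path, "rb") as stream:
            for _ in stream:
                lines += 1
    except (OSError, EOFError, zlib.error) as exc:
        # Truncated or corrupted input surfaces here.
        return False, lines, str(exc)
    return True, lines, None
```

On a damaged file, the line count gives a rough idea of how far the readable data extends.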
AW: AW: AW: AW: AW: xloader "Can't find gzip program"
Posted by "Neubert, Joachim" <J....@zbw.eu>.
The error was in the binary:
lbzcat: "/zbw/var/wikidata/2022-02-03/rdf/latest-truthy.nt.bz2": compressed data error: bad block header magic
That created non-RDF input:
[nbt@e6810f891672 ~]$ bzcat /zbw/var/wikidata/2022-02-03/rdf/latest-truthy.nt.bz2 | sed -n '4052914958,4052914960p;4052914961q'
<http://www.wikidata.org/entity/Q85112545> <http://schema.org/description> "\u0646\u062C\u0645 \u0641\u064A \u0643\u0648\u0643\u0628\u0629 \u0627\u0644\u062B\u0648\u0631"@ar .
bzcat: Compressed file ends unexpectedly;
perhaps it is corrupted? *Possible* reason follows.
bzcat: Success
Input file = /zbw/var/wikidata/2022-02-03/rdf/latest-truthy.nt.bz2, output file = (stdout)
It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.
You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.
<http://www.wikidata.org/entity/Q85112545> <http://schema.org/description> "star in the constellation Taurus"@en .
<https://www.wikidata.org/wiki/Special:EntityData/Q85112563> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Dataset> .
which in turn produced:
03:02:18 INFO Nodes :: Add: 4,052,000,000 latest-truthy.nt (Batch: 108,189 / Avg: 102,550)
03:02:26 ERROR riot :: [line: 4052914959, col: 80] Bad input stream [java.io.IOException: Unexpected end of stream]
Exception in thread "AsyncParser" org.apache.jena.riot.RiotException: [line: 4052914959, col: 80] Bad input stream [java.io.IOException: Unexpected end of stream]
at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:163)
at org.apache.jena.riot.lang.LangEngine.raiseException(LangEngine.java:148)
at org.apache.jena.riot.lang.LangEngine.nextToken(LangEngine.java:105)
at org.apache.jena.riot.lang.LangNTuple.parseTriple(LangNTuple.java:95)
at org.apache.jena.riot.lang.LangNTriples.parseOne(LangNTriples.java:61)
at org.apache.jena.riot.lang.LangNTriples.runParser(LangNTriples.java:53)
at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:43)
at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:186)
at org.apache.jena.riot.RDFParser.read(RDFParser.java:366)
at org.apache.jena.riot.RDFParser.parseURI(RDFParser.java:335)
at org.apache.jena.riot.RDFParser.parse(RDFParser.java:310)
at org.apache.jena.riot.RDFParserBuilder.parse(RDFParserBuilder.java:552)
at org.apache.jena.tdb2.xloader.ProcBuildNodeTableX.lambda$exec2$0(ProcBuildNodeTableX.java:198)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
at org.apache.jena.tdb2.xloader.ProcBuildNodeTableX.lambda$exec2$1(ProcBuildNodeTableX.java:194)
at java.base/java.lang.Thread.run(Thread.java:829)
Cheers, Joachim
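For reference, the bzcat and sed pipeline above can be approximated without external tools. This is a minimal sketch, assuming a .bz2 or .gz input and UTF-8 text; the function name is hypothetical. It returns the requested lines together with any decompression error hit on the way, so a damaged file yields whatever was readable plus the error message.

```python
import bz2
import gzip
import zlib
from itertools import islice


def read_line_range(path, first, last):
    """Return (lines, error) for lines first..last (1-based, inclusive)."""
    opener = bz2.open if path.endswith(".bz2") else gzip.open
    lines, error = [], None
    try:
        with opener(path, "rt", encoding="utf-8", errors="replace", newline="\n") as stream:
            # islice stops reading after `last`, like the `q` in the sed command.
            for line in islice(stream, first - 1, last):
                lines.append(line.rstrip("\n"))
    except (OSError, EOFError, zlib.error) as exc:
        error = str(exc)
    return lines, error
```

Reaching a line several billion lines into the file still means decompressing everything before it, so this is no faster than the shell pipeline.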
Re: AW: AW: AW: AW: xloader "Can't find gzip program"
Posted by Andy Seaborne <an...@apache.org>.
On 14/02/2022 08:01, Neubert, Joachim wrote:
> Thanks, Andy, the TDB2 assembler fixed it, and all worked well.
>
> I've tried to load wikidata-truthy then, but apparently the bzip file was damaged at line 4052914959 - have to try again
How annoying.
Is it an RDF syntax error, a bad binary, or something else?
--
My experience is that gz is faster to load.
bz2 emphasises compactness over speed.
Andy
AW: AW: AW: AW: xloader "Can't find gzip program"
Posted by "Neubert, Joachim" <J....@zbw.eu>.
Thanks, Andy, the TDB2 assembler fixed it, and all worked well.
I've tried to load wikidata-truthy then, but apparently the bzip file was damaged at line 4052914959 - have to try again
Cheers, Joachim
Re: AW: AW: AW: xloader "Can't find gzip program"
Posted by Andy Seaborne <an...@apache.org>.
Hi Joachim,
Aside: I've realised why the timestamps are fixed at "2022-01-30 15:03".
The build setup is for repeatable builds of releases. Any build from the
X.Y.Z release source, with the same JDK, will generate byte-for-byte
identical jar files.
Each release build fixes the timestamp and uses that, and it goes into the
POM as the property <project.build.outputTimestamp>. It only gets updated
when a release happens; otherwise the POM file would get modified
several times a week.
Thankfully, we have --version on most commands as well.
That's timestamps explained.
----
You seem to have run the TDB2 xloader, then given the text index builder
an assembler description for TDB1.
Fuseki with --loc determines the database type by looking at the file
layout, but assemblers don't.
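The idea of telling the database type from the file layout can be illustrated with a small heuristic. This is a sketch under stated assumptions, not Fuseki's actual detection code, which the thread does not show. It assumes a TDB2 location contains generation subdirectories named like Data-0001, and a TDB1 location contains files such as nodes.dat directly in the directory. The function name is hypothetical.

```python
import os
import re

# Assumption: file names typically found directly in a TDB1 database directory.
TDB1_MARKERS = ("nodes.dat", "node2id.dat", "SPO.dat")


def guess_tdb_type(location):
    """Heuristic guess at the database type: 'TDB2', 'TDB1', or None."""
    try:
        entries = os.listdir(location)
    except OSError:
        return None
    # Assumption: TDB2 stores data in subdirectories Data-0001, Data-0002, ...
    for entry in entries:
        is_dir = os.path.isdir(os.path.join(location, entry))
        if re.fullmatch(r"Data-\d+", entry) and is_dir:
            return "TDB2"
    if any(entry in TDB1_MARKERS for entry in entries):
        return "TDB1"
    return None
```

A check like this before running the text indexer would flag a mismatch between the database on disk and the dataset type named in the assembler file. In the Jena documentation the TDB2 dataset type is tdb2:DatasetTDB2 in the namespace http://jena.apache.org/2016/tdb#, in contrast to the http://jena.hpl.hp.com/2008/tdb#DatasetTDB type seen in the error above.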
The version output can be changed to say "TDB1" without too much
disruption. It is a small tweak that might have helped show this up earlier.
Andy
On 11/02/2022 23:06, Neubert, Joachim wrote:
> Sorry, my fault: I actually had jena-4.4.0 active, not 4.5.0-SNAPSHOT.
>
> Now the loading works smoothly:
>
> 22:50:10 INFO Load node table = 62 seconds
> 22:50:10 INFO Load ingest data = 37 seconds
> 22:50:10 INFO Build index SPO = 7 seconds
> 22:50:10 INFO Build index POS = 12 seconds
> 22:50:10 INFO Build index OSP = 9 seconds
> 22:50:10 INFO Overall 127 seconds
> 22:50:10 INFO Overall 00h 02m 07s
> 22:50:10 INFO Triples loaded = 10000000
> 22:50:10 INFO Quads loaded = 0
> 22:50:10 INFO Overall Rate 78740 tuples per second
That's output from tdb2.xloader.
For 10m up to 500m triples (laptop), or maybe 1B (server), also try
"tdb2.tdbloader --loader=parallel"
> However, the text indexing crashes when called like this:
>
> java -cp $FUSEKI_HOME/fuseki-server.jar jena.textindexer --debug --desc=/tmp/temp.ttl
>
> org.apache.jena.assembler.exceptions.AssemblerException: caught: Unable to check TDB lock owner, the lock file contents appear to be for a TDB2 database. Please try loading this location as a TDB2 database. See https://jena.apache.org/documentation/tdb/faqs.html for more information.
> doing:
> root: file:///tmp/temp.ttl#dataset with type: http://jena.hpl.hp.com/2008/tdb#DatasetTDB assembler class: class org.apache.jena.tdb.assembler.DatasetAssemblerTDB1
But that is TDB1
> root: http://localhost/jena_example/#text_dataset with type: http://jena.apache.org/text#TextDataset assembler class: class org.apache.jena.query.text.assembler.TextDatasetAssembler
>
...
> Caused by: org.apache.jena.tdb.base.file.FileException: Unable to check TDB lock owner, the lock file contents appear to be for a TDB2 database. Please try loading this location as a TDB2 database. See https://jena.apache.org/documentation/tdb/faqs.html for more information.
> at org.apache.jena.tdb.base.file.LocationLock.getOwner(LocationLock.java:110)
org.apache.jena.tdb == TDB1
> at org.apache.jena.tdb.base.file.LocationLock.canObtain(LocationLock.java:139)
> at org.apache.jena.tdb.StoreConnection._makeAndCache(StoreConnection.java:262)
> at org.apache.jena.tdb.StoreConnection.make(StoreConnection.java:226)
> at org.apache.jena.tdb.StoreConnection.make(StoreConnection.java:240)
> at org.apache.jena.tdb.transaction.DatasetGraphTransaction.<init>(DatasetGraphTransaction.java:72)
> at org.apache.jena.tdb.sys.TDBMaker.createDirect(TDBMaker.java:114)
...
> ... 23 more
> 2022-02-11 22:50:12 ABORTED
>
> cat /var/lib/fuseki/databases/temp/tdb.lock
> 32907
>
> Cheers, Joachim
AW: AW: AW: xloader "Can't find gzip program"
Posted by "Neubert, Joachim" <J....@zbw.eu>.
Sorry, my fault: I actually had jena-4.4.0 active, not 4.5.0-SNAPSHOT.
Now the loading works smoothly:
22:50:10 INFO Load node table = 62 seconds
22:50:10 INFO Load ingest data = 37 seconds
22:50:10 INFO Build index SPO = 7 seconds
22:50:10 INFO Build index POS = 12 seconds
22:50:10 INFO Build index OSP = 9 seconds
22:50:10 INFO Overall 127 seconds
22:50:10 INFO Overall 00h 02m 07s
22:50:10 INFO Triples loaded = 10000000
22:50:10 INFO Quads loaded = 0
22:50:10 INFO Overall Rate 78740 tuples per second
However, the text indexing crashes when called like this:
java -cp $FUSEKI_HOME/fuseki-server.jar jena.textindexer --debug --desc=/tmp/temp.ttl
org.apache.jena.assembler.exceptions.AssemblerException: caught: Unable to check TDB lock owner, the lock file contents appear to be for a TDB2 database. Please try loading this location as a TDB2 database. See https://jena.apache.org/documentation/tdb/faqs.html for more information.
doing:
root: file:///tmp/temp.ttl#dataset with type: http://jena.hpl.hp.com/2008/tdb#DatasetTDB assembler class: class org.apache.jena.tdb.assembler.DatasetAssemblerTDB1
root: http://localhost/jena_example/#text_dataset with type: http://jena.apache.org/text#TextDataset assembler class: class org.apache.jena.query.text.assembler.TextDatasetAssembler
at org.apache.jena.assembler.assemblers.AssemblerGroup$PlainAssemblerGroup.openBySpecificType(AssemblerGroup.java:165)
at org.apache.jena.assembler.assemblers.AssemblerGroup$PlainAssemblerGroup.open(AssemblerGroup.java:144)
at org.apache.jena.assembler.assemblers.AssemblerGroup$ExpandingAssemblerGroup.open(AssemblerGroup.java:93)
at org.apache.jena.assembler.assemblers.AssemblerBase.open(AssemblerBase.java:39)
at org.apache.jena.assembler.assemblers.AssemblerBase.open(AssemblerBase.java:35)
at org.apache.jena.query.text.assembler.TextDatasetAssembler.open(TextDatasetAssembler.java:67)
at org.apache.jena.query.text.assembler.TextDatasetAssembler.open(TextDatasetAssembler.java:42)
at org.apache.jena.assembler.assemblers.AssemblerGroup$PlainAssemblerGroup.openBySpecificType(AssemblerGroup.java:157)
at org.apache.jena.assembler.assemblers.AssemblerGroup$PlainAssemblerGroup.open(AssemblerGroup.java:144)
at org.apache.jena.assembler.assemblers.AssemblerGroup$ExpandingAssemblerGroup.open(AssemblerGroup.java:93)
at org.apache.jena.assembler.assemblers.AssemblerBase.open(AssemblerBase.java:39)
at org.apache.jena.assembler.assemblers.AssemblerBase.open(AssemblerBase.java:35)
at org.apache.jena.sparql.core.assembler.AssemblerUtils.build(AssemblerUtils.java:144)
at org.apache.jena.sparql.core.assembler.AssemblerUtils.build(AssemblerUtils.java:132)
at org.apache.jena.query.text.TextDatasetFactory.create(TextDatasetFactory.java:38)
at org.apache.jena.query.text.cmd.textindexer.processModulesAndArgs(textindexer.java:90)
at org.apache.jena.cmd.CmdArgModule.process(CmdArgModule.java:39)
at org.apache.jena.cmd.CmdMain.mainMethod(CmdMain.java:86)
at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:56)
at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:43)
at org.apache.jena.query.text.cmd.textindexer.main(textindexer.java:52)
at org.apache.jena.query.text.cmd.InitTextCmds.lambda$cmds$1(InitTextCmds.java:26)
at org.apache.jena.cmd.Cmds.exec(Cmds.java:65)
at jena.textindexer.main(textindexer.java:25)
Caused by: org.apache.jena.tdb.base.file.FileException: Unable to check TDB lock owner, the lock file contents appear to be for a TDB2 database. Please try loading this location as a TDB2 database. See https://jena.apache.org/documentation/tdb/faqs.html for more information.
at org.apache.jena.tdb.base.file.LocationLock.getOwner(LocationLock.java:110)
at org.apache.jena.tdb.base.file.LocationLock.canObtain(LocationLock.java:139)
at org.apache.jena.tdb.StoreConnection._makeAndCache(StoreConnection.java:262)
at org.apache.jena.tdb.StoreConnection.make(StoreConnection.java:226)
at org.apache.jena.tdb.StoreConnection.make(StoreConnection.java:240)
at org.apache.jena.tdb.transaction.DatasetGraphTransaction.<init>(DatasetGraphTransaction.java:72)
at org.apache.jena.tdb.sys.TDBMaker.createDirect(TDBMaker.java:114)
at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705)
at org.apache.jena.tdb.sys.TDBMaker._create(TDBMaker.java:100)
at org.apache.jena.tdb.sys.TDBMaker.createDatasetGraphTransaction(TDBMaker.java:43)
at org.apache.jena.tdb.TDBFactory._createDatasetGraph(TDBFactory.java:93)
at org.apache.jena.tdb.TDBFactory.createDatasetGraph(TDBFactory.java:71)
at org.apache.jena.tdb.assembler.DatasetAssemblerTDB1.make(DatasetAssemblerTDB1.java:55)
at org.apache.jena.tdb.assembler.DatasetAssemblerTDB1.createDataset(DatasetAssemblerTDB1.java:46)
at org.apache.jena.sparql.core.assembler.DatasetAssembler.open(DatasetAssembler.java:40)
at org.apache.jena.sparql.core.assembler.DatasetAssembler.open(DatasetAssembler.java:33)
at org.apache.jena.assembler.assemblers.AssemblerGroup$PlainAssemblerGroup.openBySpecificType(AssemblerGroup.java:157)
... 23 more
2022-02-11 22:50:12 ABORTED
cat /var/lib/fuseki/databases/temp/tdb.lock
32907
Cheers, Joachim
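Note on the trace above: the assembler in /tmp/temp.ttl declares the dataset with the TDB1 type (http://jena.hpl.hp.com/2008/tdb#DatasetTDB), while tdb2.xloader builds a TDB2 database. A minimal sketch of a TDB2 dataset description follows, assuming the database lives in the directory whose tdb.lock is shown above. The scratch file name is hypothetical, and the text:TextDataset part of the real file is not shown in this thread, so it is left out here.

```shell
# Sketch only: write a TDB2 dataset description to a scratch file.
# The file name is hypothetical; the location is the database directory
# whose tdb.lock is shown in the thread.
cfg=/tmp/temp-tdb2-dataset.ttl
cat > "$cfg" <<'EOF'
@prefix tdb2: <http://jena.apache.org/2016/tdb#> .

# TDB2 dataset type instead of the TDB1 type
# http://jena.hpl.hp.com/2008/tdb#DatasetTDB named in the error above.
<#dataset> a tdb2:DatasetTDB2 ;
    tdb2:location "/var/lib/fuseki/databases/temp" .
EOF

# Show the dataset type line that was written.
grep 'DatasetTDB2' "$cfg"
```

Whether this alone is enough for jena.textindexer depends on the rest of the assembler file, so treat it as a starting point rather than a confirmed fix.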
Re: AW: AW: xloader "Can't find gzip program"
Posted by Andy Seaborne <an...@apache.org>.
On 11/02/2022 21:38, Neubert, Joachim wrote:
> Strange - I should have the same version:
>
> sudo tar xzvf /usr/local/src/apache-jena-fuseki-4.5.0-20220209.180144-12.tar.gz
Different jar file: apache-jena-4.5.0-20220209.180144-12 (no Fuseki), but weird anyway.
wget https://repository.apache.org/content/groups/snapshots/org/apache/jena/apache-jena/4.5.0-SNAPSHOT/apache-jena-4.5.0-20220209.180144-12.zip
then the zip file is:
27372309 Feb 9 18:26 apache-jena-4.5.0-20220209.180144-12.zip
apache-jena-4.5.0-SNAPSHOT/bin/tdb2.tdbloader --version
Jena: VERSION: 4.5.0-SNAPSHOT
Jena: BUILD_DATE: 2022-02-09T18:01:44Z
ARQ: VERSION: 4.5.0-SNAPSHOT
ARQ: BUILD_DATE: 2022-02-09T18:01:44Z
TDB2: VERSION: 4.5.0-SNAPSHOT
TDB2: BUILD_DATE: 2022-02-09T18:01:44Z
yet the TDB2 jar is dated 30th Jan, as are the files inside it -- can't explain that.
294846 Jan 30 15:03 apache-jena-4.5.0-SNAPSHOT/lib/jena-tdb2-4.5.0-SNAPSHOT.jar
The tdb2.xloader script is 10485 bytes and has
SORT_THREADS="2"
in it. Is that what your copy of the script has in it?
I'll clear the Jenkins workspace and schedule a new build.
Andy
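Note: the comparison Andy asks for can be done with a couple of standard commands. This is a rough sketch; the helper name show_sort_threads is made up for illustration, and the path assumes the archive was unpacked in the current directory as in the message above.

```shell
# Print every SORT_THREADS assignment in a loader script, with line numbers.
show_sort_threads() {
    grep -n '^[[:space:]]*SORT_THREADS=' "$1"
}

# Path as used in the message above; adjust to the local unpack directory.
script=apache-jena-4.5.0-SNAPSHOT/bin/tdb2.xloader
if [ -f "$script" ]; then
    show_sort_threads "$script"
    wc -c < "$script"    # the message above reports 10485 bytes
fi
```

If the script size or the SORT_THREADS line differs from what is quoted above, the local copy is probably not the same build.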
AW: AW: xloader "Can't find gzip program"
Posted by "Neubert, Joachim" <J....@zbw.eu>.
Strange - I should have the same version:
sudo tar xzvf /usr/local/src/apache-jena-fuseki-4.5.0-20220209.180144-12.tar.gz
but the jar file is dated Jan 30:
ll apache-jena-fuseki-4.5.0-SNAPSHOT/
total 35868
-rw-r--r-- 1 root root 36975 Jan 30 15:02 LICENSE
-rw-r--r-- 1 root root 8914 Jan 30 15:02 NOTICE
-rw-r--r-- 1 root root 1151 Jan 30 15:02 README
drwxr-xr-x 2 root root 179 Feb 11 20:47 bin
-rwxr-xr-x 1 root root 12339 Jan 30 15:02 fuseki
-rwxr-xr-x 1 root root 1241 Jan 30 15:02 fuseki-backup
-rwxr-xr-x 1 root root 3370 Jan 30 15:02 fuseki-server
-rw-r--r-- 1 root root 1264 Jan 30 15:02 fuseki-server.bat
-rw-r--r-- 1 root root 36631864 Jan 30 15:02 fuseki-server.jar
-rw-r--r-- 1 root root 2217 Jan 30 15:02 fuseki.service
-rw-r--r-- 1 root root 2124 Jan 30 15:02 log4j2.properties
drwxr-xr-x 4 root root 121 Jan 30 15:02 webapp
Cheers, Joachim
Re: AW: xloader "Can't find gzip program"
Posted by Andy Seaborne <an...@apache.org>.
Works for me - make sure it is the latest dev build (the one at the bottom).
I just grabbed apache-jena-4.5.0-20220209.180144-12.zip (2022-02-09)
and loaded a few million triples with no problems.
rm -rf DB2
apache-jena-4.5.0-SNAPSHOT/bin/tdb2.xloader --loc DB2 ~/Datasets/BSBM/bsbm-5m.nt.gz
Andy
AW: xloader "Can't find gzip program"
Posted by "Neubert, Joachim" <J....@zbw.eu>.
Hi Andy,
Thanks! The 4.5.0-SNAPSHOT code seems to run significantly faster; however, I get the same error at the start of the SPO index build.
Please let me know if I can help with tracing/reproducing the error.
Cheers, Joachim
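Note: a later message in this thread traces the repeated failure to jena-4.4.0 still being the active installation. A small sketch for checking what the shell actually resolves before rerunning the loader is shown here; the helper name which_tool is hypothetical, and it only reports what is on PATH, not how the loader itself looks up gzip.

```shell
# Resolve a command name the way the shell would; report on stderr if absent.
which_tool() {
    command -v "$1" || { echo "not found on PATH: $1" >&2; return 1; }
}

which_tool gzip || true
which_tool tdb2.xloader || true
# Then compare the reported version with the one expected, e.g.
#   tdb2.tdbloader --version
```

If tdb2.xloader resolves to a directory belonging to an older release, the snapshot is not the one being run.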
Re: xloader "Can't find gzip program"
Posted by Andy Seaborne <an...@apache.org>.
Hi Joachim,
https://issues.apache.org/jira/browse/JENA-2277
https://issues.apache.org/jira/browse/JENA-2279
There are two fixes for tdb2.xloader which are now in the development
builds:
https://repository.apache.org/content/groups/snapshots/org/apache/jena/
(these are not official releases and have not been voted on by the PMC)
If you could test them and let us know if they work or whether there are
further problems, that would be great.
Andy