Posted to users@jena.apache.org by "Neubert, Joachim" <J....@zbw.eu> on 2022/02/11 17:53:12 UTC

xloader "Can't find gzip program"

I've just started testing xloader. It aborts with:

17:21:56 INFO  Data            :: Triples = 10,000,000 ; Quads = 0
17:21:57 INFO  =-=-=-=-=-=-=-=
17:21:57 INFO
17:21:57 INFO  Build SPO
17:21:57 INFO  (Very long pause likely at this point)
17:21:58 INFO  Index           :: Build index SPO
java.lang.RuntimeException: org.apache.jena.tdb2.TDBException: Can't find gzip program
  at org.apache.jena.tdb2.xloader.ProcBuildIndexX.sort_build_index(ProcBuildIndexX.java:207)
  at org.apache.jena.tdb2.xloader.ProcBuildIndexX.buildIndex(ProcBuildIndexX.java:121)
  at org.apache.jena.tdb2.xloader.ProcBuildIndexX.exec2(ProcBuildIndexX.java:106)
  at org.apache.jena.tdb2.xloader.ProcBuildIndexX.exec(ProcBuildIndexX.java:94)
  at tdb2.xloader.CmdxBuildIndex.exec(CmdxBuildIndex.java:80)
  at org.apache.jena.cmd.CmdMain.mainMethod(CmdMain.java:92)
  at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:58)
  at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:45)
  at tdb2.xloader.CmdxBuildIndex.main(CmdxBuildIndex.java:28)
Caused by: org.apache.jena.tdb2.TDBException: Can't find gzip program
  at org.apache.jena.tdb2.xloader.BulkLoaderX.gzipProgram(BulkLoaderX.java:67)
  at org.apache.jena.tdb2.xloader.ProcBuildIndexX.sort_build_index(ProcBuildIndexX.java:183)
  ... 8 more

Of course, /usr/bin/gzip is in the path. My configuration is below; tdb2.xloader was called with --threads=12.

Any idea what could be wrong?

Cheers, Joachim
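
For a "Can't find gzip program" report like the one above, a quick first diagnostic is to check what a PATH lookup actually resolves to in the environment that launches the loader. The following is a minimal sketch in Python, not Jena's own lookup logic (how xloader searches for gzip is not shown in this thread); the function name find_program is illustrative.

```python
import shutil


def find_program(name, search_path=None):
    """Return the full path that `name` resolves to, or None if not found.

    With search_path=None the lookup uses the PATH of the current process,
    which is what a child process started from here would inherit.
    """
    return shutil.which(name, path=search_path)


# Show what "gzip" resolves to for this process (prints None if absent).
print("gzip ->", find_program("gzip"))
```

Running this from the same shell, user and wrapper script that start the loader helps rule out a PATH that differs from the interactive one.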


Configuration:
openjdk version "11.0.13" 2021-10-19 LTS
OpenJDK Runtime Environment 18.9 (build 11.0.13+8-LTS)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.13+8-LTS, mixed mode, sharing)
JAVA_OPTS: -d64 -Xmx12G
Loader: tdb2.xloader
Jena:       VERSION: 4.4.0
Jena:       BUILD_DATE: 2022-01-30T15:09:41Z
ARQ:        VERSION: 4.4.0
ARQ:        BUILD_DATE: 2022-01-30T15:09:41Z
TDB:        VERSION: 4.4.0
TDB:        BUILD_DATE: 2022-01-30T15:09:41Z

Use fuseki tdb2.xloader on file /zbw/var/wikidata/2022-02-03/rdf/test.nt.gz
17:20:13 INFO  Setup:
17:20:13 INFO    Database: /zbw/var/lib/fuseki/databases/temp
17:20:13 INFO    Data:     /zbw/var/wikidata/2022-02-03/rdf/test.nt.gz
17:20:13 INFO    TMPDIR:   /zbw/var/lib/fuseki/databases/temp
17:20:13 INFO
17:20:13 INFO  Load node table


--
Joachim Neubert

ZBW - Leibniz Information Centre for Economics
Neuer Jungfernstieg 21
20354 Hamburg
Phone +49-40-42834-462


AW: xloader "Can't find gzip program"

Posted by Andy Seaborne <an...@apache.org>.
Thanks for the details.  Good to add to the collective experience.

This is one reason to parse the file to /dev/null before trying to load it.

It doesn't look like there is much you can do. Reading the man page for
bzip2recover, it is going to lose some data, and if the loss is not aligned
to N-Triples line boundaries, it will break the parser. Only by finding and
fixing up the damaged (in the N-Triples sense) block file will you recover
most of the data.

     Andy
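
A minimal sketch of a pre-flight check in the spirit of the advice above, written in Python rather than with a Jena tool: it streams the whole .bz2 file through the decompressor and discards the output, so a damaged block is reported before a long load is started. The function name check_bz2 and the chunk size are illustrative assumptions. This only tests the compression layer; it does not validate N-Triples syntax the way a real parse would.

```python
import bz2


def check_bz2(path, chunk_size=1 << 20):
    """Decompress `path` completely, discarding the data.

    Returns (ok, newline_count, message). `newline_count` is the number of
    newline bytes seen before the end of the file or the first error.
    """
    newlines = 0
    try:
        with bz2.open(path, "rb") as stream:
            while True:
                chunk = stream.read(chunk_size)
                if not chunk:
                    break
                newlines += chunk.count(b"\n")
    except (OSError, EOFError) as exc:
        # OSError: invalid compressed data; EOFError: truncated file.
        return False, newlines, str(exc)
    return True, newlines, "ok"
```

A call such as check_bz2("latest-truthy.nt.bz2") returns a false first element for a truncated or corrupt file, together with the decompressor's error message.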

On 14/02/2022 13:19, Neubert, Joachim wrote:
> The error was in the binary:
> lbzcat: "/zbw/var/wikidata/2022-02-03/rdf/latest-truthy.nt.bz2": compressed data error: bad block header magic
> 
> That created non-RDF input:
> 
>   [nbt@e6810f891672 ~]$ bzcat /zbw/var/wikidata/2022-02-03/rdf/latest-truthy.nt.bz2 | sed -n '4052914958,4052914960p;4052914961q'
> <http://www.wikidata.org/entity/Q85112545> <http://schema.org/description> "\u0646\u062C\u0645 \u0641\u064A \u0643\u0648\u0643\u0628\u0629 \u0627\u0644\u062B\u0648\u0631"@ar .
> 
> bzcat: Compressed file ends unexpectedly;
>          perhaps it is corrupted?  *Possible* reason follows.
> bzcat: Success
>          Input file = /zbw/var/wikidata/2022-02-03/rdf/latest-truthy.nt.bz2, output file = (stdout)
> 
> It is possible that the compressed file(s) have become corrupted.
> You can use the -tvv option to test integrity of such files.
> 
> You can use the `bzip2recover' program to attempt to recover
> data from undamaged sections of corrupted files.
> 
> <http://www.wikidata.org/entity/Q85112545> <http://schema.org/description> "star in the constellation Taurus"@en .
> <https://www.wikidata.org/wiki/Special:EntityData/Q85112563> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Dataset> .
> 
> which in turn produced:
> 
> 03:02:18 INFO  Nodes           :: Add: 4,052,000,000 latest-truthy.nt (Batch: 108,189 / Avg: 102,550)
> 03:02:26 ERROR riot            :: [line: 4052914959, col: 80] Bad input stream [java.io.IOException: Unexpected end of stream]
> Exception in thread "AsyncParser" org.apache.jena.riot.RiotException: [line: 4052914959, col: 80] Bad input stream [java.io.IOException: Unexpected end of stream]
>          at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:163)
>          at org.apache.jena.riot.lang.LangEngine.raiseException(LangEngine.java:148)
>          at org.apache.jena.riot.lang.LangEngine.nextToken(LangEngine.java:105)
>          at org.apache.jena.riot.lang.LangNTuple.parseTriple(LangNTuple.java:95)
>          at org.apache.jena.riot.lang.LangNTriples.parseOne(LangNTriples.java:61)
>          at org.apache.jena.riot.lang.LangNTriples.runParser(LangNTriples.java:53)
>          at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:43)
>          at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:186)
>          at org.apache.jena.riot.RDFParser.read(RDFParser.java:366)
>          at org.apache.jena.riot.RDFParser.parseURI(RDFParser.java:335)
>          at org.apache.jena.riot.RDFParser.parse(RDFParser.java:310)
>          at org.apache.jena.riot.RDFParserBuilder.parse(RDFParserBuilder.java:552)
>          at org.apache.jena.tdb2.xloader.ProcBuildNodeTableX.lambda$exec2$0(ProcBuildNodeTableX.java:198)
>          at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
>          at org.apache.jena.tdb2.xloader.ProcBuildNodeTableX.lambda$exec2$1(ProcBuildNodeTableX.java:194)
>          at java.base/java.lang.Thread.run(Thread.java:829)
> 
> Cheers, Joachim

AW: AW: AW: AW: AW: xloader "Can't find gzip program"

Posted by "Neubert, Joachim" <J....@zbw.eu>.
The error was in the binary:
lbzcat: "/zbw/var/wikidata/2022-02-03/rdf/latest-truthy.nt.bz2": compressed data error: bad block header magic

That created non-RDF input:

 [nbt@e6810f891672 ~]$ bzcat /zbw/var/wikidata/2022-02-03/rdf/latest-truthy.nt.bz2 | sed -n '4052914958,4052914960p;4052914961q'
<http://www.wikidata.org/entity/Q85112545> <http://schema.org/description> "\u0646\u062C\u0645 \u0641\u064A \u0643\u0648\u0643\u0628\u0629 \u0627\u0644\u062B\u0648\u0631"@ar .

bzcat: Compressed file ends unexpectedly;
        perhaps it is corrupted?  *Possible* reason follows.
bzcat: Success
        Input file = /zbw/var/wikidata/2022-02-03/rdf/latest-truthy.nt.bz2, output file = (stdout)

It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.

<http://www.wikidata.org/entity/Q85112545> <http://schema.org/description> "star in the constellation Taurus"@en .
<https://www.wikidata.org/wiki/Special:EntityData/Q85112563> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Dataset> .
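
The sed expression in the command above prints a range of lines and then quits, so the rest of the stream is not read. The same idea can be sketched in Python; the function name lines_between is illustrative, and on a damaged file the decompressor may raise an error before the requested lines are reached.

```python
import bz2
from itertools import islice


def lines_between(path, first, last):
    """Return lines first..last (1-based, inclusive) of a bzip2 text file.

    islice stops after `last` lines, so the remainder of the file is
    never decompressed.
    """
    with bz2.open(path, "rt", encoding="utf-8", errors="replace") as stream:
        return list(islice(stream, first - 1, last))
```

For example, lines_between(path, 4052914958, 4052914960) asks for the same three lines as the sed call.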

which in turn produced:

03:02:18 INFO  Nodes           :: Add: 4,052,000,000 latest-truthy.nt (Batch: 108,189 / Avg: 102,550)
03:02:26 ERROR riot            :: [line: 4052914959, col: 80] Bad input stream [java.io.IOException: Unexpected end of stream]
Exception in thread "AsyncParser" org.apache.jena.riot.RiotException: [line: 4052914959, col: 80] Bad input stream [java.io.IOException: Unexpected end of stream]
        at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:163)
        at org.apache.jena.riot.lang.LangEngine.raiseException(LangEngine.java:148)
        at org.apache.jena.riot.lang.LangEngine.nextToken(LangEngine.java:105)
        at org.apache.jena.riot.lang.LangNTuple.parseTriple(LangNTuple.java:95)
        at org.apache.jena.riot.lang.LangNTriples.parseOne(LangNTriples.java:61)
        at org.apache.jena.riot.lang.LangNTriples.runParser(LangNTriples.java:53)
        at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:43)
        at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:186)
        at org.apache.jena.riot.RDFParser.read(RDFParser.java:366)
        at org.apache.jena.riot.RDFParser.parseURI(RDFParser.java:335)
        at org.apache.jena.riot.RDFParser.parse(RDFParser.java:310)
        at org.apache.jena.riot.RDFParserBuilder.parse(RDFParserBuilder.java:552)
        at org.apache.jena.tdb2.xloader.ProcBuildNodeTableX.lambda$exec2$0(ProcBuildNodeTableX.java:198)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
        at org.apache.jena.tdb2.xloader.ProcBuildNodeTableX.lambda$exec2$1(ProcBuildNodeTableX.java:194)
        at java.base/java.lang.Thread.run(Thread.java:829)

Cheers, Joachim


Re: AW: AW: AW: AW: xloader "Can't find gzip program"

Posted by Andy Seaborne <an...@apache.org>.

On 14/02/2022 08:01, Neubert, Joachim wrote:
> Thanks, Andy, the TDB2 assembler fixed it, and all worked well.
> 
> I've tried to load wikidata-truthy then, but apparently the bzip file was damaged at line 4052914959 - have to try again

How annoying.

Is it an RDF syntax error or bad binary or something else?

--

My experience is that gz is faster to load.

bz2 emphasizes compactness over speed.

     Andy

> 
> Cheers, Joachim

AW: AW: AW: AW: xloader "Can't find gzip program"

Posted by "Neubert, Joachim" <J....@zbw.eu>.
Thanks, Andy, the TDB2 assembler fixed it, and all worked well.

I then tried to load wikidata-truthy, but apparently the bzip2 file was damaged at line 4052914959, so I have to try again.

Cheers, Joachim


Re: AW: AW: AW: xloader "Can't find gzip program"

Posted by Andy Seaborne <an...@apache.org>.
Hi Joachim,

Aside: I've realised why the timestamps are fixed at "2022-01-30 15:03".

The build setup is for repeatable builds of releases. Any build from the
X.Y.Z release source, with the same JDK, will generate byte-for-byte
identical jar files.

Each release build fixes the timestamp and uses that, and it gets in the
POM as property <project.build.outputTimestamp>. It only gets updated
when a release happens; otherwise the POM file would be modified
several times a week.

Thankfully, we have --version on most commands as well.

That's timestamps explained.

----

You seem to have run the TDB2 xloader, then given the text index builder
an assembler description for TDB1.

Fuseki with --loc determines the database type by looking at the file 
layout, but assemblers don't.

The version output can be changed to say "TDB1" without too much
disruption. A small tweak that might have helped show this up earlier.

     Andy
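
Because an assembler description does not auto-detect the database type, it can help to check which TDB generation a description file refers to before running a tool against it. Below is a minimal sketch in Python that does a plain text search for the two vocabulary namespaces. The TDB1 namespace is the one shown in the error message quoted below; the TDB2 namespace is taken from the Jena TDB2 assembler documentation rather than from this thread. The function name is illustrative, and this is a crude text check, not an RDF parse.

```python
# Vocabulary namespaces used by assembler descriptions.
TDB1_NS = "http://jena.hpl.hp.com/2008/tdb#"   # e.g. tdb:DatasetTDB
TDB2_NS = "http://jena.apache.org/2016/tdb#"   # e.g. tdb2:DatasetTDB2


def tdb_generations(assembler_text):
    """Return the set of TDB generations whose namespace occurs in the text."""
    found = set()
    if TDB1_NS in assembler_text:
        found.add("TDB1")
    if TDB2_NS in assembler_text:
        found.add("TDB2")
    return found


sample = "PREFIX tdb: <http://jena.hpl.hp.com/2008/tdb#>"
print(tdb_generations(sample))
```

If the result for a description is TDB1 while the directory was built with tdb2.xloader, the mismatch described above is the likely cause of the lock-owner error.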

On 11/02/2022 23:06, Neubert, Joachim wrote:
> Sorry, my fault: I've actually had jena-4.4.0 active, not 4.5.0-SNAPSHOT.
> 
> Now the loading works smoothly:
> 
> 22:50:10 INFO  Load node table  = 62 seconds
> 22:50:10 INFO  Load ingest data = 37 seconds
> 22:50:10 INFO  Build index SPO  = 7 seconds
> 22:50:10 INFO  Build index POS  = 12 seconds
> 22:50:10 INFO  Build index OSP  = 9 seconds
> 22:50:10 INFO  Overall          127 seconds
> 22:50:10 INFO  Overall          00h 02m 07s
> 22:50:10 INFO  Triples loaded   = 10000000
> 22:50:10 INFO  Quads loaded     = 0
> 22:50:10 INFO  Overall Rate     78740 tuples per second

That's output from tdb2.xloader.

For 10m up to 500m triples (laptop), or maybe 1B (server), also try
"tdb2.tdbloader --loader=parallel"

> However, the text indexing crashes, when called like that:
> 
> java -cp $FUSEKI_HOME/fuseki-server.jar jena.textindexer --debug --desc=/tmp/temp.ttl
> 
> org.apache.jena.assembler.exceptions.AssemblerException: caught: Unable to check TDB lock owner, the lock file contents appear to be for a TDB2 database.  Please try loading this location as a TDB2 database. See https://jena.apache.org/documentation/tdb/faqs.html for more information.
>    doing:
>      root: file:///tmp/temp.ttl#dataset with type: http://jena.hpl.hp.com/2008/tdb#DatasetTDB assembler class: class org.apache.jena.tdb.assembler.DatasetAssemblerTDB1

But that is TDB1

>      root: http://localhost/jena_example/#text_dataset with type: http://jena.apache.org/text#TextDataset assembler class: class org.apache.jena.query.text.assembler.TextDatasetAssembler
> 
...
> Caused by: org.apache.jena.tdb.base.file.FileException: Unable to check TDB lock owner, the lock file contents appear to be for a TDB2 database.  Please try loading this location as a TDB2 database. See https://jena.apache.org/documentation/tdb/faqs.html for more information.
>          at org.apache.jena.tdb.base.file.LocationLock.getOwner(LocationLock.java:110)

org.apache.jena.tdb == TDB1

>          at org.apache.jena.tdb.base.file.LocationLock.canObtain(LocationLock.java:139)
>          at org.apache.jena.tdb.StoreConnection._makeAndCache(StoreConnection.java:262)
>          at org.apache.jena.tdb.StoreConnection.make(StoreConnection.java:226)
>          at org.apache.jena.tdb.StoreConnection.make(StoreConnection.java:240)
>          at org.apache.jena.tdb.transaction.DatasetGraphTransaction.<init>(DatasetGraphTransaction.java:72)
>          at org.apache.jena.tdb.sys.TDBMaker.createDirect(TDBMaker.java:114)
...

>          ... 23 more
> 2022-02-11 22:50:12 ABORTED
> 
> cat /var/lib/fuseki/databases/temp/tdb.lock
> 32907
> 
> Cheers, Joachim
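
The trace shows the assembler description declaring the dataset as TDB1 (tdb:DatasetTDB) while the directory holds a TDB2 database. A minimal sketch of a TDB2 dataset description follows, written out from the shell. The output file name is a placeholder, and only the dataset part is shown; the text index definition is assumed to stay as it is:

```shell
#!/bin/sh
# Hypothetical output file; the original description was /tmp/temp.ttl.
DESC_FILE="${TMPDIR:-/tmp}/temp-tdb2.ttl"

# TDB2 has its own assembler vocabulary (tdb2:DatasetTDB2, tdb2:location)
# in place of the TDB1 type tdb:DatasetTDB seen in the error message.
cat > "$DESC_FILE" <<'EOF'
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX tdb2: <http://jena.apache.org/2016/tdb#>

<#dataset> rdf:type tdb2:DatasetTDB2 ;
    tdb2:location "/var/lib/fuseki/databases/temp" .
EOF

echo "Wrote $DESC_FILE"
```

The location is the directory whose tdb.lock file is shown above; the text dataset would then refer to <#dataset> as before.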

AW: AW: AW: xloader "Can't find gzip program"

Posted by "Neubert, Joachim" <J....@zbw.eu>.
Sorry, my fault: I actually had jena-4.4.0 active, not 4.5.0-SNAPSHOT.

Now the loading works smoothly:

22:50:10 INFO  Load node table  = 62 seconds
22:50:10 INFO  Load ingest data = 37 seconds
22:50:10 INFO  Build index SPO  = 7 seconds
22:50:10 INFO  Build index POS  = 12 seconds
22:50:10 INFO  Build index OSP  = 9 seconds
22:50:10 INFO  Overall          127 seconds
22:50:10 INFO  Overall          00h 02m 07s
22:50:10 INFO  Triples loaded   = 10000000
22:50:10 INFO  Quads loaded     = 0
22:50:10 INFO  Overall Rate     78740 tuples per second

However, the text indexing crashes when called like this:

java -cp $FUSEKI_HOME/fuseki-server.jar jena.textindexer --debug --desc=/tmp/temp.ttl

org.apache.jena.assembler.exceptions.AssemblerException: caught: Unable to check TDB lock owner, the lock file contents appear to be for a TDB2 database.  Please try loading this location as a TDB2 database. See https://jena.apache.org/documentation/tdb/faqs.html for more information.
  doing:
    root: file:///tmp/temp.ttl#dataset with type: http://jena.hpl.hp.com/2008/tdb#DatasetTDB assembler class: class org.apache.jena.tdb.assembler.DatasetAssemblerTDB1
    root: http://localhost/jena_example/#text_dataset with type: http://jena.apache.org/text#TextDataset assembler class: class org.apache.jena.query.text.assembler.TextDatasetAssembler

        at org.apache.jena.assembler.assemblers.AssemblerGroup$PlainAssemblerGroup.openBySpecificType(AssemblerGroup.java:165)
        at org.apache.jena.assembler.assemblers.AssemblerGroup$PlainAssemblerGroup.open(AssemblerGroup.java:144)
        at org.apache.jena.assembler.assemblers.AssemblerGroup$ExpandingAssemblerGroup.open(AssemblerGroup.java:93)
        at org.apache.jena.assembler.assemblers.AssemblerBase.open(AssemblerBase.java:39)
        at org.apache.jena.assembler.assemblers.AssemblerBase.open(AssemblerBase.java:35)
        at org.apache.jena.query.text.assembler.TextDatasetAssembler.open(TextDatasetAssembler.java:67)
        at org.apache.jena.query.text.assembler.TextDatasetAssembler.open(TextDatasetAssembler.java:42)
        at org.apache.jena.assembler.assemblers.AssemblerGroup$PlainAssemblerGroup.openBySpecificType(AssemblerGroup.java:157)
        at org.apache.jena.assembler.assemblers.AssemblerGroup$PlainAssemblerGroup.open(AssemblerGroup.java:144)
        at org.apache.jena.assembler.assemblers.AssemblerGroup$ExpandingAssemblerGroup.open(AssemblerGroup.java:93)
        at org.apache.jena.assembler.assemblers.AssemblerBase.open(AssemblerBase.java:39)
        at org.apache.jena.assembler.assemblers.AssemblerBase.open(AssemblerBase.java:35)
        at org.apache.jena.sparql.core.assembler.AssemblerUtils.build(AssemblerUtils.java:144)
        at org.apache.jena.sparql.core.assembler.AssemblerUtils.build(AssemblerUtils.java:132)
        at org.apache.jena.query.text.TextDatasetFactory.create(TextDatasetFactory.java:38)
        at org.apache.jena.query.text.cmd.textindexer.processModulesAndArgs(textindexer.java:90)
        at org.apache.jena.cmd.CmdArgModule.process(CmdArgModule.java:39)
        at org.apache.jena.cmd.CmdMain.mainMethod(CmdMain.java:86)
        at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:56)
        at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:43)
        at org.apache.jena.query.text.cmd.textindexer.main(textindexer.java:52)
        at org.apache.jena.query.text.cmd.InitTextCmds.lambda$cmds$1(InitTextCmds.java:26)
        at org.apache.jena.cmd.Cmds.exec(Cmds.java:65)
        at jena.textindexer.main(textindexer.java:25)
Caused by: org.apache.jena.tdb.base.file.FileException: Unable to check TDB lock owner, the lock file contents appear to be for a TDB2 database.  Please try loading this location as a TDB2 database. See https://jena.apache.org/documentation/tdb/faqs.html for more information.
        at org.apache.jena.tdb.base.file.LocationLock.getOwner(LocationLock.java:110)
        at org.apache.jena.tdb.base.file.LocationLock.canObtain(LocationLock.java:139)
        at org.apache.jena.tdb.StoreConnection._makeAndCache(StoreConnection.java:262)
        at org.apache.jena.tdb.StoreConnection.make(StoreConnection.java:226)
        at org.apache.jena.tdb.StoreConnection.make(StoreConnection.java:240)
        at org.apache.jena.tdb.transaction.DatasetGraphTransaction.<init>(DatasetGraphTransaction.java:72)
        at org.apache.jena.tdb.sys.TDBMaker.createDirect(TDBMaker.java:114)
        at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705)
        at org.apache.jena.tdb.sys.TDBMaker._create(TDBMaker.java:100)
        at org.apache.jena.tdb.sys.TDBMaker.createDatasetGraphTransaction(TDBMaker.java:43)
        at org.apache.jena.tdb.TDBFactory._createDatasetGraph(TDBFactory.java:93)
        at org.apache.jena.tdb.TDBFactory.createDatasetGraph(TDBFactory.java:71)
        at org.apache.jena.tdb.assembler.DatasetAssemblerTDB1.make(DatasetAssemblerTDB1.java:55)
        at org.apache.jena.tdb.assembler.DatasetAssemblerTDB1.createDataset(DatasetAssemblerTDB1.java:46)
        at org.apache.jena.sparql.core.assembler.DatasetAssembler.open(DatasetAssembler.java:40)
        at org.apache.jena.sparql.core.assembler.DatasetAssembler.open(DatasetAssembler.java:33)
        at org.apache.jena.assembler.assemblers.AssemblerGroup$PlainAssemblerGroup.openBySpecificType(AssemblerGroup.java:157)
        ... 23 more
2022-02-11 22:50:12 ABORTED

cat /var/lib/fuseki/databases/temp/tdb.lock
32907

Cheers, Joachim


Re: AW: AW: xloader "Can't find gzip program"

Posted by Andy Seaborne <an...@apache.org>.

On 11/02/2022 21:38, Neubert, Joachim wrote:
> Strange - I should have the same version:
> 
> sudo tar xzvf /usr/local/src/apache-jena-fuseki-4.5.0-20220209.180144-12.tar.gz

Different file: apache-jena-4.5.0-20220209.180144-12 (no Fuseki),
but weird anyway.

wget https://repository.apache.org/content/groups/snapshots/org/apache/jena/apache-jena/4.5.0-SNAPSHOT/apache-jena-4.5.0-20220209.180144-12.zip

then the zip file is:

27372309 Feb  9 18:26 apache-jena-4.5.0-20220209.180144-12.zip


apache-jena-4.5.0-SNAPSHOT/bin/tdb2.tdbloader --version

Jena:       VERSION: 4.5.0-SNAPSHOT
Jena:       BUILD_DATE: 2022-02-09T18:01:44Z
ARQ:        VERSION: 4.5.0-SNAPSHOT
ARQ:        BUILD_DATE: 2022-02-09T18:01:44Z
TDB2:       VERSION: 4.5.0-SNAPSHOT
TDB2:       BUILD_DATE: 2022-02-09T18:01:44Z

yet the TDB2 jar is dated 30th Jan, as are the files inside it -- can't 
explain that.

294846 Jan 30 15:03 apache-jena-4.5.0-SNAPSHOT/lib/jena-tdb2-4.5.0-SNAPSHOT.jar

The tdb2.xloader script is 10485 bytes and has

SORT_THREADS="2"

in it. Is that what your copy of the script has in it?
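
One way to compare installations is to look at the same things on each machine: the size of the script, what SORT_THREADS is set to, and the date of the TDB2 jar. A rough sketch, assuming the snapshot was unpacked into apache-jena-4.5.0-SNAPSHOT in the current directory (pass another directory as the first argument otherwise); check_install is a made-up helper name:

```shell
#!/bin/sh
check_install() {
    jena_home="${1:-apache-jena-4.5.0-SNAPSHOT}"
    script="$jena_home/bin/tdb2.xloader"
    jar="$jena_home/lib/jena-tdb2-4.5.0-SNAPSHOT.jar"

    if [ ! -f "$script" ]; then
        echo "no tdb2.xloader script under $jena_home" >&2
        return 1
    fi
    # Size of the script in bytes, then the SORT_THREADS setting inside it.
    wc -c < "$script"
    grep -n 'SORT_THREADS=' "$script"
    # Date of the TDB2 jar, if it is present.
    [ -f "$jar" ] && ls -l "$jar"
    return 0
}
```

Running check_install with no argument inspects the default directory; differing output between two machines points at differing builds.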

I'll clear the Jenkins workspace and schedule a new build.

	Andy

> 
> but the jar file is dated Jan 30:
> 
> ll apache-jena-fuseki-4.5.0-SNAPSHOT/
> total 35868
> -rw-r--r-- 1 root root    36975 Jan 30 15:02 LICENSE
> -rw-r--r-- 1 root root     8914 Jan 30 15:02 NOTICE
> -rw-r--r-- 1 root root     1151 Jan 30 15:02 README
> drwxr-xr-x 2 root root      179 Feb 11 20:47 bin
> -rwxr-xr-x 1 root root    12339 Jan 30 15:02 fuseki
> -rwxr-xr-x 1 root root     1241 Jan 30 15:02 fuseki-backup
> -rwxr-xr-x 1 root root     3370 Jan 30 15:02 fuseki-server
> -rw-r--r-- 1 root root     1264 Jan 30 15:02 fuseki-server.bat
> -rw-r--r-- 1 root root 36631864 Jan 30 15:02 fuseki-server.jar
> -rw-r--r-- 1 root root     2217 Jan 30 15:02 fuseki.service
> -rw-r--r-- 1 root root     2124 Jan 30 15:02 log4j2.properties
> drwxr-xr-x 4 root root      121 Jan 30 15:02 webapp
> 
> Cheers, Joachim
> 

AW: AW: xloader "Can't find gzip program"

Posted by "Neubert, Joachim" <J....@zbw.eu>.
Strange - I should have the same version:

sudo tar xzvf /usr/local/src/apache-jena-fuseki-4.5.0-20220209.180144-12.tar.gz

but the jar file is dated Jan 30:

ll apache-jena-fuseki-4.5.0-SNAPSHOT/
total 35868
-rw-r--r-- 1 root root    36975 Jan 30 15:02 LICENSE
-rw-r--r-- 1 root root     8914 Jan 30 15:02 NOTICE
-rw-r--r-- 1 root root     1151 Jan 30 15:02 README
drwxr-xr-x 2 root root      179 Feb 11 20:47 bin
-rwxr-xr-x 1 root root    12339 Jan 30 15:02 fuseki
-rwxr-xr-x 1 root root     1241 Jan 30 15:02 fuseki-backup
-rwxr-xr-x 1 root root     3370 Jan 30 15:02 fuseki-server
-rw-r--r-- 1 root root     1264 Jan 30 15:02 fuseki-server.bat
-rw-r--r-- 1 root root 36631864 Jan 30 15:02 fuseki-server.jar
-rw-r--r-- 1 root root     2217 Jan 30 15:02 fuseki.service
-rw-r--r-- 1 root root     2124 Jan 30 15:02 log4j2.properties
drwxr-xr-x 4 root root      121 Jan 30 15:02 webapp

Cheers, Joachim


Re: AW: xloader "Can't find gzip program"

Posted by Andy Seaborne <an...@apache.org>.
Works for me - make sure it is the latest dev build (the one at the
bottom).

I just grabbed apache-jena-4.5.0-20220209.180144-12.zip (2022-02-09)
and loaded a few million triples with no problems.

rm -rf DB2
apache-jena-4.5.0-SNAPSHOT/bin/tdb2.xloader --loc DB2 ~/Datasets/BSBM/bsbm-5m.nt.gz

     Andy

On 11/02/2022 21:20, Neubert, Joachim wrote:
> Hi Andy,
> 
> Thanks! The code of 4.5.0-SNAPSHOT seems to run significantly faster - however, the same error at SPO start.
> 
> Please let me know if I can help with tracing/reproducing the error.
> 
> Cheers, Joachim
> 

AW: xloader "Can't find gzip program"

Posted by "Neubert, Joachim" <J....@zbw.eu>.
Hi Andy,

Thanks! The code of 4.5.0-SNAPSHOT seems to run significantly faster; however, the same error occurs at the start of the SPO index build.

Please let me know if I can help with tracing/reproducing the error.

Cheers, Joachim

> -----Ursprüngliche Nachricht-----
> Von: Andy Seaborne <an...@apache.org>
> Gesendet: Freitag, 11. Februar 2022 21:07
> An: users@jena.apache.org
> Betreff: Re: xloader "Can't find gzip program"
> 
> Hi Joachim,
> 
> https://issues.apache.org/jira/browse/JENA-2277
> https://issues.apache.org/jira/browse/JENA-2279
> 
> There are two fixes for tdb2.xloader which are now in the development
> builds:
> 
> https://repository.apache.org/content/groups/snapshots/org/apache/jena/
> 
> (these are not official releases and have not been voted on by the PMC)
> 
> If you coudl test them and let us know if they work or whether theer are
> further problems, that would be great.
> 
>      Andy
> 
> 
> On 11/02/2022 17:53, Neubert, Joachim wrote:
> > I've just started tests with xloader. It aborts with
> >
> > 17:21:56 INFO  Data            :: Triples = 10,000,000 ; Quads = 0
> > 17:21:57 INFO  =-=-=-=-=-=-=-=
> > 17:21:57 INFO
> > 17:21:57 INFO  Build SPO
> > 17:21:57 INFO  (Very long pause likely at this point)
> > 17:21:58 INFO  Index           :: Build index SPO
> > java.lang.RuntimeException: org.apache.jena.tdb2.TDBException: Can't find
> gzip program
> >    at
> org.apache.jena.tdb2.xloader.ProcBuildIndexX.sort_build_index(ProcBuildIn
> dexX.java:207)
> >    at
> org.apache.jena.tdb2.xloader.ProcBuildIndexX.buildIndex(ProcBuildIndexX.ja
> va:121)
> >    at
> org.apache.jena.tdb2.xloader.ProcBuildIndexX.exec2(ProcBuildIndexX.java:1
> 06)
> >    at
> org.apache.jena.tdb2.xloader.ProcBuildIndexX.exec(ProcBuildIndexX.java:94
> )
> >    at tdb2.xloader.CmdxBuildIndex.exec(CmdxBuildIndex.java:80)
> >    at org.apache.jena.cmd.CmdMain.mainMethod(CmdMain.java:92)
> >    at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:58)
> >    at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:45)
> >    at tdb2.xloader.CmdxBuildIndex.main(CmdxBuildIndex.java:28)
> > Caused by: org.apache.jena.tdb2.TDBException: Can't find gzip program
> >    at
> org.apache.jena.tdb2.xloader.BulkLoaderX.gzipProgram(BulkLoaderX.java:67
> )
> >    at
> org.apache.jena.tdb2.xloader.ProcBuildIndexX.sort_build_index(ProcBuildIn
> dexX.java:183)
> >    ... 8 more
> >
> > Of course, /usr/bin/gzip is in the path. My configuration is below,
> tdb2.xloader was called with --threads=12.
> >
> > Any idea what could be wrong?
> >
> > Cheers, Joachim
> >
> >
> > Configuration:
> > openjdk version "11.0.13" 2021-10-19 LTS OpenJDK Runtime Environment
> > 18.9 (build 11.0.13+8-LTS) OpenJDK 64-Bit Server VM 18.9 (build
> > 11.0.13+8-LTS, mixed mode, sharing)
> > JAVA_OPTS: -d64 -Xmx12G
> > Loader: tdb2.xloader
> > Jena:       VERSION: 4.4.0
> > Jena:       BUILD_DATE: 2022-01-30T15:09:41Z
> > ARQ:        VERSION: 4.4.0
> > ARQ:        BUILD_DATE: 2022-01-30T15:09:41Z
> > TDB:        VERSION: 4.4.0
> > TDB:        BUILD_DATE: 2022-01-30T15:09:41Z
> >
> > Use fuseki tdb2.xloader on file /zbw/var/wikidata/2022-02-03/rdf/test.nt.gz
> > 17:20:13 INFO  Setup:
> > 17:20:13 INFO    Database: /zbw/var/lib/fuseki/databases/temp
> > 17:20:13 INFO    Data:     /zbw/var/wikidata/2022-02-03/rdf/test.nt.gz
> > 17:20:13 INFO    TMPDIR:   /zbw/var/lib/fuseki/databases/temp
> > 17:20:13 INFO
> > 17:20:13 INFO  Load node table
> >
> >
> > --
> > Joachim Neubert
> >
> > ZBW - Leibniz Information Centre for Economics
> > Neuer Jungfernstieg 21
> > 20354 Hamburg
> > Phone +49-40-42834-462
> >
> >

Re: xloader "Can't find gzip program"

Posted by Andy Seaborne <an...@apache.org>.
Hi Joachim,

https://issues.apache.org/jira/browse/JENA-2277
https://issues.apache.org/jira/browse/JENA-2279

There are two fixes for tdb2.xloader which are now in the development 
builds:

https://repository.apache.org/content/groups/snapshots/org/apache/jena/

(these are not official releases and have not been voted on by the PMC)

If you could test them and let us know if they work or whether there are 
further problems, that would be great.

     Andy


On 11/02/2022 17:53, Neubert, Joachim wrote:
> I've just started tests with xloader. It aborts with
> 
> 17:21:56 INFO  Data            :: Triples = 10,000,000 ; Quads = 0
> 17:21:57 INFO  =-=-=-=-=-=-=-=
> 17:21:57 INFO
> 17:21:57 INFO  Build SPO
> 17:21:57 INFO  (Very long pause likely at this point)
> 17:21:58 INFO  Index           :: Build index SPO
> java.lang.RuntimeException: org.apache.jena.tdb2.TDBException: Can't find gzip program
>    at org.apache.jena.tdb2.xloader.ProcBuildIndexX.sort_build_index(ProcBuildIndexX.java:207)
>    at org.apache.jena.tdb2.xloader.ProcBuildIndexX.buildIndex(ProcBuildIndexX.java:121)
>    at org.apache.jena.tdb2.xloader.ProcBuildIndexX.exec2(ProcBuildIndexX.java:106)
>    at org.apache.jena.tdb2.xloader.ProcBuildIndexX.exec(ProcBuildIndexX.java:94)
>    at tdb2.xloader.CmdxBuildIndex.exec(CmdxBuildIndex.java:80)
>    at org.apache.jena.cmd.CmdMain.mainMethod(CmdMain.java:92)
>    at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:58)
>    at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:45)
>    at tdb2.xloader.CmdxBuildIndex.main(CmdxBuildIndex.java:28)
> Caused by: org.apache.jena.tdb2.TDBException: Can't find gzip program
>    at org.apache.jena.tdb2.xloader.BulkLoaderX.gzipProgram(BulkLoaderX.java:67)
>    at org.apache.jena.tdb2.xloader.ProcBuildIndexX.sort_build_index(ProcBuildIndexX.java:183)
>    ... 8 more
> 
> Of course, /usr/bin/gzip is in the path. My configuration is below, tdb2.xloader was called with --threads=12.
> 
> Any idea what could be wrong?
> 
> Cheers, Joachim
> 
> 
> Configuration:
> openjdk version "11.0.13" 2021-10-19 LTS
> OpenJDK Runtime Environment 18.9 (build 11.0.13+8-LTS)
> OpenJDK 64-Bit Server VM 18.9 (build 11.0.13+8-LTS, mixed mode, sharing)
> JAVA_OPTS: -d64 -Xmx12G
> Loader: tdb2.xloader
> Jena:       VERSION: 4.4.0
> Jena:       BUILD_DATE: 2022-01-30T15:09:41Z
> ARQ:        VERSION: 4.4.0
> ARQ:        BUILD_DATE: 2022-01-30T15:09:41Z
> TDB:        VERSION: 4.4.0
> TDB:        BUILD_DATE: 2022-01-30T15:09:41Z
> 
> Use fuseki tdb2.xloader on file /zbw/var/wikidata/2022-02-03/rdf/test.nt.gz
> 17:20:13 INFO  Setup:
> 17:20:13 INFO    Database: /zbw/var/lib/fuseki/databases/temp
> 17:20:13 INFO    Data:     /zbw/var/wikidata/2022-02-03/rdf/test.nt.gz
> 17:20:13 INFO    TMPDIR:   /zbw/var/lib/fuseki/databases/temp
> 17:20:13 INFO
> 17:20:13 INFO  Load node table
> 
> 
> --
> Joachim Neubert
> 
> ZBW - Leibniz Information Centre for Economics
> Neuer Jungfernstieg 21
> 20354 Hamburg
> Phone +49-40-42834-462
> 
>