Posted to dev@jena.apache.org by Rob Vesse <rv...@dotnetrdf.org> on 2017/10/17 13:51:40 UTC

Supporting Concatenated Gzip archives

Andy

Would it be worth pulling in commons-compress as a dependency and switching to its GZip stream implementation, which does not have this limitation?

This is a trivial change, but it does add an additional dependency.
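
For anyone wanting to reproduce the case under discussion, here is a minimal sketch using only the JDK. It compresses two N-Triples lines as separate gzip members, joins them back to back (the in-memory equivalent of `cat a.gz b.gz > all.gz`), and reads the result with java.util.zip.GZIPInputStream. The class name, method names, and example.org triples are made up for illustration; this is not Jena's code. How many members the JDK reader returns may depend on the JDK version and on the kind of InputStream it wraps, so the sketch prints the count rather than assuming one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, for experimenting only.
public class ConcatenatedGzipCheck {

    /** Compresses one text fragment into a complete, standalone gzip member. */
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    /** Joins gzip members back to back, with nothing in between. */
    public static byte[] concat(byte[]... members) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] member : members) {
            out.write(member, 0, member.length);
        }
        return out.toByteArray();
    }

    /** Decompresses with the JDK reader and returns whatever text it yields. */
    public static String gunzip(byte[] archive) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(archive))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Counts non-empty lines; in N-Triples each such line is one triple. */
    public static long countLines(String text) {
        long count = 0;
        for (String line : text.split("\n")) {
            if (!line.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Made-up example data: one triple per gzip member.
        String first = "<http://example.org/s1> <http://example.org/p> \"one\" .\n";
        String second = "<http://example.org/s2> <http://example.org/p> \"two\" .\n";

        byte[] archive = concat(gzip(first), gzip(second));
        String decoded = gunzip(archive);

        System.out.println("triples written: 2, triples read back: " + countLines(decoded));
    }
}
```

If the count read back is lower than the count written, the reader stopped after the first member. On the Commons Compress side, GzipCompressorInputStream has a two-argument constructor whose boolean decompressConcatenated flag controls whether later members are read; the one-argument constructor stops after the first member, so the flag would need to be set for the switch to help.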

Rob

On 17/10/2017 14:08, "Andy Seaborne" <an...@apache.org> wrote:

    In addition to Rob's point about multiple files in one GZ file...
    
    What does the Fuseki log say?
    
    Can you upload the NT file uncompressed?
    
    How are you uploading the nt.gz file?
    
         Andy
    
    On 17/10/17 05:15, Rob Vesse wrote:
    > Do you know how the original GZip archive was generated?
    > 
    > Jena uses the standard JDK GZip support to read GZip archives. The JDK doesn’t support the case where multiple separate GZip streams are concatenated into a single file. Therefore, if the archive was created in that way Jena might only read the first stream from the archive and ignore the subsequent streams.
    > 
    > Extracting with rapper probably uses the OS gzip directly, or a library implementation of it, which does handle this concatenation.
    > 
    > Is this a file you could share somehow?
    > 
    > Rob
    > 
    > On 17/10/2017 03:55, "Andrew U. Frank" <fr...@geoinfo.tuwien.ac.at> wrote:
    > 
    >      i am seeing a strange effect (reproduced a few times):
    >      i upload data in nt.gz format and get a success message, but only a part
    >      (sometimes less than 10%) is uploaded.
    >      if i extract the nt file from the nt.gz and then convert it with rapper to
    >      turtle format, i get a count of how many triples are in the nt.gz
    >      file, and when i then upload the ttl file all triples are loaded.
    >      i use the browser upload.
    >      
    >      any explanation? i use fuseki 3.4.0.
    >      
    >      thank you!
    >      andrew

Re: Supporting Concatenated Gzip archives

Posted by Andy Seaborne <an...@apache.org>.
Not for 3.5.0 :-)

Actually, I'm not sure the user's question is clear - maybe it's the HTTP 
gzip option, which is why I wanted to see the Fuseki log and know how he's 
pushing the file(s).  I assume it's all his RDF/XML files, converted.

Maybe better to see this as "upload collection" and include zip, 
tar and tar.gz files?  NT can be concatenated, RDF/XML cannot.

     Andy

