Posted to dev@jena.apache.org by "Rob Vesse (JIRA)" <ji...@apache.org> on 2014/07/11 11:15:05 UTC

[jira] [Commented] (JENA-744) Error importing from large gzip

    [ https://issues.apache.org/jira/browse/JENA-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14058578#comment-14058578 ] 

Rob Vesse commented on JENA-744:
--------------------------------

I'm not sure what you expect us to do here.

As you point out, the Gzip spec records the uncompressed size in a 32-bit field, so the size of files larger than 4GB cannot be detected a priori.  We are just using the standard Java {{GZIPInputStream}} to read GZipped files, which must logically be the source of the truncation, so files exceeding that limit are always going to be problematic.
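
For anyone wanting to check a particular file, here is a minimal sketch (not Jena code) that compares the size recorded in the Gzip trailer with the number of bytes the standard {{java.util.zip.GZIPInputStream}} actually yields. The class name {{GzipSizeCheck}} and its method names are hypothetical, chosen only for illustration. It assumes a single-member Gzip file; per RFC 1952 the recorded value is the uncompressed size modulo 2^32, so for inputs of 4GB or more the two numbers are expected to differ even when decompression is complete.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

// Hypothetical helper for illustration only; not part of Jena or tdbloader.
class GzipSizeCheck {

    /**
     * Reads the ISIZE trailer field: the last 4 bytes of the file, stored
     * little-endian. RFC 1952 defines it as the uncompressed size modulo 2^32.
     * For a multi-member file this only describes the last member.
     */
    public static long readIsize(Path gz) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(gz.toFile(), "r")) {
            if (raf.length() < 4) {
                throw new IOException("File too short to be a Gzip file: " + gz);
            }
            raf.seek(raf.length() - 4);
            long b0 = raf.read();
            long b1 = raf.read();
            long b2 = raf.read();
            long b3 = raf.read();
            // Combine as an unsigned 32-bit value held in a long.
            return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24);
        }
    }

    /** Streams the whole file through GZIPInputStream and counts the bytes produced. */
    public static long countDecompressedBytes(Path gz) throws IOException {
        long total = 0;
        byte[] buf = new byte[64 * 1024];
        try (InputStream in = new GZIPInputStream(Files.newInputStream(gz), buf.length)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java GzipSizeCheck <file.gz>");
            return;
        }
        Path gz = Paths.get(args[0]);
        long recorded = readIsize(gz);
        long actual = countDecompressedBytes(gz);
        System.out.println("Size recorded in trailer (mod 2^32): " + recorded);
        System.out.println("Bytes actually decompressed:         " + actual);
        // For inputs under 4GB these match; above that, only the low 32 bits match.
        System.out.println("Low 32 bits agree: " + ((actual & 0xFFFFFFFFL) == recorded));
    }
}
```

If the decompressed byte count is far larger than the recorded size but the low 32 bits agree, the stream was read in full and only the recorded size wrapped around; a short byte count would instead point at the reader stopping early.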

I am guessing you want us to switch to an alternative Gzip implementation that does not have this problem?  If you know of an alternative Java implementation of Gzip decompression that is compatible with Apache licensing policy and does not experience this bug, we can happily look at switching to it.

> Error importing from large gzip
> -------------------------------
>
>                 Key: JENA-744
>                 URL: https://issues.apache.org/jira/browse/JENA-744
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: TDB
>            Reporter: Michael Kozakov
>         Attachments: gzip.png
>
>
> gzip has a documented bug: 
> http://www.freebsd.org/cgi/man.cgi?query=gzip#end
> "According to RFC 1952, the recorded file size is stored in a 32-bit
>      integer, therefore, it can not represent files larger than 4GB.  This
>      limitation also applies to -l option of gzip utility."
> As a result, a 28GB compressed .gz file reports an uncompressed size of 1.6GB (screenshot attached).
> It seems like tdbloader relies on this information to know when to stop importing, and as a result, the imported database is incomplete. As a workaround, I have to extract the archive before using tdbloader to import the database; otherwise it will be missing the majority of items.


