Posted to commits@daffodil.apache.org by GitBox <gi...@apache.org> on 2021/01/14 16:37:58 UTC

[GitHub] [incubator-daffodil] stevedlawrence commented on pull request #472: Fix how we throw away buckets during parsing

stevedlawrence commented on pull request #472:
URL: https://github.com/apache/incubator-daffodil/pull/472#issuecomment-760313432


   > Is this fix relevant only to Daffodil 3.0.0, or is this something affecting code from earlier revisions such as 2.4.0 or 2.6.0 ?
   
   This bug was introduced in 2.5.0, so it's been around for about a year. But without the new streaming capabilities introduced in 3.0.0, I'm not sure how likely it is to actually hit this issue. I suspect most of the time you'll just run out of memory before hitting this bug. I just now tested this same schema + data with 2.7.0 and it looks like it's just stuck in the garbage collector trying to find memory to free. I suspect if I let it run long enough I'd get an OutOfMemoryException.
   
   It might be worth considering a 3.0.1 patch release for this issue, since there isn't really a good workaround. The only workaround I can think of when dealing with files larger than 256MB is to avoid the InputStream constructor when creating an InputSourceDataInputStream, so the bucketing logic isn't used at all. Instead, a user can use the ByteBuffer or Array[Byte] constructors, but that means all the data must be read into memory, and there's a 2GB limit, so it's not that great of a workaround. And that only works for people using the API. There's no workaround for people using the CLI.
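
   A minimal sketch of that API-level workaround, assuming the Scala API. The package path (org.apache.daffodil.sapi.io), the constructor forms, and the "data.bin" file name are assumptions based on the constructors mentioned above and may differ slightly across Daffodil versions:

       import java.nio.file.{Files, Paths}
       import org.apache.daffodil.sapi.io.InputSourceDataInputStream

       // Affected path (assumed form): wrapping a java.io.InputStream makes
       // Daffodil use its internal bucketing cache, which is where this bug lives.
       // val dis = new InputSourceDataInputStream(new java.io.FileInputStream("data.bin"))

       // Workaround: read the whole file into memory and use the Array[Byte]
       // constructor instead, so no bucketing is involved. This only works if
       // the data fits in memory and stays under the 2GB array limit.
       val bytes: Array[Byte] = Files.readAllBytes(Paths.get("data.bin"))
       val dis = new InputSourceDataInputStream(bytes)

       // dis can then be passed to a DataProcessor parse call as usual.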


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org