Posted to commits@daffodil.apache.org by GitBox <gi...@apache.org> on 2019/10/31 17:06:53 UTC

[GitHub] [incubator-daffodil] mbeckerle commented on a change in pull request #281: Remove heap sized limit for parsing

URL: https://github.com/apache/incubator-daffodil/pull/281#discussion_r341247715
 
 

 ##########
 File path: daffodil-io/src/main/scala/org/apache/daffodil/io/InputSource.scala
 ##########
 @@ -22,6 +22,9 @@ import java.nio.ByteBuffer
 import scala.collection.mutable.ArrayBuffer
 
 import org.apache.daffodil.exceptions.Assert
+import org.apache.daffodil.exceptions.ThinThrowable
+
+case class BacktrackingException() extends Exception with ThinThrowable
 
 Review comment:
   Block comment below has a typo: "thing of things" should be "think of things".
   
   More info needs to be added to this block comment about how this now works. The comment suggests an unbounded potential for backtracking, and that is deliberately no longer the case.
   
   Add comments to this effect, reworded as suitable, explaining the gist of it, by example if necessary. Here are a few paragraphs:
   
   There is a finite limit to the distance one can backtrack, which is given by the implementation's use of a finite array of finite fixed-size buckets. If more data is read than the ultimate backtrack limit (number of buckets times bucket size), then data buckets effectively spill off into parser history, and the ability to backtrack to the points in the data they stored is lost along with them. Any time there is a point of uncertainty to which the parser could backtrack, and the parser then advances through more data than the maximum backtracking limit, the ability to backtrack to that point of uncertainty is lost. This is detected, and is a fatal error (a runtime SDE).
   
   This situation is easiest to envision if BLOB objects are involved, but it is not BLOB specific. A format with a choice whose first branch contains a group and/or array of small data items, ultimately totaling more than the backtrack limit in size, followed by a branch failure, will trigger this backtracking limit error with no BLOBs being used.
   
   This module of code creates no limit on the size of any data item; nothing has to fit within JVM byte-array maximums, nor even within the JVM memory footprint. Only a limitation on backtracking distance is created.
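   
   The bucket scheme described above could be sketched roughly as follows. This is a hypothetical illustration, not the actual Daffodil implementation: the class name `BucketedWindow`, its methods, and the standalone `BacktrackingException` (without `ThinThrowable`) are all made up for the example; only the buckets-times-bucket-size limit and the spill-off behavior come from the description above.
   
   ```scala
   // Illustrative only: a bounded backtracking window built from a fixed
   // number of fixed-size buckets. Redefined here without ThinThrowable
   // to keep the sketch self-contained.
   case class BacktrackingException() extends Exception
   
   class BucketedWindow(numBuckets: Int, bucketSize: Int) {
     // Maximum distance we can backtrack: number of buckets times bucket size.
     private val maxBacktrack: Long = numBuckets.toLong * bucketSize
   
     private var bytesConsumed = 0L // total bytes ever read from the source
     private var position = 0L      // current parse position
   
     def advance(n: Long): Unit = {
       position += n
       if (position > bytesConsumed) bytesConsumed = position
     }
   
     // Oldest position still held in a live bucket; anything earlier has
     // "spilled off into parser history" and cannot be returned to.
     private def oldestRetained: Long = math.max(0L, bytesConsumed - maxBacktrack)
   
     // Backtracking past the oldest retained bucket is a detected,
     // fatal condition (a runtime SDE in Daffodil terms).
     def backtrackTo(target: Long): Unit = {
       if (target < oldestRetained) throw BacktrackingException()
       position = target
     }
   
     def currentPosition: Long = position
   }
   ```
   
   With 4 buckets of 8 bytes, the window retains the most recent 32 bytes: after advancing 40 bytes, backtracking to position 10 succeeds (10 >= 40 - 32), while backtracking to position 5 throws, modeling a point of uncertainty left further behind than the limit.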
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services