Posted to commits@daffodil.apache.org by "Mike Beckerle (Jira)" <ji...@apache.org> on 2021/04/30 15:57:00 UTC
[jira] [Assigned] (DAFFODIL-2504) Parse text of non-specified length from TCP hangs needlessly

     [ https://issues.apache.org/jira/browse/DAFFODIL-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Beckerle reassigned DAFFODIL-2504:
---------------------------------------

    Assignee:     (was: Mike Beckerle)

> Parse text of non-specified length from TCP hangs needlessly
> ------------------------------------------------------------
>
>                 Key: DAFFODIL-2504
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2504
>             Project: Daffodil
>          Issue Type: Bug
>          Components: Back End
>    Affects Versions: 3.0.0
>            Reporter: Mike Beckerle
>            Priority: Minor
>
> See tests:
> testDaffodilParseFromNetworkDelimited1
> testDaffodilParseFromNetworkDelimited1b
> testDaffodilParseFromNetworkDelimited2
> testDaffodilParseFromNetworkDelimited2b
> When parsing text from a TCP network stream, the parse should succeed as soon as the parser knows it has matched the longest possible delimiter. It should not require more characters than that to be present on the data stream in order for the parse to complete.
> There are no tests yet, but presumably lengthKind 'pattern' has a similar issue: only enough characters should be needed to establish the longest possible match for the regex. (For example, suppose dfdl:lengthPattern=".", which looks for exactly 1 byte. Matching it should NOT require more than one byte to be available on the TCP stream.)
> The arbitrary size of 8 for the CharBuffer in InputSourceDataInputStream means that around 8 characters of lookahead are required beyond the last character matched to the delimiter. Resizing the buffer to 2 allows the tests to succeed with fewer lookahead characters, but the whole approach/algorithm needs to be examined to determine how much lookahead is actually needed and whether it can be avoided in many cases.
> It is known that looking ahead 1 character cannot always be avoided. For matching delimiters that use DFDL Character Class Entities that can match a variable number of characters (e.g., WSP+, WSP*, and NL), a lookahead of 1 is clearly necessary to know whether the match is complete.
> For matching regular expressions, since they can look ahead an arbitrary finite distance, the amount of lookahead required depends on the specific regex.
> Since some amount of lookahead is required in these cases, fixing this issue for the simpler situation of basic delimiters with a fixed number of characters seems relatively low priority.
>  
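
The fixed-length delimiter case described above can be illustrated with a minimal sketch. It is not Daffodil's implementation and does not use InputSourceDataInputStream; the names MinimalDelimiterScan, CountingReader, and readField are hypothetical and exist only for this illustration. The sketch assumes a single, non-empty delimiter with a fixed number of characters, and it pulls one character at a time from the source so that nothing beyond the last delimiter character has to be available on the stream.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical illustration only; none of these names come from Daffodil.
final class MinimalDelimiterScan {

    /** Wraps a Reader and counts how many characters were pulled from it. */
    static final class CountingReader extends Reader {
        private final Reader in;
        int consumed = 0;

        CountingReader(Reader in) { this.in = in; }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            int n = in.read(buf, off, len);
            if (n > 0) consumed += n;
            return n;
        }

        @Override
        public void close() throws IOException { in.close(); }
    }

    /**
     * Reads one character at a time until the text read so far ends with the
     * (non-empty, fixed-length) delimiter. Returns the content before the
     * delimiter, or null if the stream ends before the delimiter is seen.
     */
    static String readField(Reader in, String delimiter) {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
                int start = sb.length() - delimiter.length();
                // Stop as soon as the tail of what was read equals the delimiter.
                if (start >= 0 && sb.indexOf(delimiter, start) == start) {
                    return sb.substring(0, start);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return null;
    }

    public static void main(String[] args) {
        CountingReader r = new CountingReader(new StringReader("abc;;rest of stream"));
        String field = readField(r, ";;");
        // Only the field and the delimiter are consumed; "rest of stream" is untouched.
        System.out.println(field + " " + r.consumed);
    }
}
```

With the input above, readField returns once the second ';' has been read, so the source only had to supply the field and the delimiter. A delimiter such as WSP+ would still need one further character to know the match is complete, which this sketch does not attempt to handle.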