Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/09/17 18:42:01 UTC

[jira] [Commented] (FLINK-10356) Add sanity checks to SpillingAdaptiveSpanningRecordDeserializer

    [ https://issues.apache.org/jira/browse/FLINK-10356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617980#comment-16617980 ] 

ASF GitHub Bot commented on FLINK-10356:
----------------------------------------

NicoK opened a new pull request #6705: [FLINK-10356][network] add sanity checks to SpillingAdaptiveSpanningRecordDeserializer
URL: https://github.com/apache/flink/pull/6705
 
 
   ## What is the purpose of the change
   
   `SpillingAdaptiveSpanningRecordDeserializer` has only few consistency checks for correct usage or for serializers behaving properly, e.g. reading only as many bytes as are available/promised for a record. At least the following checks should be added:
   
   1. Check that buffers have not been read from yet before adding them (this is an invariant `SpillingAdaptiveSpanningRecordDeserializer` relies on and, from what I can see, it is currently followed).
   2. Check that after deserialization, we actually consumed `recordLength` bytes
      - If not, in the spanning deserializer, we currently simply skip the remaining bytes.
      - But in the non-spanning deserializer, we currently continue from the wrong offset.
   3. Protect against `setNextBuffer()` being called before draining all available records
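   The three checks above can be sketched roughly as follows. This is a hypothetical, simplified illustration (a single `ByteBuffer`, a fixed-size int "record"), not Flink's actual `SpillingAdaptiveSpanningRecordDeserializer` code:

   ```java
   import java.nio.ByteBuffer;

   // Illustrative sketch of the three sanity checks; names and structure are
   // simplified and do not match Flink's real deserializer.
   class CheckedDeserializer {
       private ByteBuffer current;

       void setNextBuffer(ByteBuffer buffer) {
           // Check 3: all records of the previous buffer must be drained first.
           if (current != null && current.hasRemaining()) {
               throw new IllegalStateException("Previous buffer not fully consumed");
           }
           // Check 1: the new buffer must not have been read from yet.
           if (buffer.position() != 0) {
               throw new IllegalStateException("Buffer was already read from");
           }
           current = buffer;
       }

       int readRecord(int recordLength) {
           int before = current.position();
           int value = current.getInt(); // stand-in for the actual record deserialization
           int consumed = current.position() - before;
           // Check 2: the serializer must consume exactly recordLength bytes;
           // otherwise the next record would start at a wrong offset.
           if (consumed != recordLength) {
               throw new IllegalStateException(
                   "Serializer consumed " + consumed + " bytes, expected " + recordLength);
           }
           return value;
       }
   }
   ```

   Failing fast with an `IllegalStateException` here turns a silent offset corruption (check 2) or a lost record (checks 1 and 3) into an immediately visible error.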
   
   ## Brief change log
   
   - add the aforementioned checks
   - also include `[FLINK-9812][network][tests] fix SpanningRecordSerializationTest not using deserializer correctly`, which is needed so that the existing tests do not trip the newly added checks
   - fix `SpanningWrapper` not resetting all of its fields correctly (to safeguard further usages / future changes)
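   The `SpanningWrapper` fix follows a simple pattern: a state holder that is reused across records must reset every field in its `clear()` method, not just the ones the current code path happens to touch. A minimal sketch with purely illustrative field names (not Flink's actual `SpanningWrapper` fields):

   ```java
   import java.util.Arrays;

   // Hypothetical holder of spanning-record state, reused between records.
   class SpanningState {
       byte[] lengthBuffer = new byte[4];
       int recordLength = -1;   // -1 means "no partial record pending"
       long accumulatedBytes = 0;

       // clear() resets *every* field so that a reused instance cannot leak
       // state from the previous record into the next one, even if future
       // changes add new read paths.
       void clear() {
           Arrays.fill(lengthBuffer, (byte) 0);
           recordLength = -1;
           accumulatedBytes = 0;
       }
   }
   ```

   Resetting everything, even fields that seem redundant today, is what safeguards future changes: a new code path that reads a field before writing it will then see a defined initial value instead of stale data.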
   
   This PR is based on #6693 
   
   ## Verifying this change
   
   This change is already covered by existing tests for the successful de/serializations, e.g. via `SpanningRecordSerializationTest`.
   
   This change added tests and can be verified as follows:
   
   - added misbehaving serialization tests under `SpanningRecordSerializationTest`
   - adapted and fixed `CustomSerializationITCase`, now working with `ExpectedException` (please also note that `testIncorrectSerializer4` was not actually verifying behaviour: previously it silently passed even though no exception was thrown)
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): **no**
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: **no**
     - The serializers: **yes** (network only!)
     - The runtime per-record code paths (performance sensitive): **yes**
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: **no**
     - The S3 file system connector: **no**
   
   ## Documentation
   
     - Does this pull request introduce a new feature? **no**
     - If yes, how is the feature documented? **not applicable**
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Add sanity checks to SpillingAdaptiveSpanningRecordDeserializer
> ---------------------------------------------------------------
>
>                 Key: FLINK-10356
>                 URL: https://issues.apache.org/jira/browse/FLINK-10356
>             Project: Flink
>          Issue Type: Improvement
>          Components: Network
>    Affects Versions: 1.5.0, 1.5.1, 1.5.2, 1.5.3, 1.6.0, 1.6.1, 1.7.0, 1.5.4
>            Reporter: Nico Kruber
>            Assignee: Nico Kruber
>            Priority: Major
>              Labels: pull-request-available
>
> {{SpillingAdaptiveSpanningRecordDeserializer}} doesn't have any consistency checks for usage calls or serializers behaving properly, e.g. to read only as many bytes as available/promised for that record. At least these checks should be added:
>  # Check that buffers have not been read from yet before adding them (this is an invariant {{SpillingAdaptiveSpanningRecordDeserializer}} relies on and, from what I can see, it is currently followed).
>  # Check that after deserialization, we actually consumed {{recordLength}} bytes
>  ** If not, in the spanning deserializer, we currently simply skip the remaining bytes.
>  ** But in the non-spanning deserializer, we currently continue from the wrong offset.
>  # Protect against {{setNextBuffer}} being called before draining all available records



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)