You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by fhueske <gi...@git.apache.org> on 2017/05/15 22:20:23 UTC

[GitHub] flink pull request #3912: [FLINK-6589] [core] Deserialize ArrayList with cap...

GitHub user fhueske opened a pull request:

    https://github.com/apache/flink/pull/3912

    [FLINK-6589] [core] Deserialize ArrayList with capacity of size+1 to prevent growth.

    Several Table API / SQL operators hold records in a `MapState[Long, List[X]]` keyed on a timestamp.
    When a new record arrives, the corresponding list is fetched and the record is added to the list. 
    
    Currently, the `ListSerializer` deserializes lists as `ArrayList` with capacity exactly equal to the number of serialized elements. Hence, the `ArrayList` will grow when a new element is added which is an expensive operation.
    
    This PR changes the capacity of the deserialized `ArrayList` to #elements + 1 to avoid the growing the list when a single element is added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fhueske/flink listSer

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3912.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3912
    
----
commit 1fa86b4e282ab0594d8dc768840de4c798e29591
Author: Fabian Hueske <fh...@apache.org>
Date:   2017-05-15T19:41:51Z

    [FLINK-6589] [core] Deserialize ArrayList with capacity of size+1 to prevent growth.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3912: [FLINK-6589] [core] Deserialize ArrayList with capacity o...

Posted by sunjincheng121 <gi...@git.apache.org>.
Github user sunjincheng121 commented on the issue:

    https://github.com/apache/flink/pull/3912
  
    +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3912: [FLINK-6589] [core] Deserialize ArrayList with capacity o...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/3912
  
    Any concerns regarding this change @tzulitai @StephanEwen @StefanRRichter ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3912: [FLINK-6589] [core] Deserialize ArrayList with capacity o...

Posted by StefanRRichter <gi...@git.apache.org>.
Github user StefanRRichter commented on the issue:

    https://github.com/apache/flink/pull/3912
  
    Yes, there is a tradeoff, and probably the nicest way would be to restore the internal array size as it was when we serialized the original. However, I guess that is too much effort for the effect. Since the sweetspot also depends on the access pattern, I think everything from +1 to some percentage could be justified and as there is no clear "best strategy" I would leave this up to your judgment call.
    
    +1 from me then.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3912: [FLINK-6589] [core] Deserialize ArrayList with capacity o...

Posted by StefanRRichter <gi...@git.apache.org>.
Github user StefanRRichter commented on the issue:

    https://github.com/apache/flink/pull/3912
  
    No concern, just curious if there is any reason for the 1 additional. The problem also still exists now for a second element. So is it, in general, unlikely that new elements are added? Or should we add some more additional space, like 10%?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3912: [FLINK-6589] [core] Deserialize ArrayList with capacity o...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/3912
  
    At least in the Table API / SQL operators, we use a ProcessFunction add elements in the `processElement()` method. So, in the case that I want to improve just adds a single element.
    
    My rational was that this is the most common use case (I might be wrong here). I would be fine with adding more, but the question is how often is that actually used.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #3912: [FLINK-6589] [core] Deserialize ArrayList with cap...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/3912


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3912: [FLINK-6589] [core] Deserialize ArrayList with capacity o...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on the issue:

    https://github.com/apache/flink/pull/3912
  
    Share the same question as Stefan. Otherwise LGTM.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3912: [FLINK-6589] [core] Deserialize ArrayList with capacity o...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/3912
  
    Thank you, I'll merge this then.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3912: [FLINK-6589] [core] Deserialize ArrayList with capacity o...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on the issue:

    https://github.com/apache/flink/pull/3912
  
    If the only case we currently know of right now is in the Table API with 1 additional element to be added, then I think this is good for now.
    
    I think the improvement should be bounded to specific cases anyway, because the problem will always exist no matter how much extra space we add. For example the `ArrayListSerializer` used for list state could have any arbitrary amounts of elements added.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---