Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/12/04 03:51:00 UTC

[jira] [Commented] (FLINK-11050) When IntervalJoin, get left or right buffer's entries more quickly by assigning lowerBound

    [ https://issues.apache.org/jira/browse/FLINK-11050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708154#comment-16708154 ] 

ASF GitHub Bot commented on FLINK-11050:
----------------------------------------

Myracle opened a new pull request #7226: FLINK-11050 add lowerBound and upperBound for optimizing RocksDBMapState's entries
URL: https://github.com/apache/flink/pull/7226
 
 
   
   ## What is the purpose of the change
   
   This pull request speeds up reading RocksDBMapState's entries by letting the caller pass a lowerBound and an upperBound, so the iterator can seek from the lowerBound instead of scanning every entry.
   
   
   ## Brief change log
   
     - Add `entries(lowerBound, upperBound)` to MapState and implement it in RocksDBMapState.
     - Use `entries(lowerBound, upperBound)` instead of `entries()` in IntervalJoin.java when reading the buffer's values.
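
The difference between the two access patterns can be sketched as follows. This is only an illustration, not the code in the pull request: a `java.util.TreeMap` stands in for RocksDB's sorted key space, and the class and method names (`BoundedEntriesSketch`, `scanAll`, `entries`) are hypothetical. It also assumes that keys sort in timestamp order, which the real serialized keys would have to guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch only: a TreeMap stands in for a sorted, timestamp-keyed buffer. */
public class BoundedEntriesSketch {

    /** Unbounded variant: visits every bucket and filters by timestamp afterwards. */
    static List<Map.Entry<Long, String>> scanAll(
            TreeMap<Long, String> buffer, long lowerBound, long upperBound) {
        List<Map.Entry<Long, String>> out = new ArrayList<>();
        for (Map.Entry<Long, String> e : buffer.entrySet()) {
            if (e.getKey() >= lowerBound && e.getKey() <= upperBound) {
                out.add(e);
            }
        }
        return out;
    }

    /** Bounded variant: starts at lowerBound and stops after upperBound (both inclusive). */
    static List<Map.Entry<Long, String>> entries(
            TreeMap<Long, String> buffer, long lowerBound, long upperBound) {
        // subMap is a view over the requested key range only;
        // it throws IllegalArgumentException if lowerBound > upperBound.
        return new ArrayList<>(buffer.subMap(lowerBound, true, upperBound, true).entrySet());
    }

    public static void main(String[] args) {
        TreeMap<Long, String> buffer = new TreeMap<>();
        buffer.put(100L, "a");
        buffer.put(200L, "b");
        buffer.put(300L, "c");
        buffer.put(400L, "d");

        // Both variants return the same buckets; only the amount of work differs.
        System.out.println(scanAll(buffer, 200L, 300L));
        System.out.println(entries(buffer, 200L, 300L));
    }
}
```

Both methods return the same buckets; the bounded one simply never touches keys outside the requested range, which is the effect the change aims for on the RocksDB iterator.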
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
     - The serializers: (no)
     - The runtime per-record code paths (performance sensitive): (no)
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
     - The S3 file system connector: (don't know)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (no)



> When IntervalJoin, get left or right buffer's entries more quickly by assigning lowerBound
> ------------------------------------------------------------------------------------------
>
>                 Key: FLINK-11050
>                 URL: https://issues.apache.org/jira/browse/FLINK-11050
>             Project: Flink
>          Issue Type: Improvement
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.6.2, 1.7.0
>            Reporter: Liu
>            Priority: Major
>              Labels: performance, pull-request-available
>             Fix For: 1.7.1
>
>
>     When using IntervalJoin, reading the left or right buffer's entries is very slow, because we have to scan all of the buffer's values, including deleted values that fall outside the time range. Processing these deleted values consumes too much time in RocksDB's level 0. Since the lowerBound is known, this can be optimized by seeking from the timestamp of the lowerBound.
>     Our usage is like below:
> {code:java}
> labelStream.keyBy(uuid).intervalJoin(adLogStream.keyBy(uuid))
>            .between(Time.milliseconds(0), Time.milliseconds(600000))
>            .process(new processFunction())
>            .addSink(kafkaProducer)
> {code}
>     Our data volume is huge. The job always runs for about an hour and then gets stuck in RocksDB's seek when reading the buffer's entries. We used the RocksDB data to reproduce the problem and found that too much time is spent on deleted values. So we decided to optimize it by seeking from the lowerBound instead of scanning all entries.
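
For the join described above, the range of timestamps worth reading from the opposite buffer follows directly from the `between(...)` bounds. The sketch below is a hedged illustration of that arithmetic, assuming the usual interval join semantics (a right element matches when its timestamp lies in `[left + lowerBound, left + upperBound]`); the class and method names are hypothetical and not taken from Flink.

```java
/** Sketch only: computes which timestamps of the other buffer can match. */
public class IntervalBoundsSketch {

    /** Returns the inclusive range {from, to} of candidate timestamps in the opposite buffer. */
    static long[] otherSideRange(
            long ourTimestamp, long lowerBound, long upperBound, boolean isLeft) {
        // A left element at time t matches right elements in [t + lowerBound, t + upperBound].
        // A right element at time t matches left elements in [t - upperBound, t - lowerBound].
        long relativeLower = isLeft ? lowerBound : -upperBound;
        long relativeUpper = isLeft ? upperBound : -lowerBound;
        return new long[] {ourTimestamp + relativeLower, ourTimestamp + relativeUpper};
    }

    public static void main(String[] args) {
        long lowerBound = 0L;       // between(Time.milliseconds(0), ...
        long upperBound = 600_000L; // ... Time.milliseconds(600000))

        long[] forLeft = otherSideRange(1_000_000L, lowerBound, upperBound, true);
        System.out.println("right buffer range: " + forLeft[0] + " .. " + forLeft[1]);

        long[] forRight = otherSideRange(1_000_000L, lowerBound, upperBound, false);
        System.out.println("left buffer range: " + forRight[0] + " .. " + forRight[1]);
    }
}
```

Anything in the other buffer with a timestamp below the computed lower end cannot match, so a scan that seeks to that point skips the older, already deleted entries entirely.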


