Posted to user@flink.apache.org by "Marchant, Hayden " <ha...@citi.com> on 2018/02/26 11:32:02 UTC

'Custom' mapping function on keyed WindowedStream

I would like to create a custom aggregator function for a windowed KeyedStream that I have complete control over. Instead of implementing an AggregateFunction, I would like to control the lifecycle of the Flink state myself by implementing the CheckpointedFunction interface, though I still want this state to be per-key, per-window.

I am not sure which function I should be calling on the WindowedStream in order to invoke this custom functionality. I see from the documentation that CheckpointedFunction is for non-keyed state - which I guess eliminates this option.

A little background - I have logic that needs to hold a very large state in the operator - lots of counts by sub-key. Since only a subset of these aggregations is updated, I was interested in trying out incremental checkpointing in RocksDB. However, I don't want to incur RocksDB I/O on every state update since we need very low latency. Instead, I wanted to hold the state in the Java heap and then update the Flink state on checkpoint - i.e. something like CheckpointedFunction.
My assumption is that any update I make to RocksDB-backed state will hit the local disk - if this is wrong then I'll be happy.

What other options do I have?

Thanks,
Hayden Marchant


Re: 'Custom' mapping function on keyed WindowedStream

Posted by Seth Wiesman <sw...@mediamath.com>.
I had to solve a similar problem. We use a process function with the RocksDB state backend and MapState for the sub-keys. So while we hit RocksDB on every element, only the specified sub-keys are ever read from disk.
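
To illustrate why keeping the sub-keys in map state limits what is read per element, here is a toy sketch in plain Java. It does not use the Flink or RocksDB APIs. The class names PerEntryCounts and WholeMapCounts are hypothetical, and the entriesTouched counter is only a rough stand-in for I/O cost. It assumes that the RocksDB backend stores each map-state entry as its own key-value pair, whereas a whole map kept in a single value state would be serialized as one blob.

```java
import java.util.HashMap;
import java.util.Map;

public class SubKeyStateSketch {

    /** Map-state style layout: every (key, subKey) pair is its own store entry. */
    static final class PerEntryCounts {
        private final Map<String, Long> store = new HashMap<>();
        long entriesTouched = 0;

        void increment(String key, String subKey) {
            String composite = key + "/" + subKey;
            long current = store.getOrDefault(composite, 0L); // one entry read
            store.put(composite, current + 1);                // one entry written
            entriesTouched += 2;
        }

        long count(String key, String subKey) {
            return store.getOrDefault(key + "/" + subKey, 0L);
        }
    }

    /** Value-state style layout: the whole sub-key map is one blob per key. */
    static final class WholeMapCounts {
        private final Map<String, Map<String, Long>> store = new HashMap<>();
        long entriesTouched = 0;

        void increment(String key, String subKey) {
            // Copying stands in for deserializing the full blob.
            Map<String, Long> all = new HashMap<>(store.getOrDefault(key, new HashMap<>()));
            entriesTouched += all.size();
            all.merge(subKey, 1L, Long::sum);
            // Storing stands in for serializing the full blob again.
            store.put(key, all);
            entriesTouched += all.size();
        }

        long count(String key, String subKey) {
            return store.getOrDefault(key, new HashMap<>()).getOrDefault(subKey, 0L);
        }
    }

    public static void main(String[] args) {
        PerEntryCounts perEntry = new PerEntryCounts();
        WholeMapCounts wholeMap = new WholeMapCounts();

        // Build up 1000 sub-key counts under a single key in both layouts.
        for (int i = 0; i < 1000; i++) {
            perEntry.increment("window-1", "sub-" + i);
            wholeMap.increment("window-1", "sub-" + i);
        }
        perEntry.entriesTouched = 0;
        wholeMap.entriesTouched = 0;

        // Update a single existing sub-key and compare how much each layout touched.
        perEntry.increment("window-1", "sub-42");
        wholeMap.increment("window-1", "sub-42");

        System.out.println("per-entry touched: " + perEntry.entriesTouched);
        System.out.println("whole-map touched: " + wholeMap.entriesTouched);
    }
}
```

In this toy model, updating one sub-key touches a constant number of entries in the per-entry layout, while the whole-map layout touches every sub-key stored under that key. Real costs also depend on RocksDB caching and serialization, which the sketch ignores.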

Seth Wiesman | Software Engineer | 4 World Trade Center, 46th Floor, New York, NY 10007 | swiesman@mediamath.com <ma...@mediamath.com>
 

On 2/26/18, 6:32 AM, "Marchant, Hayden " <ha...@citi.com> wrote:

    I would like to create a custom aggregator function for a windowed KeyedStream that I have complete control over. Instead of implementing an AggregateFunction, I would like to control the lifecycle of the Flink state myself by implementing the CheckpointedFunction interface, though I still want this state to be per-key, per-window.
    
    I am not sure which function I should be calling on the WindowedStream in order to invoke this custom functionality. I see from the documentation that CheckpointedFunction is for non-keyed state - which I guess eliminates this option.
    
    A little background - I have logic that needs to hold a very large state in the operator - lots of counts by sub-key. Since only a subset of these aggregations is updated, I was interested in trying out incremental checkpointing in RocksDB. However, I don't want to incur RocksDB I/O on every state update since we need very low latency. Instead, I wanted to hold the state in the Java heap and then update the Flink state on checkpoint - i.e. something like CheckpointedFunction.
    My assumption is that any update I make to RocksDB-backed state will hit the local disk - if this is wrong then I'll be happy.
    
    What other options do I have?
    
    Thanks,
    Hayden Marchant
    
    

