You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@beam.apache.org by Reuven Lax <re...@google.com> on 2020/06/16 15:47:32 UTC

Re: DISCUSS: Sorted MapState API

Bringing this subject up again,

I've spent some time looking into implementing this for the Dataflow
runner. I'm unable to find a way to implement the arbitrary sorted multimap
efficiently for the case where there are large numbers of unique keys.
Since the primary driving use case is timestamp ordering (i.e. key is event
timestamp), you would expect to have nearly a new key per element. I
considered Luke's suggestion above, but unfortunately it doesn't really
solve this issue.

The primary use case for sorting always seems to be sorting by timestamp. I
want to propose that instead of building the fully-general sorted multimap,
we instead focus on a state type where the sort key is an integral type
(like a timestamp or an integer). There is still a valid use case for
multimap, but we can provide that as an unordered state. At least for
Dataflow, it will be much easier

While my difficulties here may be specific to the Dataflow runner, any such
support would have to be built into other runners as well, and limiting to
integral sorting likely makes it easier for other runners to implement
this. Also, if you look at this
<https://github.com/apache/flink/blob/0ab1549f52f1f544e8492757c6b0d562bf50a061/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/runtime/join/TemporalRowtimeJoin.scala#L95>
Flink
comment pointed out by Aljoscha, for Flink the main use case identified was
also timestamp sorting. This will also simplify the API design for this
feature: Sorted multimap with arbitrary keys would require us to introduce
a way of mapping natural ordering to encoded ordering (i.e. a new
OrderPreservingCoder), but if we limit sort keys to integral types, the API
design is simpler as integral types can be represented directly.

Reuven

On Sun, Jun 2, 2019 at 7:04 AM Reuven Lax <re...@google.com> wrote:

> This sounds to me like a potential runner strategy. However if a runner
> can natively support sorted maps (e.g. we expect the Dataflow runner to be
> able to do so, and I think it would be useful for other runners as well),
> then it's probably preferable to allow the runner to use its native
> capabilities.
>
> On Fri, May 24, 2019 at 11:05 AM Lukasz Cwik <lc...@google.com> wrote:
>
>> For the API that you proposed, the map key is always "void" and the sort
>> key == user key. So in my example of
>> key: dummy value
>> key.000: token, (0001, value4)
>> key.001: token, (0010, value1), (0011, value2)
>> key.01: token
>> key.1: token, (1011, value3)
>> you would have:
>> "void": dummy value
>> "void".000: token, (0001, value4)
>> "void".001: token, (0010, value1), (0011, value2)
>> "void".01: token
>> "void".1: token, (1011, value3)
>>
>> Iterable<KV<K, V>> entriesUntil(K limit) translates into walking the the
>> prefixes until you find a common prefix for K and then filter for values
>> where they have a sort key <= K. Using the example above, to find
>> entriesUntil(0010) you would:
>> look for key."", miss
>> look for key.0, miss
>> look for key.00, miss
>> look for key.000, hit, sort all contained values using secondary key,
>> provide value4 to user
>> look for key.001, hit, notice that 001 is a prefix of 0010 so we sort all
>> contained values using secondary key, filter out value2 and provide value1
>>
>> void removeUntil(K limit) also translates into walking the prefixes but
>> instead we will clear them when we have a "hit" with some special logic for
>> when the sort key is a prefix of the key. Used the example, to
>> removeUntil(0010) you would:
>> look for key."", miss
>> look for key.0, miss
>> look for key.00, miss
>> look for key.000, hit, clear
>> look for key.001, hit, notice that 001 is a prefix of 0010 so we sort all
>> contained values using secondary key, store in memory all values that >
>> 0010, clear and append values stored in memory.
>>
>> On Fri, May 24, 2019 at 10:36 AM Reuven Lax <re...@google.com> wrote:
>>
>>> Can you explain how fetching and deleting ranges of keys would work with
>>> this data structure?
>>>
>>> On Fri, May 24, 2019 at 9:50 AM Lukasz Cwik <lc...@google.com> wrote:
>>>
>>>> Reuven, for the example, I assume that we never want to store more then
>>>> 2 values at a given sort key prefix, and if we do then we will create a new
>>>> longer prefix splitting up the values based upon the sort key.
>>>>
>>>> Tuple representation in examples below is (key, sort key, value) and .
>>>> is a character outside of the alphabet which can be represented by using an
>>>> escaping encoding that wraps the key + sort key encoding.
>>>>
>>>> To insert (key, 0010, value1), we lookup "key" + all the prefixes of
>>>> 0010 finding one that is not empty. In this case its 0, so we append value
>>>> to the map at key.0 ending up with (we also set the key to any dummy value
>>>> to know that it it contains values):
>>>> key: dummy value
>>>> key."": token, (0010, value1)
>>>> Now we insert (key, 0011, value2), we again lookup "key" + all the
>>>> prefixes of 0010, finding "", so we append value2 to key."" ending up with:
>>>> key: dummy value
>>>> key."": token, (0010, value1), (0011, value2)
>>>> Now we insert (key, 1011, value3), we again lookup "key" + all the
>>>> prefixes of 1011 finding "" but notice that it is full, so we partition all
>>>> the values into two prefixes 0 and 1. We also clear the "" prefix ending up
>>>> with:
>>>> key: dummy value
>>>> key.0: token, (0010, value1), (0011, value2)
>>>> key.1: token, (1011, value3)
>>>> Now we insert (key, 0001, value4), we again lookup "key" + all the
>>>> prefixes of the value finding 0 but notice that it is full, so we partition
>>>> all the values into two prefixes 00 and 01 but notice this doesn't help us
>>>> since 00 will be too full so we split 00 again to 000, 001. We also clear
>>>> the 0 prefix ending up with:
>>>> key: dummy value
>>>> key.000: token, (0001, value4)
>>>> key.001: token, (0010, value1), (0011, value2)
>>>> key.01: token
>>>> key.1: token, (1011, value3)
>>>>
>>>> We are effectively building a trie[1] where we only have values at the
>>>> leaves and control how full each leaf can be. There are other trie
>>>> representations like a radix tree that may be better.
>>>>
>>>> Looking up the values in sorted order for "key" would go like this:
>>>> Is key set, yes
>>>> look for key."", miss
>>>> look for key.0, miss
>>>> look for key.00, miss
>>>> look for key.000, hit, sort all contained values using secondary key,
>>>> provide value4 to user
>>>> look for key.001, hit, sort all contained values using secondary key,
>>>> provide value1 followed by value2 to user
>>>> look for key.01, hit, empty, return no values to user
>>>> look for key.1, hit, sort all contained values using secondary key,
>>>> provide value3 to user
>>>> we have walked the entire prefix space, signal end of iterable
>>>>
>>>> Some notes for the above:
>>>> * The dummy value is used to know that the key contains values and the
>>>> token is to know whether there are any values deeper in the trie so when we
>>>> know when to stop searching.
>>>> * If we can recalculate the sort key from the combination of the key
>>>> and value, then we don't need to store it.
>>>> * Keys with lots of values will perform worse then keys with less
>>>> values since we have to look up more keys but they will be empty reads. The
>>>> number of misses can be controlled by how many elements we are willing to
>>>> store at a given node before we subdivide.
>>>>
>>>> In reality you could build a lot of structures (e.g. red black tree,
>>>> binary tree) using the sort key, the issue is the cost of
>>>> rebalancing/re-organizing the structure in map form and whether it has a
>>>> convenient pre-order traversal for lookups.
>>>>
>>>>
>>>>
>>>> On Fri, May 24, 2019 at 8:14 AM Reuven Lax <re...@google.com> wrote:
>>>>
>>>>> Some great comments!
>>>>>
>>>>> *Aljoscha*: absolutely this would have to be implemented by runners
>>>>> to be efficient. We can of course provide a default (inefficient)
>>>>> implementation, but ideally runners would provide better ones.
>>>>>
>>>>> *Jan* Exactly. I think MapState can be dropped or backed by this.
>>>>> E.g.
>>>>>
>>>>> *Robert* Great point about standard coders not satisfying this.
>>>>> That's why I suggested that we provide a way to tag the coders that do
>>>>> preserve order, and only accept those as key coders Alternatively we could
>>>>> present a more limited API - e.g. only allowing a hard-coded set of types
>>>>> to be used as keys - but that seems counter to the direction Beam usually
>>>>> goes. So users will have two ways .of creating multimap state specs:
>>>>>
>>>>>    private final StateSpec<MultimapState<Long, String>> state =
>>>>> StateSpecs.multimap(VarLongCoder.of(), StringUtf8Coder.of());
>>>>>
>>>>> or
>>>>>    private final StateSpec<MultimapState<Long, String>> state =
>>>>> StateSpecs.orderedMultimap(VarLongCoder.of(), StringUtf8Coder.of());
>>>>>
>>>>> The second one will validate that the key coder preserves order, and
>>>>> fails otherwise (similar to coder determinism checking in GroupByKey). (BTW
>>>>> we would also have versions of these functions that use coder inference to
>>>>> "guess" the coder, but those will do the same checking)
>>>>>
>>>>> Also the API I proposed did support random access! We could separate
>>>>> out OrderedBagState again if we think the use cases are fundamentally
>>>>> different. I merged the proposal into that of MultimapState because there
>>>>> seemed be 99% overlap.
>>>>>
>>>>> Reuven
>>>>>
>>>>> On Fri, May 24, 2019 at 6:19 AM Robert Bradshaw <ro...@google.com>
>>>>> wrote:
>>>>>
>>>>>> On Fri, May 24, 2019 at 5:32 AM Reuven Lax <re...@google.com> wrote:
>>>>>> >
>>>>>> > On Thu, May 23, 2019 at 1:53 PM Ahmet Altay <al...@google.com>
>>>>>> wrote:
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> On Thu, May 23, 2019 at 1:38 PM Lukasz Cwik <lc...@google.com>
>>>>>> wrote:
>>>>>> >>>
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> On Thu, May 23, 2019 at 11:37 AM Rui Wang <ru...@google.com>
>>>>>> wrote:
>>>>>> >>>>>
>>>>>> >>>>> A few obvious problems with this code:
>>>>>> >>>>>   1. Removing the elements already processed from the bag
>>>>>> requires clearing and rewriting the entire bag. This is O(n^2) in the
>>>>>> number of input trades.
>>>>>> >>>>
>>>>>> >>>> why it's not O(2 * n) to clearing and rewriting trade state?
>>>>>> >>>>
>>>>>> >>>>>
>>>>>> >>>>> public interface SortedMultimapState<K, V> extends State {
>>>>>> >>>>>   // Add a value to the map.
>>>>>> >>>>>   void put(K key, V value);
>>>>>> >>>>>   // Get all values for a given key.
>>>>>> >>>>>   ReadableState<Iterable<V>> get(K key);
>>>>>> >>>>>  // Return all entries in the map.
>>>>>> >>>>>   ReadableState<Iterable<KV<K, V>>> allEntries();
>>>>>> >>>>>   // Return all entries in the map with keys <= limit. returned
>>>>>> elements are sorted by the key.
>>>>>> >>>>>   ReadableState<Iterable<KV<K, V>>> entriesUntil(K limit);
>>>>>> >>>>>
>>>>>> >>>>>  // Remove all values with the given key;
>>>>>> >>>>>   void remove(K key);
>>>>>> >>>>>  // Remove all entries in the map with keys <= limit.
>>>>>> >>>>>   void removeUntil(K limit);
>>>>>> >>>>
>>>>>> >>>> Will removeUntilExcl(K limit) also useful? It will remove all
>>>>>> entries in the map with keys < limit.
>>>>>> >>>>
>>>>>> >>>>>
>>>>>> >>>>> Runners will sort based on the encoded value of the key. In
>>>>>> order to make this easier for users, I propose that we introduce a new tag
>>>>>> on Coders PreservesOrder. A Coder that contains this tag guarantees that
>>>>>> the encoded value preserves the same ordering as the base Java type.
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>> Could you clarify what is  "encoded value preserves the same
>>>>>> ordering as the base Java type"?
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> Lets say A and B represent two different instances of the same
>>>>>> Java type like a double, then A < B (using the languages comparison
>>>>>> operator) iff encode(A) < encode(B) (note the encoded versions are compared
>>>>>> lexicographically)
>>>>>> >>
>>>>>> >>
>>>>>> >> Since coders are shared across SDKs, do we expect A < B iff e(A) <
>>>>>> e(P) property to hold for all languages we support? What happens A, B sort
>>>>>> differently in different languages?
>>>>>> >
>>>>>> >
>>>>>> > That would have to be the property of the coder (which means that
>>>>>> this property probably needs to be represented in the portability
>>>>>> representation of the coder). I imagine the common use cases will be for
>>>>>> simple coders like int, long, string, etc., which are likely to sort the
>>>>>> same in most languages.
>>>>>>
>>>>>> The standard coders for both double and integral types do not respect
>>>>>> the natural ordering (consider negative values). KV coders violate the
>>>>>> "natural" lexicographic ordering on components as well. I think
>>>>>> implicitly sorting on encoded value would yield many surprises. (The
>>>>>> state, of course, could take a order-preserving, bytes
>>>>>> (string?)-producing callable as a parameter of course). (As for
>>>>>> naming, I'd probably call this OrderedBagState or something like
>>>>>> that...rather than Map which tends to imply random access.)
>>>>>>
>>>>>

Re: [External] Re: DISCUSS: Sorted MapState API

Posted by Tyson Hamilton <ty...@google.com>.

On Mon, Sep 28, 2020 at 9:24 AM Kenneth Knowles <ke...@apache.org> wrote:

>
>
> On Wed, Sep 16, 2020 at 8:48 AM Tyson Hamilton <ty...@google.com> wrote:
>
>> The use case is to support an unbounded stream-stream join, where the
>> elements are arriving in roughly time sorted order. Removing a specific
>> element from the timestamp indexed collection is necessary when a match is
>> found.
>>
>
> Just checking - this is an optimization when you already know that the
> join is 1:1?
>

Yes, I suppose my current implementation of a stream-stream join assumes
1:1 joins. If there is a case for other cardinalities we should have a
discussion about this in more detail (in another thread).


> Kenn
>
>
>> Having clearRange is helpful to expire elements that are no longer
>> relevant according to a user-provided time based join predicate (e.g. WHEN
>> ABS(leftElement.timestamp - rightElement.timestamp) < 5 minutes).
>>
>> I'll think a bit more on how to use MapState instead if having a remove()
>> like method for a single element isn't an option.
>>
>> On Tue, Sep 15, 2020 at 8:52 PM Reuven Lax <re...@google.com> wrote:
>>
>>> Hi,
>>>
>>> Currently we only support removing a timestamp range. You can remove a
>>> single timestamp of course by removing [ts, ts+1), however if there are
>>> multiple elements with the same timestamp this will remove all of those
>>> elements.
>>>
>>> Does this fit your use case? If not, I wonder if MapState is closer to
>>> what you are looking for?
>>>
>>> Reuven
>>>
>>> On Tue, Sep 15, 2020 at 2:33 PM Tyson Hamilton <ty...@google.com>
>>> wrote:
>>>
>>>> Hi Reuven,
>>>>
>>>> I noticed that there was an implementation of the in-memory
>>>> OrderedListState introduced [1]. Where can I find out more regarding the
>>>> plan and design? Is there a design doc? I'd like to know more details about
>>>> the implementation to see if it fits my use case. I was hoping it would
>>>> have a remove(TimestampedValue<T> e) method.
>>>>
>>>> Thanks,
>>>> -Tyson
>>>>
>>>>
>>>> [1]:
>>>> https://github.com/apache/beam/commit/9d0d0b0c4506b288164b155c5ce3a23d76db3c41
>>>>
>>>>
>>>> On 2020/08/03 21:41:46, Catlyn Kong <ca...@yelp.com> wrote:
>>>> > Hey folks,
>>>> >
>>>> > Sry I'm late to this thread but this might be very helpful for the
>>>> problem
>>>> > we're dealing with. Do we have a design doc or a jira ticket I can
>>>> follow?
>>>> >
>>>> > Cheers,
>>>> > Catlyn
>>>> >
>>>> > On Thu, Jun 18, 2020 at 1:11 PM Jan Lukavský <je...@seznam.cz> wrote:
>>>> >
>>>> > > My questions were just an example. I fully agree there is a
>>>> fundamental
>>>> > > need for a sorted state (of some form, and I also think this links
>>>> to
>>>> > > efficient implementation of retrations) - I was reacting to Kenn's
>>>> question
>>>> > > about BIP. This one would be pretty nice example why it would be
>>>> good to
>>>> > > have such a "process" - not everything can be solved on ML and
>>>> there are
>>>> > > fundamental decisions that might need a closer attention.
>>>> > > On 6/18/20 5:28 PM, Reuven Lax wrote:
>>>> > >
>>>> > > Jan - my proposal is exactly TimeSortedBagState (more accurately -
>>>> > > TimeSortedListState), though I went a bit further and also proposed
>>>> a way
>>>> > > to have a dynamic number of tagged TimeSortedBagStates.
>>>> > >
>>>> > > You are correct that the runner doesn't really have to store the
>>>> data time
>>>> > > sorted - what's actually needed is the ability to fetch and remove
>>>> > > timestamp ranges of data (though that does include fetching the
>>>> entire
>>>> > > list); TimeOrderedState is probably a more accurate name then
>>>> > > TimeSortedState. I don't think we could get away with operations
>>>> that only
>>>> > > act on the smallest timestamp, however we could limit the API to
>>>> only being
>>>> > > able to fetch and remove prefixes of data (ordered by timestamp).
>>>> However
>>>> > > if we support prefixes, we might as well support arbitrary
>>>> subranges.
>>>> > >
>>>> > > On Thu, Jun 18, 2020 at 7:26 AM Jan Lukavský <je...@seznam.cz>
>>>> wrote:
>>>> > >
>>>> > >> Big +1 for a BIP, as this might really help clarify all the pros
>>>> and cons
>>>> > >> of all possibilities. There seem to be questions that need
>>>> answering and
>>>> > >> motivating use cases - do we need sorted map state or can we solve
>>>> our use
>>>> > >> cases by something simpler - e.g. the mentioned
>>>> TimeSortedBagState? Does
>>>> > >> that really have to be time-sorted structure, or does it "only"
>>>> have to
>>>> > >> have operations that can efficiently find and remove element with
>>>> smallest
>>>> > >> timestamp (like a PriorityQueue)?
>>>> > >>
>>>> > >> Jan
>>>> > >> On 6/18/20 5:32 AM, Kenneth Knowles wrote:
>>>> > >>
>>>> > >> Zooming in from generic philosophy to be clear: adding time ordered
>>>> > >> buffer to the Fn state API is *not* a shortcut.It has benefits
>>>> that will
>>>> > >> not be achieved by SDK-side implementation on top of either
>>>> ordered or
>>>> > >> unordered multimap. Are those benefits worth expanding the API? I
>>>> don't
>>>> > >> know.
>>>> > >>
>>>> > >> A change to allow a runner to have a specialized implementation for
>>>> > >> time-buffered state would be one or more StateKey types, right?
>>>> Reuven,
>>>> > >> maybe put this and your Java API in a doc? A BIP? Seems like
>>>> there's at
>>>> > >> least the following to explore:
>>>> > >>
>>>> > >>  - how that Java API would map to an SDK-side implementation on
>>>> top of
>>>> > >> multimap state key
>>>> > >>  - how that Java API would map to a new StateKey
>>>> > >>  - whether there's actually more than one relevant implementation
>>>> of that
>>>> > >> StateKey
>>>> > >>  - whether SDK-side implementation on some other state key would be
>>>> > >> performant enough in all SDK languages (present and future)
>>>> > >>
>>>> > >> Zooming back out to generic philosophy: Proliferation of StateKey
>>>> > >> types tuned by runners (which can very easily still share
>>>> implementation)
>>>> > >> is probably better than proliferation of complex SDK-side
>>>> implementations
>>>> > >> with varying completeness and performance.
>>>> > >>
>>>> > >> Kenn
>>>> > >>
>>>> > >> On Wed, Jun 17, 2020 at 3:24 PM Reuven Lax <re...@google.com>
>>>> wrote:
>>>> > >>
>>>> > >>> It might help for me to describe what I have in mind. I'm still
>>>> > >>> proposing that we build multimap, just not a globally-sorted
>>>> multimap.
>>>> > >>>
>>>> > >>> My previous proposal was that we provide a Multimap<Key, Value>
>>>> state
>>>> > >>> type, sorted by key. this would have two additional operations -
>>>> > >>> multimap.getRange(startKey, endKey) and
>>>> multimap.deleteRange(startKey,
>>>> > >>> endKey). The primary use case was timestamp sorting, but I felt
>>>> that a
>>>> > >>> sorted multimap provided a nice generalization - after all, you
>>>> can simply
>>>> > >>> key the multimap by timestamp to get timestamp sorting.
>>>> > >>>
>>>> > >>> This approach had some issues immediately that would take some
>>>> work to
>>>> > >>> solve. Since a multimap key can have any type and a runner will
>>>> only be
>>>> > >>> able to sort by encoded type, we would need to introduce a
>>>> concept of
>>>> > >>> order-preserving coders into Beam and plumb that through. Robert
>>>> pointed
>>>> > >>> out that even our existing standard coders for simple integral
>>>> types don't
>>>> > >>> preserve order, so there will likely be surprises here.
>>>> > >>>
>>>> > >>> My current proposal is for a multimap that is not sorted by key,
>>>> but
>>>> > >>> that can support.ordered values for a single key. Remember that a
>>>> multimap
>>>> > >>> maps K -> Iterable<V>, so this means that each individual
>>>> Iterable<V> is
>>>> > >>> ordered, but the keys have no specific order relative to each
>>>> other. This
>>>> > >>> is not too different from many multimap implementations where the
>>>> keys are
>>>> > >>> unordered, but the list of values for a single key at least has a
>>>> stable
>>>> > >>> order.
>>>> > >>>
>>>> > >>> The interface would look like this:
>>>> > >>>
>>>> > >>> public interface MultimapState<K, V> extends State {
>>>> > >>>   // Add a value with a default timestamp.
>>>> > >>>   void put(K key, V value);
>>>> > >>>
>>>> > >>>   // Add a timestamped value.
>>>> > >>>   void put(K, key, TimestampedValue<V> value);
>>>> > >>>
>>>> > >>>   // Remove all values for a key.
>>>> > >>>   void remove (K key);
>>>> > >>>
>>>> > >>>   // Remove all values for a key with timestamps within the
>>>> specified
>>>> > >>> range.
>>>> > >>>   void removeRange(K key, Instant startTs, Instant endTs);
>>>> > >>>
>>>> > >>>   // Get an Iterable of values for V. The Iterable will be
>>>> returned
>>>> > >>> sorted by timestamp.
>>>> > >>>   ReadableState<Iterable<TimestampedValue<V>>> get(K key);
>>>> > >>>
>>>> > >>>   // Get an Iterable of values for V in the specified range. The
>>>> > >>> Iterable will be returned sorted by timestamp.
>>>> > >>>   ReadableState<Iterable<TimestampedValue<V>>> getRange(K key,
>>>> Instant
>>>> > >>> startTs, Instant endTs);
>>>> > >>>
>>>> > >>>   ReadableState<Iterable<K>> keys();
>>>> > >>>   ReadableState<Iterable<TimestampedValue<V>>> values();
>>>> > >>>   ReadableState<Iterable<Map.Entry<K, TimestampedValue<V>>
>>>> entries;
>>>> > >>> }
>>>> > >>>
>>>> > >>> We can of course provide helper functions that allow using
>>>> MultimapState
>>>> > >>> without deailing with TimestampValue for users who only want a
>>>> multimap and
>>>> > >>> don't want sorting.
>>>> > >>>
>>>> > >>> I think many users will only need a single sorted list - not a
>>>> full
>>>> > >>> multimap. It's worth offering this as well, and we can simply
>>>> build it on
>>>> > >>> top of MultimapState. It will look like an extension of BagState
>>>> > >>>
>>>> > >>> public interface TimestampSortedListState<T> extends State {
>>>> > >>>   void add(TimestampedValue<T> value);
>>>> > >>>   Iterable<TimestampedValue<T>> read();
>>>> > >>>   Iterable<TimestampedValue<T>> readRange(Instant startTs, Instant
>>>> > >>> endTs);
>>>> > >>>   void clearRange(Instant startTs, Instant endTs);
>>>> > >>> }
>>>> > >>>
>>>> > >>>
>>>> > >>> On Wed, Jun 17, 2020 at 2:47 PM Luke Cwik <lc...@google.com>
>>>> wrote:
>>>> > >>>
>>>> > >>>> The portability layer is meant to live across multiple versions
>>>> of Beam
>>>> > >>>> and I don't think it should be treated by doing the simple and
>>>> useful thing
>>>> > >>>> now since I believe it will lead to a proliferation of the API.
>>>> > >>>>
>>>> > >>>> On Wed, Jun 17, 2020 at 2:30 PM Kenneth Knowles <kenn@apache.org
>>>> >
>>>> > >>>> wrote:
>>>> > >>>>
>>>> > >>>>> I have thoughts on the subject of whether to have APIs just for
>>>> the
>>>> > >>>>> lowest-level building blocks versus having APIs for higher-level
>>>> > >>>>> constructs. Specifically this applies to providing only
>>>> unsorted multimap
>>>> > >>>>> vs what I will call "time-ordered buffer". TL;DR: I'd vote to
>>>> focus on
>>>> > >>>>> time-ordered buffer; if it turns out to be easy to go all the
>>>> way to sorted
>>>> > >>>>> multimap that's nice-to-have; if it turns out to be easy to
>>>> implement on
>>>> > >>>>> top of unsorted map state that should probably be under the hood
>>>> > >>>>>
>>>> > >>>>> Reasons to build low-level multimap in the runner & fn api and
>>>> layer
>>>> > >>>>> higher-level things in the SDK:
>>>> > >>>>>
>>>> > >>>>>  - It is less implementation for runners if they have to only
>>>> provide
>>>> > >>>>> fewer lower-level building blocks like multimap state.
>>>> > >>>>>  - There are many more runners than SDKs (and will be even more
>>>> and
>>>> > >>>>> more) so this saves overall.
>>>> > >>>>>
>>>> > >>>>> Reasons to build higher-level constructs directly in the runner
>>>> and fn
>>>> > >>>>> api:
>>>> > >>>>>
>>>> > >>>>>  - Having multiple higher-level state types may actually be less
>>>> > >>>>> implementation than one complex state type, especially if they
>>>> map to
>>>> > >>>>> runner primitives.
>>>> > >>>>>  - The runner may have better specialized implementations,
>>>> especially
>>>> > >>>>> for something like a time-ordered buffer.
>>>> > >>>>>  - The particular access patterns in an SDK-based
>>>> implementation may
>>>> > >>>>> not be ideal for each runner's underlying implementation of the
>>>> low-level
>>>> > >>>>> building block.
>>>> > >>>>>  - There may be excessive gRPC overhead even for optimal access
>>>> > >>>>> patterns.
>>>> > >>>>>
>>>> > >>>>> There are ways to have best of both worlds, like:
>>>> > >>>>>
>>>> > >>>>> 1. Define multiple state types according to fundamental access
>>>> > >>>>> patterns, like we did this before portability.
>>>> > >>>>> 2. If it is easy to layer one on top of the other, do that
>>>> inside the
>>>> > >>>>> runner. Provide shared code so for runners providing the
>>>> lowest-level
>>>> > >>>>> primitive they get all the types for free.
>>>> > >>>>>
>>>> > >>>>> I understand that this is an oversimplification. It still
>>>> creates some
>>>> > >>>>> more work. And APIs are a burden so it is good to introduce as
>>>> few as
>>>> > >>>>> possible for maintenance. But it has performance benefits and
>>>> also unblocks
>>>> > >>>>> "just doing the simple and useful thing now" which I always
>>>> like to do as
>>>> > >>>>> long as it is compatible with future changes. If the APIs are
>>>> fundamental,
>>>> > >>>>> like sets, maps, timestamp ordering, then it is safe to guess
>>>> that they
>>>> > >>>>> will change rarely and be useful forever.
>>>> > >>>>>
>>>> > >>>>> Kenn
>>>> > >>>>>
>>>> > >>>>> On Tue, Jun 16, 2020 at 2:54 PM Luke Cwik <lc...@google.com>
>>>> wrote:
>>>> > >>>>>
>>>> > >>>>>> I would be glad to take a stab at how to provide sorting on
>>>> top of
>>>> > >>>>>> unsorted multimap state.
>>>> > >>>>>> Based upon your description, you want integer keys representing
>>>> > >>>>>> timestamps and arbitrary user value for the values, is that
>>>> correct?
>>>> > >>>>>> What kinds of operations do you need on the sorted map state
>>>> in order
>>>> > >>>>>> of efficiency requirements?
>>>> > >>>>>> (e.g. Next(x), Previous(x), GetAll(Range[x, y)),
>>>> ClearAll(Range[x, y))
>>>> > >>>>>> What kinds of operations do we expect the underlying unsorted
>>>> map
>>>> > >>>>>> state to be able to provide?
>>>> > >>>>>> (at a minimum Get(K), Append(K), Clear(K) but what else e.g.
>>>> > >>>>>> enumerate(K)?)
>>>> > >>>>>>
>>>> > >>>>>> I went through a similar exercise of how to provide a list
>>>> like side
>>>> > >>>>>> input view over a multimap[1] side input which efficiently
>>>> allowed
>>>> > >>>>>> computation of size and provided random access while only
>>>> having access to
>>>> > >>>>>> get(K) and enumerate K's.
>>>> > >>>>>>
>>>> > >>>>>> 1:
>>>> > >>>>>>
>>>> https://github.com/lukecwik/incubator-beam/blob/ec8769f6163ca8a4daecc2fb29708bc1da430917/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollectionViews.java#L568
>>>> > >>>>>>
>>>> > >>>>>> On Tue, Jun 16, 2020 at 8:47 AM Reuven Lax <re...@google.com>
>>>> wrote:
>>>> > >>>>>>
>>>> > >>>>>>> Bringing this subject up again,
>>>> > >>>>>>>
>>>> > >>>>>>> I've spent some time looking into implementing this for the
>>>> Dataflow
>>>> > >>>>>>> runner. I'm unable to find a way to implement the arbitrary
>>>> sorted multimap
>>>> > >>>>>>> efficiently for the case where there are large numbers of
>>>> unique keys.
>>>> > >>>>>>> Since the primary driving use case is timestamp ordering
>>>> (i.e. key is event
>>>> > >>>>>>> timestamp), you would expect to have nearly a new key per
>>>> element. I
>>>> > >>>>>>> considered Luke's suggestion above, but unfortunately it
>>>> doesn't really
>>>> > >>>>>>> solve this issue.
>>>> > >>>>>>>
>>>> > >>>>>>> The primary use case for sorting always seems to be sorting by
>>>> > >>>>>>> timestamp. I want to propose that instead of building the
>>>> fully-general
>>>> > >>>>>>> sorted multimap, we instead focus on a state type where the
>>>> sort key is an
>>>> > >>>>>>> integral type (like a timestamp or an integer). There is
>>>> still a valid use
>>>> > >>>>>>> case for multimap, but we can provide that as an unordered
>>>> state. At least
>>>> > >>>>>>> for Dataflow, it will be much easier
>>>> > >>>>>>>
>>>> > >>>>>>> While my difficulties here may be specific to the Dataflow
>>>> runner,
>>>> > >>>>>>> any such support would have to be built into other runners as
>>>> well, and
>>>> > >>>>>>> limiting to integral sorting likely makes it easier for other
>>>> runners to
>>>> > >>>>>>> implement this. Also, if you look at this
>>>> > >>>>>>> <
>>>> https://github.com/apache/flink/blob/0ab1549f52f1f544e8492757c6b0d562bf50a061/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/runtime/join/TemporalRowtimeJoin.scala#L95>
>>>> Flink
>>>> > >>>>>>> comment pointed out by Aljoscha, for Flink the main use case
>>>> identified was
>>>> > >>>>>>> also timestamp sorting. This will also simplify the API
>>>> design for this
>>>> > >>>>>>> feature: Sorted multimap with arbitrary keys would require us
>>>> to introduce
>>>> > >>>>>>> a way of mapping natural ordering to encoded ordering (i.e. a
>>>> new
>>>> > >>>>>>> OrderPreservingCoder), but if we limit sort keys to integral
>>>> types, the API
>>>> > >>>>>>> design is simpler as integral types can be represented
>>>> directly.
>>>> > >>>>>>>
>>>> > >>>>>>> Reuven
>>>> > >>>>>>>
>>>> > >>>>>>> On Sun, Jun 2, 2019 at 7:04 AM Reuven Lax <re...@google.com>
>>>> wrote:
>>>> > >>>>>>>
>>>> > >>>>>>>> This sounds to me like a potential runner strategy. However
>>>> if a
>>>> > >>>>>>>> runner can natively support sorted maps (e.g. we expect the
>>>> Dataflow runner
>>>> > >>>>>>>> to be able to do so, and I think it would be useful for
>>>> other runners as
>>>> > >>>>>>>> well), then it's probably preferable to allow the runner to
>>>> use its native
>>>> > >>>>>>>> capabilities.
>>>> > >>>>>>>>
>>>> > >>>>>>>> On Fri, May 24, 2019 at 11:05 AM Lukasz Cwik <
>>>> lcwik@google.com>
>>>> > >>>>>>>> wrote:
>>>> > >>>>>>>>
>>>> > >>>>>>>>> For the API that you proposed, the map key is always "void"
>>>> and
>>>> > >>>>>>>>> the sort key == user key. So in my example of
>>>> > >>>>>>>>> key: dummy value
>>>> > >>>>>>>>> key.000: token, (0001, value4)
>>>> > >>>>>>>>> key.001: token, (0010, value1), (0011, value2)
>>>> > >>>>>>>>> key.01: token
>>>> > >>>>>>>>> key.1: token, (1011, value3)
>>>> > >>>>>>>>> you would have:
>>>> > >>>>>>>>> "void": dummy value
>>>> > >>>>>>>>> "void".000: token, (0001, value4)
>>>> > >>>>>>>>> "void".001: token, (0010, value1), (0011, value2)
>>>> > >>>>>>>>> "void".01: token
>>>> > >>>>>>>>> "void".1: token, (1011, value3)
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> Iterable<KV<K, V>> entriesUntil(K limit) translates into
>>>> walking
>>>> > >>>>>>>>> the the prefixes until you find a common prefix for K and
>>>> then filter for
>>>> > >>>>>>>>> values where they have a sort key <= K. Using the example
>>>> above, to find
>>>> > >>>>>>>>> entriesUntil(0010) you would:
>>>> > >>>>>>>>> look for key."", miss
>>>> > >>>>>>>>> look for key.0, miss
>>>> > >>>>>>>>> look for key.00, miss
>>>> > >>>>>>>>> look for key.000, hit, sort all contained values using
>>>> secondary
>>>> > >>>>>>>>> key, provide value4 to user
>>>> > >>>>>>>>> look for key.001, hit, notice that 001 is a prefix of 0010
>>>> so we
>>>> > >>>>>>>>> sort all contained values using secondary key, filter out
>>>> value2 and
>>>> > >>>>>>>>> provide value1
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> void removeUntil(K limit) also translates into walking the
>>>> > >>>>>>>>> prefixes but instead we will clear them when we have a
>>>> "hit" with some
>>>> > >>>>>>>>> special logic for when the sort key is a prefix of the key.
>>>> Used the
>>>> > >>>>>>>>> example, to removeUntil(0010) you would:
>>>> > >>>>>>>>> look for key."", miss
>>>> > >>>>>>>>> look for key.0, miss
>>>> > >>>>>>>>> look for key.00, miss
>>>> > >>>>>>>>> look for key.000, hit, clear
>>>> > >>>>>>>>> look for key.001, hit, notice that 001 is a prefix of 0010
>>>> so we
>>>> > >>>>>>>>> sort all contained values using secondary key, store in
>>>> memory all values
>>>> > >>>>>>>>> that > 0010, clear and append values stored in memory.
>>>> > >>>>>>>>>
>>>> > >>>>>>>>> On Fri, May 24, 2019 at 10:36 AM Reuven Lax <
>>>> relax@google.com>
>>>> > >>>>>>>>> wrote:
>>>> > >>>>>>>>>
>>>> > >>>>>>>>>> Can you explain how fetching and deleting ranges of keys
>>>> would
>>>> > >>>>>>>>>> work with this data structure?
>>>> > >>>>>>>>>>
>>>> > >>>>>>>>>> On Fri, May 24, 2019 at 9:50 AM Lukasz Cwik <
>>>> lcwik@google.com>
>>>> > >>>>>>>>>> wrote:
>>>> > >>>>>>>>>>
>>>> > >>>>>>>>>>> Reuven, for the example, I assume that we never want to
>>>> store
>>>> > >>>>>>>>>>> more then 2 values at a given sort key prefix, and if we
>>>> do then we will
>>>> > >>>>>>>>>>> create a new longer prefix splitting up the values based
>>>> upon the sort key.
>>>> > >>>>>>>>>>>
>>>> > >>>>>>>>>>> Tuple representation in examples below is (key, sort key,
>>>> value)
>>>> > >>>>>>>>>>> and . is a character outside of the alphabet which can be
>>>> represented by
>>>> > >>>>>>>>>>> using an escaping encoding that wraps the key + sort key
>>>> encoding.
>>>> > >>>>>>>>>>>
>>>> > >>>>>>>>>>> To insert (key, 0010, value1), we lookup "key" + all the
>>>> > >>>>>>>>>>> prefixes of 0010 finding one that is not empty. In this
>>>> case its 0, so we
>>>> > >>>>>>>>>>> append value to the map at key.0 ending up with (we also
>>>> set the key to any
>>>> > >>>>>>>>>>> dummy value to know that it it contains values):
>>>> > >>>>>>>>>>> key: dummy value
>>>> > >>>>>>>>>>> key."": token, (0010, value1)
>>>> > >>>>>>>>>>> Now we insert (key, 0011, value2), we again lookup "key"
>>>> + all
>>>> > >>>>>>>>>>> the prefixes of 0010, finding "", so we append value2 to
>>>> key."" ending up
>>>> > >>>>>>>>>>> with:
>>>> > >>>>>>>>>>> key: dummy value
>>>> > >>>>>>>>>>> key."": token, (0010, value1), (0011, value2)
>>>> > >>>>>>>>>>> Now we insert (key, 1011, value3), we again lookup "key"
>>>> + all
>>>> > >>>>>>>>>>> the prefixes of 1011 finding "" but notice that it is
>>>> full, so we partition
>>>> > >>>>>>>>>>> all the values into two prefixes 0 and 1. We also clear
>>>> the "" prefix
>>>> > >>>>>>>>>>> ending up with:
>>>> > >>>>>>>>>>> key: dummy value
>>>> > >>>>>>>>>>> key.0: token, (0010, value1), (0011, value2)
>>>> > >>>>>>>>>>> key.1: token, (1011, value3)
>>>> > >>>>>>>>>>> Now we insert (key, 0001, value4), we again lookup "key"
>>>> + all
>>>> > >>>>>>>>>>> the prefixes of the value finding 0 but notice that it is
>>>> full, so we
>>>> > >>>>>>>>>>> partition all the values into two prefixes 00 and 01 but
>>>> notice this
>>>> > >>>>>>>>>>> doesn't help us since 00 will be too full so we split 00
>>>> again to 000, 001.
>>>> > >>>>>>>>>>> We also clear the 0 prefix ending up with:
>>>> > >>>>>>>>>>> key: dummy value
>>>> > >>>>>>>>>>> key.000: token, (0001, value4)
>>>> > >>>>>>>>>>> key.001: token, (0010, value1), (0011, value2)
>>>> > >>>>>>>>>>> key.01: token
>>>> > >>>>>>>>>>> key.1: token, (1011, value3)
>>>> > >>>>>>>>>>>
>>>> > >>>>>>>>>>> We are effectively building a trie[1] where we only have
>>>> values
>>>> > >>>>>>>>>>> at the leaves and control how full each leaf can be.
>>>> There are other trie
>>>> > >>>>>>>>>>> representations like a radix tree that may be better.
>>>> > >>>>>>>>>>>
>>>> > >>>>>>>>>>> Looking up the values in sorted order for "key" would go
>>>> like
>>>> > >>>>>>>>>>> this:
>>>> > >>>>>>>>>>> Is key set, yes
>>>> > >>>>>>>>>>> look for key."", miss
>>>> > >>>>>>>>>>> look for key.0, miss
>>>> > >>>>>>>>>>> look for key.00, miss
>>>> > >>>>>>>>>>> look for key.000, hit, sort all contained values using
>>>> secondary
>>>> > >>>>>>>>>>> key, provide value4 to user
>>>> > >>>>>>>>>>> look for key.001, hit, sort all contained values using
>>>> secondary
>>>> > >>>>>>>>>>> key, provide value1 followed by value2 to user
>>>> > >>>>>>>>>>> look for key.01, hit, empty, return no values to user
>>>> > >>>>>>>>>>> look for key.1, hit, sort all contained values using
>>>> secondary
>>>> > >>>>>>>>>>> key, provide value3 to user
>>>> > >>>>>>>>>>> we have walked the entire prefix space, signal end of
>>>> iterable
>>>> > >>>>>>>>>>>
>>>> > >>>>>>>>>>> Some notes for the above:
>>>> > >>>>>>>>>>> * The dummy value is used to know that the key contains
>>>> values
>>>> > >>>>>>>>>>> and the token is to know whether there are any values
>>>> deeper in the trie so
>>>> > >>>>>>>>>>> when we know when to stop searching.
>>>> > >>>>>>>>>>> * If we can recalculate the sort key from the combination
>>>> of the
>>>> > >>>>>>>>>>> key and value, then we don't need to store it.
>>>> > >>>>>>>>>>> * Keys with lots of values will perform worse then keys
>>>> with
>>>> > >>>>>>>>>>> less values since we have to look up more keys but they
>>>> will be empty
>>>> > >>>>>>>>>>> reads. The number of misses can be controlled by how many
>>>> elements we are
>>>> > >>>>>>>>>>> willing to store at a given node before we subdivide.
>>>> > >>>>>>>>>>>
>>>> > >>>>>>>>>>> In reality you could build a lot of structures (e.g. red
>>>> black
>>>> > >>>>>>>>>>> tree, binary tree) using the sort key, the issue is the
>>>> cost of
>>>> > >>>>>>>>>>> rebalancing/re-organizing the structure in map form and
>>>> whether it has a
>>>> > >>>>>>>>>>> convenient pre-order traversal for lookups.
>>>> > >>>>>>>>>>>
>>>> > >>>>>>>>>>>
>>>> > >>>>>>>>>>>
>>>> > >>>>>>>>>>> On Fri, May 24, 2019 at 8:14 AM Reuven Lax <
>>>> relax@google.com>
>>>> > >>>>>>>>>>> wrote:
>>>> > >>>>>>>>>>>
>>>> > >>>>>>>>>>>> Some great comments!
>>>> > >>>>>>>>>>>>
>>>> > >>>>>>>>>>>> *Aljoscha*: absolutely this would have to be implemented
>>>> by
>>>> > >>>>>>>>>>>> runners to be efficient. We can of course provide a
>>>> default (inefficient)
>>>> > >>>>>>>>>>>> implementation, but ideally runners would provide better
>>>> ones.
>>>> > >>>>>>>>>>>>
>>>> > >>>>>>>>>>>> *Jan* Exactly. I think MapState can be dropped or backed
>>>> by
>>>> > >>>>>>>>>>>> this. E.g.
>>>> > >>>>>>>>>>>>
>>>> > >>>>>>>>>>>> *Robert* Great point about standard coders not satisfying
>>>> > >>>>>>>>>>>> this. That's why I suggested that we provide a way to
>>>> tag the coders that
>>>> > >>>>>>>>>>>> do preserve order, and only accept those as key coders
>>>> Alternatively we
>>>> > >>>>>>>>>>>> could present a more limited API - e.g. only allowing a
>>>> hard-coded set of
>>>> > >>>>>>>>>>>> types to be used as keys - but that seems counter to the
>>>> direction Beam
>>>> > >>>>>>>>>>>> usually goes. So users will have two ways .of creating
>>>> multimap state specs:
>>>> > >>>>>>>>>>>>
>>>> > >>>>>>>>>>>>    private final StateSpec<MultimapState<Long, String>>
>>>> state =
>>>> > >>>>>>>>>>>> StateSpecs.multimap(VarLongCoder.of(),
>>>> StringUtf8Coder.of());
>>>> > >>>>>>>>>>>>
>>>> > >>>>>>>>>>>> or
>>>> > >>>>>>>>>>>>    private final StateSpec<MultimapState<Long, String>>
>>>> state =
>>>> > >>>>>>>>>>>> StateSpecs.orderedMultimap(VarLongCoder.of(),
>>>> StringUtf8Coder.of());
>>>> > >>>>>>>>>>>>
>>>> > >>>>>>>>>>>> The second one will validate that the key coder preserves
>>>> > >>>>>>>>>>>> order, and fails otherwise (similar to coder determinism
>>>> checking in
>>>> > >>>>>>>>>>>> GroupByKey). (BTW we would also have versions of these
>>>> functions that use
>>>> > >>>>>>>>>>>> coder inference to "guess" the coder, but those will do
>>>> the same checking)
>>>> > >>>>>>>>>>>>
>>>> > >>>>>>>>>>>> Also the API I proposed did support random access! We
>>>> could
>>>> > >>>>>>>>>>>> separate out OrderedBagState again if we think the use
>>>> cases are
>>>> > >>>>>>>>>>>> fundamentally different. I merged the proposal into that
>>>> of MultimapState
>>>> > >>>>>>>>>>>> because there seemed be 99% overlap.
>>>> > >>>>>>>>>>>>
>>>> > >>>>>>>>>>>> Reuven
>>>> > >>>>>>>>>>>>
>>>> > >>>>>>>>>>>> On Fri, May 24, 2019 at 6:19 AM Robert Bradshaw <
>>>> > >>>>>>>>>>>> robertwb@google.com> wrote:
>>>> > >>>>>>>>>>>>
>>>> > >>>>>>>>>>>>> On Fri, May 24, 2019 at 5:32 AM Reuven Lax <
>>>> relax@google.com>
>>>> > >>>>>>>>>>>>> wrote:
>>>> > >>>>>>>>>>>>> >
>>>> > >>>>>>>>>>>>> > On Thu, May 23, 2019 at 1:53 PM Ahmet Altay <
>>>> > >>>>>>>>>>>>> altay@google.com> wrote:
>>>> > >>>>>>>>>>>>> >>
>>>> > >>>>>>>>>>>>> >>
>>>> > >>>>>>>>>>>>> >>
>>>> > >>>>>>>>>>>>> >> On Thu, May 23, 2019 at 1:38 PM Lukasz Cwik <
>>>> > >>>>>>>>>>>>> lcwik@google.com> wrote:
>>>> > >>>>>>>>>>>>> >>>
>>>> > >>>>>>>>>>>>> >>>
>>>> > >>>>>>>>>>>>> >>>
>>>> > >>>>>>>>>>>>> >>> On Thu, May 23, 2019 at 11:37 AM Rui Wang <
>>>> > >>>>>>>>>>>>> ruwang@google.com> wrote:
>>>> > >>>>>>>>>>>>> >>>>>
>>>> > >>>>>>>>>>>>> >>>>> A few obvious problems with this code:
>>>> > >>>>>>>>>>>>> >>>>>   1. Removing the elements already processed from
>>>> the
>>>> > >>>>>>>>>>>>> bag requires clearing and rewriting the entire bag.
>>>> This is O(n^2) in the
>>>> > >>>>>>>>>>>>> number of input trades.
>>>> > >>>>>>>>>>>>> >>>>
>>>> > >>>>>>>>>>>>> >>>> why it's not O(2 * n) to clearing and rewriting
>>>> trade
>>>> > >>>>>>>>>>>>> state?
>>>> > >>>>>>>>>>>>> >>>>
>>>> > >>>>>>>>>>>>> >>>>>
>>>> > >>>>>>>>>>>>> >>>>> public interface SortedMultimapState<K, V>
>>>> extends State
>>>> > >>>>>>>>>>>>> {
>>>> > >>>>>>>>>>>>> >>>>>   // Add a value to the map.
>>>> > >>>>>>>>>>>>> >>>>>   void put(K key, V value);
>>>> > >>>>>>>>>>>>> >>>>>   // Get all values for a given key.
>>>> > >>>>>>>>>>>>> >>>>>   ReadableState<Iterable<V>> get(K key);
>>>> > >>>>>>>>>>>>> >>>>>  // Return all entries in the map.
>>>> > >>>>>>>>>>>>> >>>>>   ReadableState<Iterable<KV<K, V>>> allEntries();
>>>> > >>>>>>>>>>>>> >>>>>   // Return all entries in the map with keys <=
>>>> limit.
>>>> > >>>>>>>>>>>>> returned elements are sorted by the key.
>>>> > >>>>>>>>>>>>> >>>>>   ReadableState<Iterable<KV<K, V>>> entriesUntil(K
>>>> > >>>>>>>>>>>>> limit);
>>>> > >>>>>>>>>>>>> >>>>>
>>>> > >>>>>>>>>>>>> >>>>>  // Remove all values with the given key;
>>>> > >>>>>>>>>>>>> >>>>>   void remove(K key);
>>>> > >>>>>>>>>>>>> >>>>>  // Remove all entries in the map with keys <=
>>>> limit.
>>>> > >>>>>>>>>>>>> >>>>>   void removeUntil(K limit);
>>>> > >>>>>>>>>>>>> >>>>
>>>> > >>>>>>>>>>>>> >>>> Will removeUntilExcl(K limit) also useful? It will
>>>> remove
>>>> > >>>>>>>>>>>>> all entries in the map with keys < limit.
>>>> > >>>>>>>>>>>>> >>>>
>>>> > >>>>>>>>>>>>> >>>>>
>>>> > >>>>>>>>>>>>> >>>>> Runners will sort based on the encoded value of
>>>> the key.
>>>> > >>>>>>>>>>>>> In order to make this easier for users, I propose that
>>>> we introduce a new
>>>> > >>>>>>>>>>>>> tag on Coders PreservesOrder. A Coder that contains
>>>> this tag guarantees
>>>> > >>>>>>>>>>>>> that the encoded value preserves the same ordering as
>>>> the base Java type.
>>>> > >>>>>>>>>>>>> >>>>
>>>> > >>>>>>>>>>>>> >>>>
>>>> > >>>>>>>>>>>>> >>>> Could you clarify what is  "encoded value
>>>> preserves the
>>>> > >>>>>>>>>>>>> same ordering as the base Java type"?
>>>> > >>>>>>>>>>>>> >>>
>>>> > >>>>>>>>>>>>> >>>
>>>> > >>>>>>>>>>>>> >>> Lets say A and B represent two different instances
>>>> of the
>>>> > >>>>>>>>>>>>> same Java type like a double, then A < B (using the
>>>> languages comparison
>>>> > >>>>>>>>>>>>> operator) iff encode(A) < encode(B) (note the encoded
>>>> versions are compared
>>>> > >>>>>>>>>>>>> lexicographically)
>>>> > >>>>>>>>>>>>> >>
>>>> > >>>>>>>>>>>>> >>
>>>> > >>>>>>>>>>>>> >> Since coders are shared across SDKs, do we expect A
>>>> < B iff
>>>> > >>>>>>>>>>>>> e(A) < e(P) property to hold for all languages we
>>>> support? What happens A,
>>>> > >>>>>>>>>>>>> B sort differently in different languages?
>>>> > >>>>>>>>>>>>> >
>>>> > >>>>>>>>>>>>> >
>>>> > >>>>>>>>>>>>> > That would have to be the property of the coder
>>>> (which means
>>>> > >>>>>>>>>>>>> that this property probably needs to be represented in
>>>> the portability
>>>> > >>>>>>>>>>>>> representation of the coder). I imagine the common use
>>>> cases will be for
>>>> > >>>>>>>>>>>>> simple coders like int, long, string, etc., which are
>>>> likely to sort the
>>>> > >>>>>>>>>>>>> same in most languages.
>>>> > >>>>>>>>>>>>>
>>>> > >>>>>>>>>>>>> The standard coders for both double and integral types
>>>> do not
>>>> > >>>>>>>>>>>>> respect
>>>> > >>>>>>>>>>>>> the natural ordering (consider negative values). KV
>>>> coders
>>>> > >>>>>>>>>>>>> violate the
>>>> > >>>>>>>>>>>>> "natural" lexicographic ordering on components as well.
>>>> I think
>>>> > >>>>>>>>>>>>> implicitly sorting on encoded value would yield many
>>>> > >>>>>>>>>>>>> surprises. (The
>>>> > >>>>>>>>>>>>> state, of course, could take a order-preserving, bytes
>>>> > >>>>>>>>>>>>> (string?)-producing callable as a parameter of course).
>>>> (As for
>>>> > >>>>>>>>>>>>> naming, I'd probably call this OrderedBagState or
>>>> something
>>>> > >>>>>>>>>>>>> like
>>>> > >>>>>>>>>>>>> that...rather than Map which tends to imply random
>>>> access.)
>>>> > >>>>>>>>>>>>>
>>>> > >>>>>>>>>>>>
>>>> >
>>>>
>>>

Re: [External] Re: DISCUSS: Sorted MapState API

Posted by Kenneth Knowles <ke...@apache.org>.

On Wed, Sep 16, 2020 at 8:48 AM Tyson Hamilton <ty...@google.com> wrote:

> The use case is to support an unbounded stream-stream join, where the
> elements are arriving in roughly time sorted order. Removing a specific
> element from the timestamp indexed collection is necessary when a match is
> found.
>

Just checking - this is an optimization when you already know that the join
is 1:1?

Kenn


> Having clearRange is helpful to expire elements that are no longer
> relevant according to a user-provided time based join predicate (e.g. WHEN
> ABS(leftElement.timestamp - rightElement.timestamp) < 5 minutes).
>
> I'll think a bit more on how to use MapState instead if having a remove()
> like method for a single element isn't an option.
>
> On Tue, Sep 15, 2020 at 8:52 PM Reuven Lax <re...@google.com> wrote:
>
>> Hi,
>>
>> Currently we only support removing a timestamp range. You can remove a
>> single timestamp of course by removing [ts, ts+1), however if there are
>> multiple elements with the same timestamp this will remove all of those
>> elements.
>>
>> Does this fit your use case? If not, I wonder if MapState is closer to
>> what you are looking for?
>>
>> Reuven
>>
>> On Tue, Sep 15, 2020 at 2:33 PM Tyson Hamilton <ty...@google.com>
>> wrote:
>>
>>> Hi Reuven,
>>>
>>> I noticed that there was an implementation of the in-memory
>>> OrderedListState introduced [1]. Where can I find out more regarding the
>>> plan and design? Is there a design doc? I'd like to know more details about
>>> the implementation to see if it fits my use case. I was hoping it would
>>> have a remove(TimestampedValue<T> e) method.
>>>
>>> Thanks,
>>> -Tyson
>>>
>>>
>>> [1]:
>>> https://github.com/apache/beam/commit/9d0d0b0c4506b288164b155c5ce3a23d76db3c41
>>>
>>>
>>> On 2020/08/03 21:41:46, Catlyn Kong <ca...@yelp.com> wrote:
>>> > Hey folks,
>>> >
>>> > Sry I'm late to this thread but this might be very helpful for the
>>> problem
>>> > we're dealing with. Do we have a design doc or a jira ticket I can
>>> follow?
>>> >
>>> > Cheers,
>>> > Catlyn
>>> >
>>> > On Thu, Jun 18, 2020 at 1:11 PM Jan Lukavský <je...@seznam.cz> wrote:
>>> >
>>> > > My questions were just an example. I fully agree there is a
>>> fundamental
>>> > > need for a sorted state (of some form, and I also think this links to
>>> > > efficient implementation of retrations) - I was reacting to Kenn's
>>> question
>>> > > about BIP. This one would be pretty nice example why it would be
>>> good to
>>> > > have such a "process" - not everything can be solved on ML and there
>>> are
>>> > > fundamental decisions that might need a closer attention.
>>> > > On 6/18/20 5:28 PM, Reuven Lax wrote:
>>> > >
>>> > > Jan - my proposal is exactly TimeSortedBagState (more accurately -
>>> > > TimeSortedListState), though I went a bit further and also proposed
>>> a way
>>> > > to have a dynamic number of tagged TimeSortedBagStates.
>>> > >
>>> > > You are correct that the runner doesn't really have to store the
>>> data time
>>> > > sorted - what's actually needed is the ability to fetch and remove
>>> > > timestamp ranges of data (though that does include fetching the
>>> entire
>>> > > list); TimeOrderedState is probably a more accurate name then
>>> > > TimeSortedState. I don't think we could get away with operations
>>> that only
>>> > > act on the smallest timestamp, however we could limit the API to
>>> only being
>>> > > able to fetch and remove prefixes of data (ordered by timestamp).
>>> However
>>> > > if we support prefixes, we might as well support arbitrary subranges.
>>> > >
>>> > > On Thu, Jun 18, 2020 at 7:26 AM Jan Lukavský <je...@seznam.cz>
>>> wrote:
>>> > >
>>> > >> Big +1 for a BIP, as this might really help clarify all the pros
>>> and cons
>>> > >> of all possibilities. There seem to be questions that need
>>> answering and
>>> > >> motivating use cases - do we need sorted map state or can we solve
>>> our use
>>> > >> cases by something simpler - e.g. the mentioned TimeSortedBagState?
>>> Does
>>> > >> that really have to be time-sorted structure, or does it "only"
>>> have to
>>> > >> have operations that can efficiently find and remove element with
>>> smallest
>>> > >> timestamp (like a PriorityQueue)?
>>> > >>
>>> > >> Jan
>>> > >> On 6/18/20 5:32 AM, Kenneth Knowles wrote:
>>> > >>
>>> > >> Zooming in from generic philosophy to be clear: adding time ordered
>>> > >> buffer to the Fn state API is *not* a shortcut.It has benefits that
>>> will
>>> > >> not be achieved by SDK-side implementation on top of either ordered
>>> or
>>> > >> unordered multimap. Are those benefits worth expanding the API? I
>>> don't
>>> > >> know.
>>> > >>
>>> > >> A change to allow a runner to have a specialized implementation for
>>> > >> time-buffered state would be one or more StateKey types, right?
>>> Reuven,
>>> > >> maybe put this and your Java API in a doc? A BIP? Seems like
>>> there's at
>>> > >> least the following to explore:
>>> > >>
>>> > >>  - how that Java API would map to an SDK-side implementation on top
>>> of
>>> > >> multimap state key
>>> > >>  - how that Java API would map to a new StateKey
>>> > >>  - whether there's actually more than one relevant implementation
>>> of that
>>> > >> StateKey
>>> > >>  - whether SDK-side implementation on some other state key would be
>>> > >> performant enough in all SDK languages (present and future)
>>> > >>
>>> > >> Zooming back out to generic philosophy: Proliferation of StateKey
>>> > >> types tuned by runners (which can very easily still share
>>> implementation)
>>> > >> is probably better than proliferation of complex SDK-side
>>> implementations
>>> > >> with varying completeness and performance.
>>> > >>
>>> > >> Kenn
>>> > >>
>>> > >> On Wed, Jun 17, 2020 at 3:24 PM Reuven Lax <re...@google.com>
>>> wrote:
>>> > >>
>>> > >>> It might help for me to describe what I have in mind. I'm still
>>> > >>> proposing that we build multimap, just not a globally-sorted
>>> multimap.
>>> > >>>
>>> > >>> My previous proposal was that we provide a Multimap<Key, Value>
>>> state
>>> > >>> type, sorted by key. this would have two additional operations -
>>> > >>> multimap.getRange(startKey, endKey) and
>>> multimap.deleteRange(startKey,
>>> > >>> endKey). The primary use case was timestamp sorting, but I felt
>>> that a
>>> > >>> sorted multimap provided a nice generalization - after all, you
>>> can simply
>>> > >>> key the multimap by timestamp to get timestamp sorting.
>>> > >>>
>>> > >>> This approach had some issues immediately that would take some
>>> work to
>>> > >>> solve. Since a multimap key can have any type and a runner will
>>> only be
>>> > >>> able to sort by encoded type, we would need to introduce a concept
>>> of
>>> > >>> order-preserving coders into Beam and plumb that through. Robert
>>> pointed
>>> > >>> out that even our existing standard coders for simple integral
>>> types don't
>>> > >>> preserve order, so there will likely be surprises here.
>>> > >>>
>>> > >>> My current proposal is for a multimap that is not sorted by key,
>>> but
>>> > >>> that can support.ordered values for a single key. Remember that a
>>> multimap
>>> > >>> maps K -> Iterable<V>, so this means that each individual
>>> Iterable<V> is
>>> > >>> ordered, but the keys have no specific order relative to each
>>> other. This
>>> > >>> is not too different from many multimap implementations where the
>>> keys are
>>> > >>> unordered, but the list of values for a single key at least has a
>>> stable
>>> > >>> order.
>>> > >>>
>>> > >>> The interface would look like this:
>>> > >>>
>>> > >>> public interface MultimapState<K, V> extends State {
>>> > >>>   // Add a value with a default timestamp.
>>> > >>>   void put(K key, V value);
>>> > >>>
>>> > >>>   // Add a timestamped value.
>>> > >>>   void put(K, key, TimestampedValue<V> value);
>>> > >>>
>>> > >>>   // Remove all values for a key.
>>> > >>>   void remove (K key);
>>> > >>>
>>> > >>>   // Remove all values for a key with timestamps within the
>>> specified
>>> > >>> range.
>>> > >>>   void removeRange(K key, Instant startTs, Instant endTs);
>>> > >>>
>>> > >>>   // Get an Iterable of values for V. The Iterable will be returned
>>> > >>> sorted by timestamp.
>>> > >>>   ReadableState<Iterable<TimestampedValue<V>>> get(K key);
>>> > >>>
>>> > >>>   // Get an Iterable of values for V in the specified range. The
>>> > >>> Iterable will be returned sorted by timestamp.
>>> > >>>   ReadableState<Iterable<TimestampedValue<V>>> getRange(K key,
>>> Instant
>>> > >>> startTs, Instant endTs);
>>> > >>>
>>> > >>>   ReadableState<Iterable<K>> keys();
>>> > >>>   ReadableState<Iterable<TimestampedValue<V>>> values();
>>> > >>>   ReadableState<Iterable<Map.Entry<K, TimestampedValue<V>> entries;
>>> > >>> }
>>> > >>>
>>> > >>> We can of course provide helper functions that allow using
>>> MultimapState
>>> > >>> without deailing with TimestampValue for users who only want a
>>> multimap and
>>> > >>> don't want sorting.
>>> > >>>
>>> > >>> I think many users will only need a single sorted list - not a full
>>> > >>> multimap. It's worth offering this as well, and we can simply
>>> build it on
>>> > >>> top of MultimapState. It will look like an extension of BagState
>>> > >>>
>>> > >>> public interface TimestampSortedListState<T> extends State {
>>> > >>>   void add(TimestampedValue<T> value);
>>> > >>>   Iterable<TimestampedValue<T>> read();
>>> > >>>   Iterable<TimestampedValue<T>> readRange(Instant startTs, Instant
>>> > >>> endTs);
>>> > >>>   void clearRange(Instant startTs, Instant endTs);
>>> > >>> }
>>> > >>>
>>> > >>>
>>> > >>> On Wed, Jun 17, 2020 at 2:47 PM Luke Cwik <lc...@google.com>
>>> wrote:
>>> > >>>
>>> > >>>> The portability layer is meant to live across multiple versions
>>> of Beam
>>> > >>>> and I don't think it should be treated by doing the simple and
>>> useful thing
>>> > >>>> now since I believe it will lead to a proliferation of the API.
>>> > >>>>
>>> > >>>> On Wed, Jun 17, 2020 at 2:30 PM Kenneth Knowles <ke...@apache.org>
>>> > >>>> wrote:
>>> > >>>>
>>> > >>>>> I have thoughts on the subject of whether to have APIs just for
>>> the
>>> > >>>>> lowest-level building blocks versus having APIs for higher-level
>>> > >>>>> constructs. Specifically this applies to providing only unsorted
>>> multimap
>>> > >>>>> vs what I will call "time-ordered buffer". TL;DR: I'd vote to
>>> focus on
>>> > >>>>> time-ordered buffer; if it turns out to be easy to go all the
>>> way to sorted
>>> > >>>>> multimap that's nice-to-have; if it turns out to be easy to
>>> implement on
>>> > >>>>> top of unsorted map state that should probably be under the hood
>>> > >>>>>
>>> > >>>>> Reasons to build low-level multimap in the runner & fn api and
>>> layer
>>> > >>>>> higher-level things in the SDK:
>>> > >>>>>
>>> > >>>>>  - It is less implementation for runners if they have to only
>>> provide
>>> > >>>>> fewer lower-level building blocks like multimap state.
>>> > >>>>>  - There are many more runners than SDKs (and will be even more
>>> and
>>> > >>>>> more) so this saves overall.
>>> > >>>>>
>>> > >>>>> Reasons to build higher-level constructs directly in the runner
>>> and fn
>>> > >>>>> api:
>>> > >>>>>
>>> > >>>>>  - Having multiple higher-level state types may actually be less
>>> > >>>>> implementation than one complex state type, especially if they
>>> map to
>>> > >>>>> runner primitives.
>>> > >>>>>  - The runner may have better specialized implementations,
>>> especially
>>> > >>>>> for something like a time-ordered buffer.
>>> > >>>>>  - The particular access patterns in an SDK-based implementation
>>> may
>>> > >>>>> not be ideal for each runner's underlying implementation of the
>>> low-level
>>> > >>>>> building block.
>>> > >>>>>  - There may be excessive gRPC overhead even for optimal access
>>> > >>>>> patterns.
>>> > >>>>>
>>> > >>>>> There are ways to have best of both worlds, like:
>>> > >>>>>
>>> > >>>>> 1. Define multiple state types according to fundamental access
>>> > >>>>> patterns, like we did this before portability.
>>> > >>>>> 2. If it is easy to layer one on top of the other, do that
>>> inside the
>>> > >>>>> runner. Provide shared code so for runners providing the
>>> lowest-level
>>> > >>>>> primitive they get all the types for free.
>>> > >>>>>
>>> > >>>>> I understand that this is an oversimplification. It still
>>> creates some
>>> > >>>>> more work. And APIs are a burden so it is good to introduce as
>>> few as
>>> > >>>>> possible for maintenance. But it has performance benefits and
>>> also unblocks
>>> > >>>>> "just doing the simple and useful thing now" which I always like
>>> to do as
>>> > >>>>> long as it is compatible with future changes. If the APIs are
>>> fundamental,
>>> > >>>>> like sets, maps, timestamp ordering, then it is safe to guess
>>> that they
>>> > >>>>> will change rarely and be useful forever.
>>> > >>>>>
>>> > >>>>> Kenn
>>> > >>>>>
>>> > >>>>> On Tue, Jun 16, 2020 at 2:54 PM Luke Cwik <lc...@google.com>
>>> wrote:
>>> > >>>>>
>>> > >>>>>> I would be glad to take a stab at how to provide sorting on top
>>> of
>>> > >>>>>> unsorted multimap state.
>>> > >>>>>> Based upon your description, you want integer keys representing
>>> > >>>>>> timestamps and arbitrary user value for the values, is that
>>> correct?
>>> > >>>>>> What kinds of operations do you need on the sorted map state in
>>> order
>>> > >>>>>> of efficiency requirements?
>>> > >>>>>> (e.g. Next(x), Previous(x), GetAll(Range[x, y)),
>>> ClearAll(Range[x, y))
>>> > >>>>>> What kinds of operations do we expect the underlying unsorted
>>> map
>>> > >>>>>> state to be able to provide?
>>> > >>>>>> (at a minimum Get(K), Append(K), Clear(K) but what else e.g.
>>> > >>>>>> enumerate(K)?)
>>> > >>>>>>
>>> > >>>>>> I went through a similar exercise of how to provide a list like
>>> side
>>> > >>>>>> input view over a multimap[1] side input which efficiently
>>> allowed
>>> > >>>>>> computation of size and provided random access while only
>>> having access to
>>> > >>>>>> get(K) and enumerate K's.
>>> > >>>>>>
>>> > >>>>>> 1:
>>> > >>>>>>
>>> https://github.com/lukecwik/incubator-beam/blob/ec8769f6163ca8a4daecc2fb29708bc1da430917/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollectionViews.java#L568
>>> > >>>>>>
>>> > >>>>>> On Tue, Jun 16, 2020 at 8:47 AM Reuven Lax <re...@google.com>
>>> wrote:
>>> > >>>>>>
>>> > >>>>>>> Bringing this subject up again,
>>> > >>>>>>>
>>> > >>>>>>> I've spent some time looking into implementing this for the
>>> Dataflow
>>> > >>>>>>> runner. I'm unable to find a way to implement the arbitrary
>>> sorted multimap
>>> > >>>>>>> efficiently for the case where there are large numbers of
>>> unique keys.
>>> > >>>>>>> Since the primary driving use case is timestamp ordering (i.e.
>>> key is event
>>> > >>>>>>> timestamp), you would expect to have nearly a new key per
>>> element. I
>>> > >>>>>>> considered Luke's suggestion above, but unfortunately it
>>> doesn't really
>>> > >>>>>>> solve this issue.
>>> > >>>>>>>
>>> > >>>>>>> The primary use case for sorting always seems to be sorting by
>>> > >>>>>>> timestamp. I want to propose that instead of building the
>>> fully-general
>>> > >>>>>>> sorted multimap, we instead focus on a state type where the
>>> sort key is an
>>> > >>>>>>> integral type (like a timestamp or an integer). There is still
>>> a valid use
>>> > >>>>>>> case for multimap, but we can provide that as an unordered
>>> state. At least
>>> > >>>>>>> for Dataflow, it will be much easier
>>> > >>>>>>>
>>> > >>>>>>> While my difficulties here may be specific to the Dataflow
>>> runner,
>>> > >>>>>>> any such support would have to be built into other runners as
>>> well, and
>>> > >>>>>>> limiting to integral sorting likely makes it easier for other
>>> runners to
>>> > >>>>>>> implement this. Also, if you look at this
>>> > >>>>>>> <
>>> https://github.com/apache/flink/blob/0ab1549f52f1f544e8492757c6b0d562bf50a061/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/runtime/join/TemporalRowtimeJoin.scala#L95>
>>> Flink
>>> > >>>>>>> comment pointed out by Aljoscha, for Flink the main use case
>>> identified was
>>> > >>>>>>> also timestamp sorting. This will also simplify the API design
>>> for this
>>> > >>>>>>> feature: Sorted multimap with arbitrary keys would require us
>>> to introduce
>>> > >>>>>>> a way of mapping natural ordering to encoded ordering (i.e. a
>>> new
>>> > >>>>>>> OrderPreservingCoder), but if we limit sort keys to integral
>>> types, the API
>>> > >>>>>>> design is simpler as integral types can be represented
>>> directly.
>>> > >>>>>>>
>>> > >>>>>>> Reuven
>>> > >>>>>>>
>>> > >>>>>>> On Sun, Jun 2, 2019 at 7:04 AM Reuven Lax <re...@google.com>
>>> wrote:
>>> > >>>>>>>
>>> > >>>>>>>> This sounds to me like a potential runner strategy. However
>>> if a
>>> > >>>>>>>> runner can natively support sorted maps (e.g. we expect the
>>> Dataflow runner
>>> > >>>>>>>> to be able to do so, and I think it would be useful for other
>>> runners as
>>> > >>>>>>>> well), then it's probably preferable to allow the runner to
>>> use its native
>>> > >>>>>>>> capabilities.
>>> > >>>>>>>>
>>> > >>>>>>>> On Fri, May 24, 2019 at 11:05 AM Lukasz Cwik <
>>> lcwik@google.com>
>>> > >>>>>>>> wrote:
>>> > >>>>>>>>
>>> > >>>>>>>>> For the API that you proposed, the map key is always "void"
>>> and
>>> > >>>>>>>>> the sort key == user key. So in my example of
>>> > >>>>>>>>> key: dummy value
>>> > >>>>>>>>> key.000: token, (0001, value4)
>>> > >>>>>>>>> key.001: token, (0010, value1), (0011, value2)
>>> > >>>>>>>>> key.01: token
>>> > >>>>>>>>> key.1: token, (1011, value3)
>>> > >>>>>>>>> you would have:
>>> > >>>>>>>>> "void": dummy value
>>> > >>>>>>>>> "void".000: token, (0001, value4)
>>> > >>>>>>>>> "void".001: token, (0010, value1), (0011, value2)
>>> > >>>>>>>>> "void".01: token
>>> > >>>>>>>>> "void".1: token, (1011, value3)
>>> > >>>>>>>>>
>>> > >>>>>>>>> Iterable<KV<K, V>> entriesUntil(K limit) translates into
>>> walking
>>> > >>>>>>>>> the the prefixes until you find a common prefix for K and
>>> then filter for
>>> > >>>>>>>>> values where they have a sort key <= K. Using the example
>>> above, to find
>>> > >>>>>>>>> entriesUntil(0010) you would:
>>> > >>>>>>>>> look for key."", miss
>>> > >>>>>>>>> look for key.0, miss
>>> > >>>>>>>>> look for key.00, miss
>>> > >>>>>>>>> look for key.000, hit, sort all contained values using
>>> secondary
>>> > >>>>>>>>> key, provide value4 to user
>>> > >>>>>>>>> look for key.001, hit, notice that 001 is a prefix of 0010
>>> so we
>>> > >>>>>>>>> sort all contained values using secondary key, filter out
>>> value2 and
>>> > >>>>>>>>> provide value1
>>> > >>>>>>>>>
>>> > >>>>>>>>> void removeUntil(K limit) also translates into walking the
>>> > >>>>>>>>> prefixes but instead we will clear them when we have a "hit"
>>> with some
>>> > >>>>>>>>> special logic for when the sort key is a prefix of the key.
>>> Used the
>>> > >>>>>>>>> example, to removeUntil(0010) you would:
>>> > >>>>>>>>> look for key."", miss
>>> > >>>>>>>>> look for key.0, miss
>>> > >>>>>>>>> look for key.00, miss
>>> > >>>>>>>>> look for key.000, hit, clear
>>> > >>>>>>>>> look for key.001, hit, notice that 001 is a prefix of 0010
>>> so we
>>> > >>>>>>>>> sort all contained values using secondary key, store in
>>> memory all values
>>> > >>>>>>>>> that > 0010, clear and append values stored in memory.
>>> > >>>>>>>>>
>>> > >>>>>>>>> On Fri, May 24, 2019 at 10:36 AM Reuven Lax <
>>> relax@google.com>
>>> > >>>>>>>>> wrote:
>>> > >>>>>>>>>
>>> > >>>>>>>>>> Can you explain how fetching and deleting ranges of keys
>>> would
>>> > >>>>>>>>>> work with this data structure?
>>> > >>>>>>>>>>
>>> > >>>>>>>>>> On Fri, May 24, 2019 at 9:50 AM Lukasz Cwik <
>>> lcwik@google.com>
>>> > >>>>>>>>>> wrote:
>>> > >>>>>>>>>>
>>> > >>>>>>>>>>> Reuven, for the example, I assume that we never want to
>>> store
>>> > >>>>>>>>>>> more then 2 values at a given sort key prefix, and if we
>>> do then we will
>>> > >>>>>>>>>>> create a new longer prefix splitting up the values based
>>> upon the sort key.
>>> > >>>>>>>>>>>
>>> > >>>>>>>>>>> Tuple representation in examples below is (key, sort key,
>>> value)
>>> > >>>>>>>>>>> and . is a character outside of the alphabet which can be
>>> represented by
>>> > >>>>>>>>>>> using an escaping encoding that wraps the key + sort key
>>> encoding.
>>> > >>>>>>>>>>>
>>> > >>>>>>>>>>> To insert (key, 0010, value1), we lookup "key" + all the
>>> > >>>>>>>>>>> prefixes of 0010 finding one that is not empty. In this
>>> case its 0, so we
>>> > >>>>>>>>>>> append value to the map at key.0 ending up with (we also
>>> set the key to any
>>> > >>>>>>>>>>> dummy value to know that it it contains values):
>>> > >>>>>>>>>>> key: dummy value
>>> > >>>>>>>>>>> key."": token, (0010, value1)
>>> > >>>>>>>>>>> Now we insert (key, 0011, value2), we again lookup "key" +
>>> all
>>> > >>>>>>>>>>> the prefixes of 0010, finding "", so we append value2 to
>>> key."" ending up
>>> > >>>>>>>>>>> with:
>>> > >>>>>>>>>>> key: dummy value
>>> > >>>>>>>>>>> key."": token, (0010, value1), (0011, value2)
>>> > >>>>>>>>>>> Now we insert (key, 1011, value3), we again lookup "key" +
>>> all
>>> > >>>>>>>>>>> the prefixes of 1011 finding "" but notice that it is
>>> full, so we partition
>>> > >>>>>>>>>>> all the values into two prefixes 0 and 1. We also clear
>>> the "" prefix
>>> > >>>>>>>>>>> ending up with:
>>> > >>>>>>>>>>> key: dummy value
>>> > >>>>>>>>>>> key.0: token, (0010, value1), (0011, value2)
>>> > >>>>>>>>>>> key.1: token, (1011, value3)
>>> > >>>>>>>>>>> Now we insert (key, 0001, value4), we again lookup "key" +
>>> all
>>> > >>>>>>>>>>> the prefixes of the value finding 0 but notice that it is
>>> full, so we
>>> > >>>>>>>>>>> partition all the values into two prefixes 00 and 01 but
>>> notice this
>>> > >>>>>>>>>>> doesn't help us since 00 will be too full so we split 00
>>> again to 000, 001.
>>> > >>>>>>>>>>> We also clear the 0 prefix ending up with:
>>> > >>>>>>>>>>> key: dummy value
>>> > >>>>>>>>>>> key.000: token, (0001, value4)
>>> > >>>>>>>>>>> key.001: token, (0010, value1), (0011, value2)
>>> > >>>>>>>>>>> key.01: token
>>> > >>>>>>>>>>> key.1: token, (1011, value3)
>>> > >>>>>>>>>>>
>>> > >>>>>>>>>>> We are effectively building a trie[1] where we only have
>>> values
>>> > >>>>>>>>>>> at the leaves and control how full each leaf can be. There
>>> are other trie
>>> > >>>>>>>>>>> representations like a radix tree that may be better.
>>> > >>>>>>>>>>>
>>> > >>>>>>>>>>> Looking up the values in sorted order for "key" would go
>>> like
>>> > >>>>>>>>>>> this:
>>> > >>>>>>>>>>> Is key set, yes
>>> > >>>>>>>>>>> look for key."", miss
>>> > >>>>>>>>>>> look for key.0, miss
>>> > >>>>>>>>>>> look for key.00, miss
>>> > >>>>>>>>>>> look for key.000, hit, sort all contained values using
>>> secondary
>>> > >>>>>>>>>>> key, provide value4 to user
>>> > >>>>>>>>>>> look for key.001, hit, sort all contained values using
>>> secondary
>>> > >>>>>>>>>>> key, provide value1 followed by value2 to user
>>> > >>>>>>>>>>> look for key.01, hit, empty, return no values to user
>>> > >>>>>>>>>>> look for key.1, hit, sort all contained values using
>>> secondary
>>> > >>>>>>>>>>> key, provide value3 to user
>>> > >>>>>>>>>>> we have walked the entire prefix space, signal end of
>>> iterable
>>> > >>>>>>>>>>>
>>> > >>>>>>>>>>> Some notes for the above:
>>> > >>>>>>>>>>> * The dummy value is used to know that the key contains
>>> values
>>> > >>>>>>>>>>> and the token is to know whether there are any values
>>> deeper in the trie so
>>> > >>>>>>>>>>> when we know when to stop searching.
>>> > >>>>>>>>>>> * If we can recalculate the sort key from the combination
>>> of the
>>> > >>>>>>>>>>> key and value, then we don't need to store it.
>>> > >>>>>>>>>>> * Keys with lots of values will perform worse then keys
>>> with
>>> > >>>>>>>>>>> less values since we have to look up more keys but they
>>> will be empty
>>> > >>>>>>>>>>> reads. The number of misses can be controlled by how many
>>> elements we are
>>> > >>>>>>>>>>> willing to store at a given node before we subdivide.
>>> > >>>>>>>>>>>
>>> > >>>>>>>>>>> In reality you could build a lot of structures (e.g. red
>>> black
>>> > >>>>>>>>>>> tree, binary tree) using the sort key, the issue is the
>>> cost of
>>> > >>>>>>>>>>> rebalancing/re-organizing the structure in map form and
>>> whether it has a
>>> > >>>>>>>>>>> convenient pre-order traversal for lookups.
>>> > >>>>>>>>>>>
>>> > >>>>>>>>>>>
>>> > >>>>>>>>>>>
>>> > >>>>>>>>>>> On Fri, May 24, 2019 at 8:14 AM Reuven Lax <
>>> relax@google.com>
>>> > >>>>>>>>>>> wrote:
>>> > >>>>>>>>>>>
>>> > >>>>>>>>>>>> Some great comments!
>>> > >>>>>>>>>>>>
>>> > >>>>>>>>>>>> *Aljoscha*: absolutely this would have to be implemented
>>> by
>>> > >>>>>>>>>>>> runners to be efficient. We can of course provide a
>>> default (inefficient)
>>> > >>>>>>>>>>>> implementation, but ideally runners would provide better
>>> ones.
>>> > >>>>>>>>>>>>
>>> > >>>>>>>>>>>> *Jan* Exactly. I think MapState can be dropped or backed
>>> by
>>> > >>>>>>>>>>>> this. E.g.
>>> > >>>>>>>>>>>>
>>> > >>>>>>>>>>>> *Robert* Great point about standard coders not satisfying
>>> > >>>>>>>>>>>> this. That's why I suggested that we provide a way to tag
>>> the coders that
>>> > >>>>>>>>>>>> do preserve order, and only accept those as key coders
>>> Alternatively we
>>> > >>>>>>>>>>>> could present a more limited API - e.g. only allowing a
>>> hard-coded set of
>>> > >>>>>>>>>>>> types to be used as keys - but that seems counter to the
>>> direction Beam
>>> > >>>>>>>>>>>> usually goes. So users will have two ways .of creating
>>> multimap state specs:
>>> > >>>>>>>>>>>>
>>> > >>>>>>>>>>>>    private final StateSpec<MultimapState<Long, String>>
>>> state =
>>> > >>>>>>>>>>>> StateSpecs.multimap(VarLongCoder.of(),
>>> StringUtf8Coder.of());
>>> > >>>>>>>>>>>>
>>> > >>>>>>>>>>>> or
>>> > >>>>>>>>>>>>    private final StateSpec<MultimapState<Long, String>>
>>> state =
>>> > >>>>>>>>>>>> StateSpecs.orderedMultimap(VarLongCoder.of(),
>>> StringUtf8Coder.of());
>>> > >>>>>>>>>>>>
>>> > >>>>>>>>>>>> The second one will validate that the key coder preserves
>>> > >>>>>>>>>>>> order, and fails otherwise (similar to coder determinism
>>> checking in
>>> > >>>>>>>>>>>> GroupByKey). (BTW we would also have versions of these
>>> functions that use
>>> > >>>>>>>>>>>> coder inference to "guess" the coder, but those will do
>>> the same checking)
>>> > >>>>>>>>>>>>
>>> > >>>>>>>>>>>> Also the API I proposed did support random access! We
>>> could
>>> > >>>>>>>>>>>> separate out OrderedBagState again if we think the use
>>> cases are
>>> > >>>>>>>>>>>> fundamentally different. I merged the proposal into that
>>> of MultimapState
>>> > >>>>>>>>>>>> because there seemed be 99% overlap.
>>> > >>>>>>>>>>>>
>>> > >>>>>>>>>>>> Reuven
>>> > >>>>>>>>>>>>
>>> > >>>>>>>>>>>> On Fri, May 24, 2019 at 6:19 AM Robert Bradshaw <
>>> > >>>>>>>>>>>> robertwb@google.com> wrote:
>>> > >>>>>>>>>>>>
>>> > >>>>>>>>>>>>> On Fri, May 24, 2019 at 5:32 AM Reuven Lax <
>>> relax@google.com>
>>> > >>>>>>>>>>>>> wrote:
>>> > >>>>>>>>>>>>> >
>>> > >>>>>>>>>>>>> > On Thu, May 23, 2019 at 1:53 PM Ahmet Altay <
>>> > >>>>>>>>>>>>> altay@google.com> wrote:
>>> > >>>>>>>>>>>>> >>
>>> > >>>>>>>>>>>>> >>
>>> > >>>>>>>>>>>>> >>
>>> > >>>>>>>>>>>>> >> On Thu, May 23, 2019 at 1:38 PM Lukasz Cwik <
>>> > >>>>>>>>>>>>> lcwik@google.com> wrote:
>>> > >>>>>>>>>>>>> >>>
>>> > >>>>>>>>>>>>> >>>
>>> > >>>>>>>>>>>>> >>>
>>> > >>>>>>>>>>>>> >>> On Thu, May 23, 2019 at 11:37 AM Rui Wang <
>>> > >>>>>>>>>>>>> ruwang@google.com> wrote:
>>> > >>>>>>>>>>>>> >>>>>
>>> > >>>>>>>>>>>>> >>>>> A few obvious problems with this code:
>>> > >>>>>>>>>>>>> >>>>>   1. Removing the elements already processed from
>>> the
>>> > >>>>>>>>>>>>> bag requires clearing and rewriting the entire bag. This
>>> is O(n^2) in the
>>> > >>>>>>>>>>>>> number of input trades.
>>> > >>>>>>>>>>>>> >>>>
>>> > >>>>>>>>>>>>> >>>> why it's not O(2 * n) to clearing and rewriting
>>> trade
>>> > >>>>>>>>>>>>> state?
>>> > >>>>>>>>>>>>> >>>>
>>> > >>>>>>>>>>>>> >>>>>
>>> > >>>>>>>>>>>>> >>>>> public interface SortedMultimapState<K, V> extends
>>> State
>>> > >>>>>>>>>>>>> {
>>> > >>>>>>>>>>>>> >>>>>   // Add a value to the map.
>>> > >>>>>>>>>>>>> >>>>>   void put(K key, V value);
>>> > >>>>>>>>>>>>> >>>>>   // Get all values for a given key.
>>> > >>>>>>>>>>>>> >>>>>   ReadableState<Iterable<V>> get(K key);
>>> > >>>>>>>>>>>>> >>>>>  // Return all entries in the map.
>>> > >>>>>>>>>>>>> >>>>>   ReadableState<Iterable<KV<K, V>>> allEntries();
>>> > >>>>>>>>>>>>> >>>>>   // Return all entries in the map with keys <=
>>> limit.
>>> > >>>>>>>>>>>>> returned elements are sorted by the key.
>>> > >>>>>>>>>>>>> >>>>>   ReadableState<Iterable<KV<K, V>>> entriesUntil(K
>>> > >>>>>>>>>>>>> limit);
>>> > >>>>>>>>>>>>> >>>>>
>>> > >>>>>>>>>>>>> >>>>>  // Remove all values with the given key;
>>> > >>>>>>>>>>>>> >>>>>   void remove(K key);
>>> > >>>>>>>>>>>>> >>>>>  // Remove all entries in the map with keys <=
>>> limit.
>>> > >>>>>>>>>>>>> >>>>>   void removeUntil(K limit);
>>> > >>>>>>>>>>>>> >>>>
>>> > >>>>>>>>>>>>> >>>> Will removeUntilExcl(K limit) also useful? It will
>>> remove
>>> > >>>>>>>>>>>>> all entries in the map with keys < limit.
>>> > >>>>>>>>>>>>> >>>>
>>> > >>>>>>>>>>>>> >>>>>
>>> > >>>>>>>>>>>>> >>>>> Runners will sort based on the encoded value of
>>> the key.
>>> > >>>>>>>>>>>>> In order to make this easier for users, I propose that
>>> we introduce a new
>>> > >>>>>>>>>>>>> tag on Coders PreservesOrder. A Coder that contains this
>>> tag guarantees
>>> > >>>>>>>>>>>>> that the encoded value preserves the same ordering as
>>> the base Java type.
>>> > >>>>>>>>>>>>> >>>>
>>> > >>>>>>>>>>>>> >>>>
>>> > >>>>>>>>>>>>> >>>> Could you clarify what is  "encoded value preserves
>>> the
>>> > >>>>>>>>>>>>> same ordering as the base Java type"?
>>> > >>>>>>>>>>>>> >>>
>>> > >>>>>>>>>>>>> >>>
>>> > >>>>>>>>>>>>> >>> Lets say A and B represent two different instances
>>> of the
>>> > >>>>>>>>>>>>> same Java type like a double, then A < B (using the
>>> languages comparison
>>> > >>>>>>>>>>>>> operator) iff encode(A) < encode(B) (note the encoded
>>> versions are compared
>>> > >>>>>>>>>>>>> lexicographically)
>>> > >>>>>>>>>>>>> >>
>>> > >>>>>>>>>>>>> >>
>>> > >>>>>>>>>>>>> >> Since coders are shared across SDKs, do we expect A <
>>> B iff
>>> > >>>>>>>>>>>>> e(A) < e(P) property to hold for all languages we
>>> support? What happens A,
>>> > >>>>>>>>>>>>> B sort differently in different languages?
>>> > >>>>>>>>>>>>> >
>>> > >>>>>>>>>>>>> >
>>> > >>>>>>>>>>>>> > That would have to be the property of the coder (which
>>> means
>>> > >>>>>>>>>>>>> that this property probably needs to be represented in
>>> the portability
>>> > >>>>>>>>>>>>> representation of the coder). I imagine the common use
>>> cases will be for
>>> > >>>>>>>>>>>>> simple coders like int, long, string, etc., which are
>>> likely to sort the
>>> > >>>>>>>>>>>>> same in most languages.
>>> > >>>>>>>>>>>>>
>>> > >>>>>>>>>>>>> The standard coders for both double and integral types
>>> do not
>>> > >>>>>>>>>>>>> respect
>>> > >>>>>>>>>>>>> the natural ordering (consider negative values). KV
>>> coders
>>> > >>>>>>>>>>>>> violate the
>>> > >>>>>>>>>>>>> "natural" lexicographic ordering on components as well.
>>> I think
>>> > >>>>>>>>>>>>> implicitly sorting on encoded value would yield many
>>> > >>>>>>>>>>>>> surprises. (The
>>> > >>>>>>>>>>>>> state, of course, could take a order-preserving, bytes
>>> > >>>>>>>>>>>>> (string?)-producing callable as a parameter of course).
>>> (As for
>>> > >>>>>>>>>>>>> naming, I'd probably call this OrderedBagState or
>>> something
>>> > >>>>>>>>>>>>> like
>>> > >>>>>>>>>>>>> that...rather than Map which tends to imply random
>>> access.)
>>> > >>>>>>>>>>>>>
>>> > >>>>>>>>>>>>
>>> >
>>>
>>

Re: [External] Re: DISCUSS: Sorted MapState API

Posted by Tyson Hamilton <ty...@google.com>.

The use case is to support an unbounded stream-stream join, where the
elements are arriving in roughly time sorted order. Removing a specific
element from the timestamp indexed collection is necessary when a match is
found. Having clearRange is helpful to expire elements that are no longer
relevant according to a user-provided time based join predicate (e.g. WHEN
ABS(leftElement.timestamp - rightElement.timestamp) < 5 minutes).

I'll think a bit more on how to use MapState instead if having a remove()
like method for a single element isn't an option.

On Tue, Sep 15, 2020 at 8:52 PM Reuven Lax <re...@google.com> wrote:

> Hi,
>
> Currently we only support removing a timestamp range. You can remove a
> single timestamp of course by removing [ts, ts+1), however if there are
> multiple elements with the same timestamp this will remove all of those
> elements.
>
> Does this fit your use case? If not, I wonder if MapState is closer to
> what you are looking for?
>
> Reuven
>
> On Tue, Sep 15, 2020 at 2:33 PM Tyson Hamilton <ty...@google.com> wrote:
>
>> Hi Reuven,
>>
>> I noticed that there was an implementation of the in-memory
>> OrderedListState introduced [1]. Where can I find out more regarding the
>> plan and design? Is there a design doc? I'd like to know more details about
>> the implementation to see if it fits my use case. I was hoping it would
>> have a remove(TimestampedValue<T> e) method.
>>
>> Thanks,
>> -Tyson
>>
>>
>> [1]:
>> https://github.com/apache/beam/commit/9d0d0b0c4506b288164b155c5ce3a23d76db3c41
>>
>>
>> On 2020/08/03 21:41:46, Catlyn Kong <ca...@yelp.com> wrote:
>> > Hey folks,
>> >
>> > Sry I'm late to this thread but this might be very helpful for the
>> problem
>> > we're dealing with. Do we have a design doc or a jira ticket I can
>> follow?
>> >
>> > Cheers,
>> > Catlyn
>> >
>> > On Thu, Jun 18, 2020 at 1:11 PM Jan Lukavský <je...@seznam.cz> wrote:
>> >
>> > > My questions were just an example. I fully agree there is a
>> fundamental
>> > > need for a sorted state (of some form, and I also think this links to
>> > > efficient implementation of retrations) - I was reacting to Kenn's
>> question
>> > > about BIP. This one would be pretty nice example why it would be good
>> to
>> > > have such a "process" - not everything can be solved on ML and there
>> are
>> > > fundamental decisions that might need a closer attention.
>> > > On 6/18/20 5:28 PM, Reuven Lax wrote:
>> > >
>> > > Jan - my proposal is exactly TimeSortedBagState (more accurately -
>> > > TimeSortedListState), though I went a bit further and also proposed a
>> way
>> > > to have a dynamic number of tagged TimeSortedBagStates.
>> > >
>> > > You are correct that the runner doesn't really have to store the data
>> time
>> > > sorted - what's actually needed is the ability to fetch and remove
>> > > timestamp ranges of data (though that does include fetching the entire
>> > > list); TimeOrderedState is probably a more accurate name then
>> > > TimeSortedState. I don't think we could get away with operations that
>> only
>> > > act on the smallest timestamp, however we could limit the API to only
>> being
>> > > able to fetch and remove prefixes of data (ordered by timestamp).
>> However
>> > > if we support prefixes, we might as well support arbitrary subranges.
>> > >
>> > > On Thu, Jun 18, 2020 at 7:26 AM Jan Lukavský <je...@seznam.cz> wrote:
>> > >
>> > >> Big +1 for a BIP, as this might really help clarify all the pros and
>> cons
>> > >> of all possibilities. There seem to be questions that need answering
>> and
>> > >> motivating use cases - do we need sorted map state or can we solve
>> our use
>> > >> cases by something simpler - e.g. the mentioned TimeSortedBagState?
>> Does
>> > >> that really have to be time-sorted structure, or does it "only" have
>> to
>> > >> have operations that can efficiently find and remove element with
>> smallest
>> > >> timestamp (like a PriorityQueue)?
>> > >>
>> > >> Jan
>> > >> On 6/18/20 5:32 AM, Kenneth Knowles wrote:
>> > >>
>> > >> Zooming in from generic philosophy to be clear: adding time ordered
>> > >> buffer to the Fn state API is *not* a shortcut.It has benefits that
>> will
>> > >> not be achieved by SDK-side implementation on top of either ordered
>> or
>> > >> unordered multimap. Are those benefits worth expanding the API? I
>> don't
>> > >> know.
>> > >>
>> > >> A change to allow a runner to have a specialized implementation for
>> > >> time-buffered state would be one or more StateKey types, right?
>> Reuven,
>> > >> maybe put this and your Java API in a doc? A BIP? Seems like there's
>> at
>> > >> least the following to explore:
>> > >>
>> > >>  - how that Java API would map to an SDK-side implementation on top
>> of
>> > >> multimap state key
>> > >>  - how that Java API would map to a new StateKey
>> > >>  - whether there's actually more than one relevant implementation of
>> that
>> > >> StateKey
>> > >>  - whether SDK-side implementation on some other state key would be
>> > >> performant enough in all SDK languages (present and future)
>> > >>
>> > >> Zooming back out to generic philosophy: Proliferation of StateKey
>> > >> types tuned by runners (which can very easily still share
>> implementation)
>> > >> is probably better than proliferation of complex SDK-side
>> implementations
>> > >> with varying completeness and performance.
>> > >>
>> > >> Kenn
>> > >>
>> > >> On Wed, Jun 17, 2020 at 3:24 PM Reuven Lax <re...@google.com> wrote:
>> > >>
>> > >>> It might help for me to describe what I have in mind. I'm still
>> > >>> proposing that we build multimap, just not a globally-sorted
>> multimap.
>> > >>>
>> > >>> My previous proposal was that we provide a Multimap<Key, Value>
>> state
>> > >>> type, sorted by key. this would have two additional operations -
>> > >>> multimap.getRange(startKey, endKey) and
>> multimap.deleteRange(startKey,
>> > >>> endKey). The primary use case was timestamp sorting, but I felt
>> that a
>> > >>> sorted multimap provided a nice generalization - after all, you can
>> simply
>> > >>> key the multimap by timestamp to get timestamp sorting.
>> > >>>
>> > >>> This approach had some issues immediately that would take some work
>> to
>> > >>> solve. Since a multimap key can have any type and a runner will
>> only be
>> > >>> able to sort by encoded type, we would need to introduce a concept
>> of
>> > >>> order-preserving coders into Beam and plumb that through. Robert
>> pointed
>> > >>> out that even our existing standard coders for simple integral
>> types don't
>> > >>> preserve order, so there will likely be surprises here.
>> > >>>
>> > >>> My current proposal is for a multimap that is not sorted by key, but
>> > >>> that can support.ordered values for a single key. Remember that a
>> multimap
>> > >>> maps K -> Iterable<V>, so this means that each individual
>> Iterable<V> is
>> > >>> ordered, but the keys have no specific order relative to each
>> other. This
>> > >>> is not too different from many multimap implementations where the
>> keys are
>> > >>> unordered, but the list of values for a single key at least has a
>> stable
>> > >>> order.
>> > >>>
>> > >>> The interface would look like this:
>> > >>>
>> > >>> public interface MultimapState<K, V> extends State {
>> > >>>   // Add a value with a default timestamp.
>> > >>>   void put(K key, V value);
>> > >>>
>> > >>>   // Add a timestamped value.
>> > >>>   void put(K, key, TimestampedValue<V> value);
>> > >>>
>> > >>>   // Remove all values for a key.
>> > >>>   void remove (K key);
>> > >>>
>> > >>>   // Remove all values for a key with timestamps within the
>> specified
>> > >>> range.
>> > >>>   void removeRange(K key, Instant startTs, Instant endTs);
>> > >>>
>> > >>>   // Get an Iterable of values for V. The Iterable will be returned
>> > >>> sorted by timestamp.
>> > >>>   ReadableState<Iterable<TimestampedValue<V>>> get(K key);
>> > >>>
>> > >>>   // Get an Iterable of values for V in the specified range. The
>> > >>> Iterable will be returned sorted by timestamp.
>> > >>>   ReadableState<Iterable<TimestampedValue<V>>> getRange(K key,
>> Instant
>> > >>> startTs, Instant endTs);
>> > >>>
>> > >>>   ReadableState<Iterable<K>> keys();
>> > >>>   ReadableState<Iterable<TimestampedValue<V>>> values();
>> > >>>   ReadableState<Iterable<Map.Entry<K, TimestampedValue<V>> entries;
>> > >>> }
>> > >>>
>> > >>> We can of course provide helper functions that allow using
>> MultimapState
>> > >>> without deailing with TimestampValue for users who only want a
>> multimap and
>> > >>> don't want sorting.
>> > >>>
>> > >>> I think many users will only need a single sorted list - not a full
>> > >>> multimap. It's worth offering this as well, and we can simply build
>> it on
>> > >>> top of MultimapState. It will look like an extension of BagState
>> > >>>
>> > >>> public interface TimestampSortedListState<T> extends State {
>> > >>>   void add(TimestampedValue<T> value);
>> > >>>   Iterable<TimestampedValue<T>> read();
>> > >>>   Iterable<TimestampedValue<T>> readRange(Instant startTs, Instant
>> > >>> endTs);
>> > >>>   void clearRange(Instant startTs, Instant endTs);
>> > >>> }
>> > >>>
>> > >>>
>> > >>> On Wed, Jun 17, 2020 at 2:47 PM Luke Cwik <lc...@google.com> wrote:
>> > >>>
>> > >>>> The portability layer is meant to live across multiple versions of
>> Beam
>> > >>>> and I don't think it should be treated by doing the simple and
>> useful thing
>> > >>>> now since I believe it will lead to a proliferation of the API.
>> > >>>>
>> > >>>> On Wed, Jun 17, 2020 at 2:30 PM Kenneth Knowles <ke...@apache.org>
>> > >>>> wrote:
>> > >>>>
>> > >>>>> I have thoughts on the subject of whether to have APIs just for
>> the
>> > >>>>> lowest-level building blocks versus having APIs for higher-level
>> > >>>>> constructs. Specifically this applies to providing only unsorted
>> multimap
>> > >>>>> vs what I will call "time-ordered buffer". TL;DR: I'd vote to
>> focus on
>> > >>>>> time-ordered buffer; if it turns out to be easy to go all the way
>> to sorted
>> > >>>>> multimap that's nice-to-have; if it turns out to be easy to
>> implement on
>> > >>>>> top of unsorted map state that should probably be under the hood
>> > >>>>>
>> > >>>>> Reasons to build low-level multimap in the runner & fn api and
>> layer
>> > >>>>> higher-level things in the SDK:
>> > >>>>>
>> > >>>>>  - It is less implementation for runners if they have to only
>> provide
>> > >>>>> fewer lower-level building blocks like multimap state.
>> > >>>>>  - There are many more runners than SDKs (and will be even more
>> and
>> > >>>>> more) so this saves overall.
>> > >>>>>
>> > >>>>> Reasons to build higher-level constructs directly in the runner
>> and fn
>> > >>>>> api:
>> > >>>>>
>> > >>>>>  - Having multiple higher-level state types may actually be less
>> > >>>>> implementation than one complex state type, especially if they
>> map to
>> > >>>>> runner primitives.
>> > >>>>>  - The runner may have better specialized implementations,
>> especially
>> > >>>>> for something like a time-ordered buffer.
>> > >>>>>  - The particular access patterns in an SDK-based implementation
>> may
>> > >>>>> not be ideal for each runner's underlying implementation of the
>> low-level
>> > >>>>> building block.
>> > >>>>>  - There may be excessive gRPC overhead even for optimal access
>> > >>>>> patterns.
>> > >>>>>
>> > >>>>> There are ways to have best of both worlds, like:
>> > >>>>>
>> > >>>>> 1. Define multiple state types according to fundamental access
>> > >>>>> patterns, like we did this before portability.
>> > >>>>> 2. If it is easy to layer one on top of the other, do that inside
>> the
>> > >>>>> runner. Provide shared code so for runners providing the
>> lowest-level
>> > >>>>> primitive they get all the types for free.
>> > >>>>>
>> > >>>>> I understand that this is an oversimplification. It still creates
>> some
>> > >>>>> more work. And APIs are a burden so it is good to introduce as
>> few as
>> > >>>>> possible for maintenance. But it has performance benefits and
>> also unblocks
>> > >>>>> "just doing the simple and useful thing now" which I always like
>> to do as
>> > >>>>> long as it is compatible with future changes. If the APIs are
>> fundamental,
>> > >>>>> like sets, maps, timestamp ordering, then it is safe to guess
>> that they
>> > >>>>> will change rarely and be useful forever.
>> > >>>>>
>> > >>>>> Kenn
>> > >>>>>
>> > >>>>> On Tue, Jun 16, 2020 at 2:54 PM Luke Cwik <lc...@google.com>
>> wrote:
>> > >>>>>
>> > >>>>>> I would be glad to take a stab at how to provide sorting on top
>> of
>> > >>>>>> unsorted multimap state.
>> > >>>>>> Based upon your description, you want integer keys representing
>> > >>>>>> timestamps and arbitrary user value for the values, is that
>> correct?
>> > >>>>>> What kinds of operations do you need on the sorted map state in
>> order
>> > >>>>>> of efficiency requirements?
>> > >>>>>> (e.g. Next(x), Previous(x), GetAll(Range[x, y)),
>> ClearAll(Range[x, y))
>> > >>>>>> What kinds of operations do we expect the underlying unsorted map
>> > >>>>>> state to be able to provide?
>> > >>>>>> (at a minimum Get(K), Append(K), Clear(K) but what else e.g.
>> > >>>>>> enumerate(K)?)
>> > >>>>>>
>> > >>>>>> I went through a similar exercise of how to provide a list like
>> side
>> > >>>>>> input view over a multimap[1] side input which efficiently
>> allowed
>> > >>>>>> computation of size and provided random access while only having
>> access to
>> > >>>>>> get(K) and enumerate K's.
>> > >>>>>>
>> > >>>>>> 1:
>> > >>>>>>
>> https://github.com/lukecwik/incubator-beam/blob/ec8769f6163ca8a4daecc2fb29708bc1da430917/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollectionViews.java#L568
>> > >>>>>>
>> > >>>>>> On Tue, Jun 16, 2020 at 8:47 AM Reuven Lax <re...@google.com>
>> wrote:
>> > >>>>>>
>> > >>>>>>> Bringing this subject up again,
>> > >>>>>>>
>> > >>>>>>> I've spent some time looking into implementing this for the
>> Dataflow
>> > >>>>>>> runner. I'm unable to find a way to implement the arbitrary
>> sorted multimap
>> > >>>>>>> efficiently for the case where there are large numbers of
>> unique keys.
>> > >>>>>>> Since the primary driving use case is timestamp ordering (i.e.
>> key is event
>> > >>>>>>> timestamp), you would expect to have nearly a new key per
>> element. I
>> > >>>>>>> considered Luke's suggestion above, but unfortunately it
>> doesn't really
>> > >>>>>>> solve this issue.
>> > >>>>>>>
>> > >>>>>>> The primary use case for sorting always seems to be sorting by
>> > >>>>>>> timestamp. I want to propose that instead of building the
>> fully-general
>> > >>>>>>> sorted multimap, we instead focus on a state type where the
>> sort key is an
>> > >>>>>>> integral type (like a timestamp or an integer). There is still
>> a valid use
>> > >>>>>>> case for multimap, but we can provide that as an unordered
>> state. At least
>> > >>>>>>> for Dataflow, it will be much easier
>> > >>>>>>>
>> > >>>>>>> While my difficulties here may be specific to the Dataflow
>> runner,
>> > >>>>>>> any such support would have to be built into other runners as
>> well, and
>> > >>>>>>> limiting to integral sorting likely makes it easier for other
>> runners to
>> > >>>>>>> implement this. Also, if you look at this
>> > >>>>>>> <
>> https://github.com/apache/flink/blob/0ab1549f52f1f544e8492757c6b0d562bf50a061/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/runtime/join/TemporalRowtimeJoin.scala#L95>
>> Flink
>> > >>>>>>> comment pointed out by Aljoscha, for Flink the main use case
>> identified was
>> > >>>>>>> also timestamp sorting. This will also simplify the API design
>> for this
>> > >>>>>>> feature: Sorted multimap with arbitrary keys would require us
>> to introduce
>> > >>>>>>> a way of mapping natural ordering to encoded ordering (i.e. a
>> new
>> > >>>>>>> OrderPreservingCoder), but if we limit sort keys to integral
>> types, the API
>> > >>>>>>> design is simpler as integral types can be represented directly.
>> > >>>>>>>
>> > >>>>>>> Reuven
>> > >>>>>>>
>> > >>>>>>> On Sun, Jun 2, 2019 at 7:04 AM Reuven Lax <re...@google.com>
>> wrote:
>> > >>>>>>>
>> > >>>>>>>> This sounds to me like a potential runner strategy. However if
>> a
>> > >>>>>>>> runner can natively support sorted maps (e.g. we expect the
>> Dataflow runner
>> > >>>>>>>> to be able to do so, and I think it would be useful for other
>> runners as
>> > >>>>>>>> well), then it's probably preferable to allow the runner to
>> use its native
>> > >>>>>>>> capabilities.
>> > >>>>>>>>
>> > >>>>>>>> On Fri, May 24, 2019 at 11:05 AM Lukasz Cwik <lcwik@google.com
>> >
>> > >>>>>>>> wrote:
>> > >>>>>>>>
>> > >>>>>>>>> For the API that you proposed, the map key is always "void"
>> and
>> > >>>>>>>>> the sort key == user key. So in my example of
>> > >>>>>>>>> key: dummy value
>> > >>>>>>>>> key.000: token, (0001, value4)
>> > >>>>>>>>> key.001: token, (0010, value1), (0011, value2)
>> > >>>>>>>>> key.01: token
>> > >>>>>>>>> key.1: token, (1011, value3)
>> > >>>>>>>>> you would have:
>> > >>>>>>>>> "void": dummy value
>> > >>>>>>>>> "void".000: token, (0001, value4)
>> > >>>>>>>>> "void".001: token, (0010, value1), (0011, value2)
>> > >>>>>>>>> "void".01: token
>> > >>>>>>>>> "void".1: token, (1011, value3)
>> > >>>>>>>>>
>> > >>>>>>>>> Iterable<KV<K, V>> entriesUntil(K limit) translates into
>> walking
>> > >>>>>>>>> the the prefixes until you find a common prefix for K and
>> then filter for
>> > >>>>>>>>> values where they have a sort key <= K. Using the example
>> above, to find
>> > >>>>>>>>> entriesUntil(0010) you would:
>> > >>>>>>>>> look for key."", miss
>> > >>>>>>>>> look for key.0, miss
>> > >>>>>>>>> look for key.00, miss
>> > >>>>>>>>> look for key.000, hit, sort all contained values using
>> secondary
>> > >>>>>>>>> key, provide value4 to user
>> > >>>>>>>>> look for key.001, hit, notice that 001 is a prefix of 0010 so
>> we
>> > >>>>>>>>> sort all contained values using secondary key, filter out
>> value2 and
>> > >>>>>>>>> provide value1
>> > >>>>>>>>>
>> > >>>>>>>>> void removeUntil(K limit) also translates into walking the
>> > >>>>>>>>> prefixes but instead we will clear them when we have a "hit"
>> with some
>> > >>>>>>>>> special logic for when the sort key is a prefix of the key.
>> Used the
>> > >>>>>>>>> example, to removeUntil(0010) you would:
>> > >>>>>>>>> look for key."", miss
>> > >>>>>>>>> look for key.0, miss
>> > >>>>>>>>> look for key.00, miss
>> > >>>>>>>>> look for key.000, hit, clear
>> > >>>>>>>>> look for key.001, hit, notice that 001 is a prefix of 0010 so
>> we
>> > >>>>>>>>> sort all contained values using secondary key, store in
>> memory all values
>> > >>>>>>>>> that > 0010, clear and append values stored in memory.
>> > >>>>>>>>>
>> > >>>>>>>>> On Fri, May 24, 2019 at 10:36 AM Reuven Lax <relax@google.com
>> >
>> > >>>>>>>>> wrote:
>> > >>>>>>>>>
>> > >>>>>>>>>> Can you explain how fetching and deleting ranges of keys
>> would
>> > >>>>>>>>>> work with this data structure?
>> > >>>>>>>>>>
>> > >>>>>>>>>> On Fri, May 24, 2019 at 9:50 AM Lukasz Cwik <
>> lcwik@google.com>
>> > >>>>>>>>>> wrote:
>> > >>>>>>>>>>
>> > >>>>>>>>>>> Reuven, for the example, I assume that we never want to
>> store
>> > >>>>>>>>>>> more then 2 values at a given sort key prefix, and if we do
>> then we will
>> > >>>>>>>>>>> create a new longer prefix splitting up the values based
>> upon the sort key.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> Tuple representation in examples below is (key, sort key,
>> value)
>> > >>>>>>>>>>> and . is a character outside of the alphabet which can be
>> represented by
>> > >>>>>>>>>>> using an escaping encoding that wraps the key + sort key
>> encoding.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> To insert (key, 0010, value1), we lookup "key" + all the
>> > >>>>>>>>>>> prefixes of 0010 finding one that is not empty. In this
>> case its 0, so we
>> > >>>>>>>>>>> append value to the map at key.0 ending up with (we also
>> set the key to any
>> > >>>>>>>>>>> dummy value to know that it it contains values):
>> > >>>>>>>>>>> key: dummy value
>> > >>>>>>>>>>> key."": token, (0010, value1)
>> > >>>>>>>>>>> Now we insert (key, 0011, value2), we again lookup "key" +
>> all
>> > >>>>>>>>>>> the prefixes of 0010, finding "", so we append value2 to
>> key."" ending up
>> > >>>>>>>>>>> with:
>> > >>>>>>>>>>> key: dummy value
>> > >>>>>>>>>>> key."": token, (0010, value1), (0011, value2)
>> > >>>>>>>>>>> Now we insert (key, 1011, value3), we again lookup "key" +
>> all
>> > >>>>>>>>>>> the prefixes of 1011 finding "" but notice that it is full,
>> so we partition
>> > >>>>>>>>>>> all the values into two prefixes 0 and 1. We also clear the
>> "" prefix
>> > >>>>>>>>>>> ending up with:
>> > >>>>>>>>>>> key: dummy value
>> > >>>>>>>>>>> key.0: token, (0010, value1), (0011, value2)
>> > >>>>>>>>>>> key.1: token, (1011, value3)
>> > >>>>>>>>>>> Now we insert (key, 0001, value4), we again lookup "key" +
>> all
>> > >>>>>>>>>>> the prefixes of the value finding 0 but notice that it is
>> full, so we
>> > >>>>>>>>>>> partition all the values into two prefixes 00 and 01 but
>> notice this
>> > >>>>>>>>>>> doesn't help us since 00 will be too full so we split 00
>> again to 000, 001.
>> > >>>>>>>>>>> We also clear the 0 prefix ending up with:
>> > >>>>>>>>>>> key: dummy value
>> > >>>>>>>>>>> key.000: token, (0001, value4)
>> > >>>>>>>>>>> key.001: token, (0010, value1), (0011, value2)
>> > >>>>>>>>>>> key.01: token
>> > >>>>>>>>>>> key.1: token, (1011, value3)
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> We are effectively building a trie[1] where we only have
>> values
>> > >>>>>>>>>>> at the leaves and control how full each leaf can be. There
>> are other trie
>> > >>>>>>>>>>> representations like a radix tree that may be better.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> Looking up the values in sorted order for "key" would go
>> like
>> > >>>>>>>>>>> this:
>> > >>>>>>>>>>> Is key set, yes
>> > >>>>>>>>>>> look for key."", miss
>> > >>>>>>>>>>> look for key.0, miss
>> > >>>>>>>>>>> look for key.00, miss
>> > >>>>>>>>>>> look for key.000, hit, sort all contained values using
>> secondary
>> > >>>>>>>>>>> key, provide value4 to user
>> > >>>>>>>>>>> look for key.001, hit, sort all contained values using
>> secondary
>> > >>>>>>>>>>> key, provide value1 followed by value2 to user
>> > >>>>>>>>>>> look for key.01, hit, empty, return no values to user
>> > >>>>>>>>>>> look for key.1, hit, sort all contained values using
>> secondary
>> > >>>>>>>>>>> key, provide value3 to user
>> > >>>>>>>>>>> we have walked the entire prefix space, signal end of
>> iterable
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> Some notes for the above:
>> > >>>>>>>>>>> * The dummy value is used to know that the key contains
>> values
>> > >>>>>>>>>>> and the token is to know whether there are any values
>> deeper in the trie so
>> > >>>>>>>>>>> when we know when to stop searching.
>> > >>>>>>>>>>> * If we can recalculate the sort key from the combination
>> of the
>> > >>>>>>>>>>> key and value, then we don't need to store it.
>> > >>>>>>>>>>> * Keys with lots of values will perform worse then keys with
>> > >>>>>>>>>>> less values since we have to look up more keys but they
>> will be empty
>> > >>>>>>>>>>> reads. The number of misses can be controlled by how many
>> elements we are
>> > >>>>>>>>>>> willing to store at a given node before we subdivide.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> In reality you could build a lot of structures (e.g. red
>> black
>> > >>>>>>>>>>> tree, binary tree) using the sort key, the issue is the
>> cost of
>> > >>>>>>>>>>> rebalancing/re-organizing the structure in map form and
>> whether it has a
>> > >>>>>>>>>>> convenient pre-order traversal for lookups.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> On Fri, May 24, 2019 at 8:14 AM Reuven Lax <
>> relax@google.com>
>> > >>>>>>>>>>> wrote:
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>> Some great comments!
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> *Aljoscha*: absolutely this would have to be implemented by
>> > >>>>>>>>>>>> runners to be efficient. We can of course provide a
>> default (inefficient)
>> > >>>>>>>>>>>> implementation, but ideally runners would provide better
>> ones.
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> *Jan* Exactly. I think MapState can be dropped or backed by
>> > >>>>>>>>>>>> this. E.g.
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> *Robert* Great point about standard coders not satisfying
>> > >>>>>>>>>>>> this. That's why I suggested that we provide a way to tag
>> the coders that
>> > >>>>>>>>>>>> do preserve order, and only accept those as key coders
>> Alternatively we
>> > >>>>>>>>>>>> could present a more limited API - e.g. only allowing a
>> hard-coded set of
>> > >>>>>>>>>>>> types to be used as keys - but that seems counter to the
>> direction Beam
>> > >>>>>>>>>>>> usually goes. So users will have two ways .of creating
>> multimap state specs:
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>    private final StateSpec<MultimapState<Long, String>>
>> state =
>> > >>>>>>>>>>>> StateSpecs.multimap(VarLongCoder.of(),
>> StringUtf8Coder.of());
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> or
>> > >>>>>>>>>>>>    private final StateSpec<MultimapState<Long, String>>
>> state =
>> > >>>>>>>>>>>> StateSpecs.orderedMultimap(VarLongCoder.of(),
>> StringUtf8Coder.of());
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> The second one will validate that the key coder preserves
>> > >>>>>>>>>>>> order, and fails otherwise (similar to coder determinism
>> checking in
>> > >>>>>>>>>>>> GroupByKey). (BTW we would also have versions of these
>> functions that use
>> > >>>>>>>>>>>> coder inference to "guess" the coder, but those will do
>> the same checking)
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> Also the API I proposed did support random access! We could
>> > >>>>>>>>>>>> separate out OrderedBagState again if we think the use
>> cases are
>> > >>>>>>>>>>>> fundamentally different. I merged the proposal into that
>> of MultimapState
>> > >>>>>>>>>>>> because there seemed be 99% overlap.
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> Reuven
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> On Fri, May 24, 2019 at 6:19 AM Robert Bradshaw <
>> > >>>>>>>>>>>> robertwb@google.com> wrote:
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>> On Fri, May 24, 2019 at 5:32 AM Reuven Lax <
>> relax@google.com>
>> > >>>>>>>>>>>>> wrote:
>> > >>>>>>>>>>>>> >
>> > >>>>>>>>>>>>> > On Thu, May 23, 2019 at 1:53 PM Ahmet Altay <
>> > >>>>>>>>>>>>> altay@google.com> wrote:
>> > >>>>>>>>>>>>> >>
>> > >>>>>>>>>>>>> >>
>> > >>>>>>>>>>>>> >>
>> > >>>>>>>>>>>>> >> On Thu, May 23, 2019 at 1:38 PM Lukasz Cwik <
>> > >>>>>>>>>>>>> lcwik@google.com> wrote:
>> > >>>>>>>>>>>>> >>>
>> > >>>>>>>>>>>>> >>>
>> > >>>>>>>>>>>>> >>>
>> > >>>>>>>>>>>>> >>> On Thu, May 23, 2019 at 11:37 AM Rui Wang <
>> > >>>>>>>>>>>>> ruwang@google.com> wrote:
>> > >>>>>>>>>>>>> >>>>>
>> > >>>>>>>>>>>>> >>>>> A few obvious problems with this code:
>> > >>>>>>>>>>>>> >>>>>   1. Removing the elements already processed from
>> the
>> > >>>>>>>>>>>>> bag requires clearing and rewriting the entire bag. This
>> is O(n^2) in the
>> > >>>>>>>>>>>>> number of input trades.
>> > >>>>>>>>>>>>> >>>>
>> > >>>>>>>>>>>>> >>>> why it's not O(2 * n) to clearing and rewriting trade
>> > >>>>>>>>>>>>> state?
>> > >>>>>>>>>>>>> >>>>
>> > >>>>>>>>>>>>> >>>>>
>> > >>>>>>>>>>>>> >>>>> public interface SortedMultimapState<K, V> extends
>> State
>> > >>>>>>>>>>>>> {
>> > >>>>>>>>>>>>> >>>>>   // Add a value to the map.
>> > >>>>>>>>>>>>> >>>>>   void put(K key, V value);
>> > >>>>>>>>>>>>> >>>>>   // Get all values for a given key.
>> > >>>>>>>>>>>>> >>>>>   ReadableState<Iterable<V>> get(K key);
>> > >>>>>>>>>>>>> >>>>>  // Return all entries in the map.
>> > >>>>>>>>>>>>> >>>>>   ReadableState<Iterable<KV<K, V>>> allEntries();
>> > >>>>>>>>>>>>> >>>>>   // Return all entries in the map with keys <=
>> limit.
>> > >>>>>>>>>>>>> returned elements are sorted by the key.
>> > >>>>>>>>>>>>> >>>>>   ReadableState<Iterable<KV<K, V>>> entriesUntil(K
>> > >>>>>>>>>>>>> limit);
>> > >>>>>>>>>>>>> >>>>>
>> > >>>>>>>>>>>>> >>>>>  // Remove all values with the given key;
>> > >>>>>>>>>>>>> >>>>>   void remove(K key);
>> > >>>>>>>>>>>>> >>>>>  // Remove all entries in the map with keys <=
>> limit.
>> > >>>>>>>>>>>>> >>>>>   void removeUntil(K limit);
>> > >>>>>>>>>>>>> >>>>
>> > >>>>>>>>>>>>> >>>> Will removeUntilExcl(K limit) also useful? It will
>> remove
>> > >>>>>>>>>>>>> all entries in the map with keys < limit.
>> > >>>>>>>>>>>>> >>>>
>> > >>>>>>>>>>>>> >>>>>
>> > >>>>>>>>>>>>> >>>>> Runners will sort based on the encoded value of the
>> key.
>> > >>>>>>>>>>>>> In order to make this easier for users, I propose that we
>> introduce a new
>> > >>>>>>>>>>>>> tag on Coders PreservesOrder. A Coder that contains this
>> tag guarantees
>> > >>>>>>>>>>>>> that the encoded value preserves the same ordering as the
>> base Java type.
>> > >>>>>>>>>>>>> >>>>
>> > >>>>>>>>>>>>> >>>>
>> > >>>>>>>>>>>>> >>>> Could you clarify what is  "encoded value preserves
>> the
>> > >>>>>>>>>>>>> same ordering as the base Java type"?
>> > >>>>>>>>>>>>> >>>
>> > >>>>>>>>>>>>> >>>
>> > >>>>>>>>>>>>> >>> Lets say A and B represent two different instances of
>> the
>> > >>>>>>>>>>>>> same Java type like a double, then A < B (using the
>> languages comparison
>> > >>>>>>>>>>>>> operator) iff encode(A) < encode(B) (note the encoded
>> versions are compared
>> > >>>>>>>>>>>>> lexicographically)
>> > >>>>>>>>>>>>> >>
>> > >>>>>>>>>>>>> >>
>> > >>>>>>>>>>>>> >> Since coders are shared across SDKs, do we expect A <
>> B iff
>> > >>>>>>>>>>>>> e(A) < e(P) property to hold for all languages we
>> support? What happens A,
>> > >>>>>>>>>>>>> B sort differently in different languages?
>> > >>>>>>>>>>>>> >
>> > >>>>>>>>>>>>> >
>> > >>>>>>>>>>>>> > That would have to be the property of the coder (which
>> means
>> > >>>>>>>>>>>>> that this property probably needs to be represented in
>> the portability
>> > >>>>>>>>>>>>> representation of the coder). I imagine the common use
>> cases will be for
>> > >>>>>>>>>>>>> simple coders like int, long, string, etc., which are
>> likely to sort the
>> > >>>>>>>>>>>>> same in most languages.
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> The standard coders for both double and integral types do
>> not
>> > >>>>>>>>>>>>> respect
>> > >>>>>>>>>>>>> the natural ordering (consider negative values). KV coders
>> > >>>>>>>>>>>>> violate the
>> > >>>>>>>>>>>>> "natural" lexicographic ordering on components as well. I
>> think
>> > >>>>>>>>>>>>> implicitly sorting on encoded value would yield many
>> > >>>>>>>>>>>>> surprises. (The
>> > >>>>>>>>>>>>> state, of course, could take a order-preserving, bytes
>> > >>>>>>>>>>>>> (string?)-producing callable as a parameter of course).
>> (As for
>> > >>>>>>>>>>>>> naming, I'd probably call this OrderedBagState or
>> something
>> > >>>>>>>>>>>>> like
>> > >>>>>>>>>>>>> that...rather than Map which tends to imply random
>> access.)
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>
>> >
>>
>

Re: [External] Re: DISCUSS: Sorted MapState API

Posted by Reuven Lax <re...@google.com>.

Hi,

Currently we only support removing a timestamp range. You can remove a
single timestamp of course by removing [ts, ts+1), however if there are
multiple elements with the same timestamp this will remove all of those
elements.

Does this fit your use case? If not, I wonder if MapState is closer to what
you are looking for?

Reuven

On Tue, Sep 15, 2020 at 2:33 PM Tyson Hamilton <ty...@google.com> wrote:

> Hi Reuven,
>
> I noticed that there was an implementation of the in-memory
> OrderedListState introduced [1]. Where can I find out more regarding the
> plan and design? Is there a design doc? I'd like to know more details about
> the implementation to see if it fits my use case. I was hoping it would
> have a remove(TimestampedValue<T> e) method.
>
> Thanks,
> -Tyson
>
>
> [1]:
> https://github.com/apache/beam/commit/9d0d0b0c4506b288164b155c5ce3a23d76db3c41
>
>
> On 2020/08/03 21:41:46, Catlyn Kong <ca...@yelp.com> wrote:
> > Hey folks,
> >
> > Sry I'm late to this thread but this might be very helpful for the
> problem
> > we're dealing with. Do we have a design doc or a jira ticket I can
> follow?
> >
> > Cheers,
> > Catlyn
> >
> > On Thu, Jun 18, 2020 at 1:11 PM Jan Lukavský <je...@seznam.cz> wrote:
> >
> > > My questions were just an example. I fully agree there is a fundamental
> > > need for a sorted state (of some form, and I also think this links to
> > > efficient implementation of retrations) - I was reacting to Kenn's
> question
> > > about BIP. This one would be pretty nice example why it would be good
> to
> > > have such a "process" - not everything can be solved on ML and there
> are
> > > fundamental decisions that might need a closer attention.
> > > On 6/18/20 5:28 PM, Reuven Lax wrote:
> > >
> > > Jan - my proposal is exactly TimeSortedBagState (more accurately -
> > > TimeSortedListState), though I went a bit further and also proposed a
> way
> > > to have a dynamic number of tagged TimeSortedBagStates.
> > >
> > > You are correct that the runner doesn't really have to store the data
> time
> > > sorted - what's actually needed is the ability to fetch and remove
> > > timestamp ranges of data (though that does include fetching the entire
> > > list); TimeOrderedState is probably a more accurate name then
> > > TimeSortedState. I don't think we could get away with operations that
> only
> > > act on the smallest timestamp, however we could limit the API to only
> being
> > > able to fetch and remove prefixes of data (ordered by timestamp).
> However
> > > if we support prefixes, we might as well support arbitrary subranges.
> > >
> > > On Thu, Jun 18, 2020 at 7:26 AM Jan Lukavský <je...@seznam.cz> wrote:
> > >
> > >> Big +1 for a BIP, as this might really help clarify all the pros and
> cons
> > >> of all possibilities. There seem to be questions that need answering
> and
> > >> motivating use cases - do we need sorted map state or can we solve
> our use
> > >> cases by something simpler - e.g. the mentioned TimeSortedBagState?
> Does
> > >> that really have to be time-sorted structure, or does it "only" have
> to
> > >> have operations that can efficiently find and remove element with
> smallest
> > >> timestamp (like a PriorityQueue)?
> > >>
> > >> Jan
> > >> On 6/18/20 5:32 AM, Kenneth Knowles wrote:
> > >>
> > >> Zooming in from generic philosophy to be clear: adding time ordered
> > >> buffer to the Fn state API is *not* a shortcut.It has benefits that
> will
> > >> not be achieved by SDK-side implementation on top of either ordered or
> > >> unordered multimap. Are those benefits worth expanding the API? I
> don't
> > >> know.
> > >>
> > >> A change to allow a runner to have a specialized implementation for
> > >> time-buffered state would be one or more StateKey types, right?
> Reuven,
> > >> maybe put this and your Java API in a doc? A BIP? Seems like there's
> at
> > >> least the following to explore:
> > >>
> > >>  - how that Java API would map to an SDK-side implementation on top of
> > >> multimap state key
> > >>  - how that Java API would map to a new StateKey
> > >>  - whether there's actually more than one relevant implementation of
> that
> > >> StateKey
> > >>  - whether SDK-side implementation on some other state key would be
> > >> performant enough in all SDK languages (present and future)
> > >>
> > >> Zooming back out to generic philosophy: Proliferation of StateKey
> > >> types tuned by runners (which can very easily still share
> implementation)
> > >> is probably better than proliferation of complex SDK-side
> implementations
> > >> with varying completeness and performance.
> > >>
> > >> Kenn
> > >>
> > >> On Wed, Jun 17, 2020 at 3:24 PM Reuven Lax <re...@google.com> wrote:
> > >>
> > >>> It might help for me to describe what I have in mind. I'm still
> > >>> proposing that we build multimap, just not a globally-sorted
> multimap.
> > >>>
> > >>> My previous proposal was that we provide a Multimap<Key, Value> state
> > >>> type, sorted by key. this would have two additional operations -
> > >>> multimap.getRange(startKey, endKey) and
> multimap.deleteRange(startKey,
> > >>> endKey). The primary use case was timestamp sorting, but I felt that
> a
> > >>> sorted multimap provided a nice generalization - after all, you can
> simply
> > >>> key the multimap by timestamp to get timestamp sorting.
> > >>>
> > >>> This approach had some issues immediately that would take some work
> to
> > >>> solve. Since a multimap key can have any type and a runner will only
> be
> > >>> able to sort by encoded type, we would need to introduce a concept of
> > >>> order-preserving coders into Beam and plumb that through. Robert
> pointed
> > >>> out that even our existing standard coders for simple integral types
> don't
> > >>> preserve order, so there will likely be surprises here.
> > >>>
> > >>> My current proposal is for a multimap that is not sorted by key, but
> > >>> that can support.ordered values for a single key. Remember that a
> multimap
> > >>> maps K -> Iterable<V>, so this means that each individual
> Iterable<V> is
> > >>> ordered, but the keys have no specific order relative to each other.
> This
> > >>> is not too different from many multimap implementations where the
> keys are
> > >>> unordered, but the list of values for a single key at least has a
> stable
> > >>> order.
> > >>>
> > >>> The interface would look like this:
> > >>>
> > >>> public interface MultimapState<K, V> extends State {
> > >>>   // Add a value with a default timestamp.
> > >>>   void put(K key, V value);
> > >>>
> > >>>   // Add a timestamped value.
> > >>>   void put(K, key, TimestampedValue<V> value);
> > >>>
> > >>>   // Remove all values for a key.
> > >>>   void remove (K key);
> > >>>
> > >>>   // Remove all values for a key with timestamps within the specified
> > >>> range.
> > >>>   void removeRange(K key, Instant startTs, Instant endTs);
> > >>>
> > >>>   // Get an Iterable of values for V. The Iterable will be returned
> > >>> sorted by timestamp.
> > >>>   ReadableState<Iterable<TimestampedValue<V>>> get(K key);
> > >>>
> > >>>   // Get an Iterable of values for V in the specified range. The
> > >>> Iterable will be returned sorted by timestamp.
> > >>>   ReadableState<Iterable<TimestampedValue<V>>> getRange(K key,
> Instant
> > >>> startTs, Instant endTs);
> > >>>
> > >>>   ReadableState<Iterable<K>> keys();
> > >>>   ReadableState<Iterable<TimestampedValue<V>>> values();
> > >>>   ReadableState<Iterable<Map.Entry<K, TimestampedValue<V>> entries;
> > >>> }
> > >>>
> > >>> We can of course provide helper functions that allow using
> MultimapState
> > >>> without deailing with TimestampValue for users who only want a
> multimap and
> > >>> don't want sorting.
> > >>>
> > >>> I think many users will only need a single sorted list - not a full
> > >>> multimap. It's worth offering this as well, and we can simply build
> it on
> > >>> top of MultimapState. It will look like an extension of BagState
> > >>>
> > >>> public interface TimestampSortedListState<T> extends State {
> > >>>   void add(TimestampedValue<T> value);
> > >>>   Iterable<TimestampedValue<T>> read();
> > >>>   Iterable<TimestampedValue<T>> readRange(Instant startTs, Instant
> > >>> endTs);
> > >>>   void clearRange(Instant startTs, Instant endTs);
> > >>> }
> > >>>
> > >>>
> > >>> On Wed, Jun 17, 2020 at 2:47 PM Luke Cwik <lc...@google.com> wrote:
> > >>>
> > >>>> The portability layer is meant to live across multiple versions of
> Beam
> > >>>> and I don't think it should be treated by doing the simple and
> useful thing
> > >>>> now since I believe it will lead to a proliferation of the API.
> > >>>>
> > >>>> On Wed, Jun 17, 2020 at 2:30 PM Kenneth Knowles <ke...@apache.org>
> > >>>> wrote:
> > >>>>
> > >>>>> I have thoughts on the subject of whether to have APIs just for the
> > >>>>> lowest-level building blocks versus having APIs for higher-level
> > >>>>> constructs. Specifically this applies to providing only unsorted
> multimap
> > >>>>> vs what I will call "time-ordered buffer". TL;DR: I'd vote to
> focus on
> > >>>>> time-ordered buffer; if it turns out to be easy to go all the way
> to sorted
> > >>>>> multimap that's nice-to-have; if it turns out to be easy to
> implement on
> > >>>>> top of unsorted map state that should probably be under the hood
> > >>>>>
> > >>>>> Reasons to build low-level multimap in the runner & fn api and
> layer
> > >>>>> higher-level things in the SDK:
> > >>>>>
> > >>>>>  - It is less implementation for runners if they have to only
> provide
> > >>>>> fewer lower-level building blocks like multimap state.
> > >>>>>  - There are many more runners than SDKs (and will be even more and
> > >>>>> more) so this saves overall.
> > >>>>>
> > >>>>> Reasons to build higher-level constructs directly in the runner
> and fn
> > >>>>> api:
> > >>>>>
> > >>>>>  - Having multiple higher-level state types may actually be less
> > >>>>> implementation than one complex state type, especially if they map
> to
> > >>>>> runner primitives.
> > >>>>>  - The runner may have better specialized implementations,
> especially
> > >>>>> for something like a time-ordered buffer.
> > >>>>>  - The particular access patterns in an SDK-based implementation
> may
> > >>>>> not be ideal for each runner's underlying implementation of the
> low-level
> > >>>>> building block.
> > >>>>>  - There may be excessive gRPC overhead even for optimal access
> > >>>>> patterns.
> > >>>>>
> > >>>>> There are ways to have best of both worlds, like:
> > >>>>>
> > >>>>> 1. Define multiple state types according to fundamental access
> > >>>>> patterns, like we did this before portability.
> > >>>>> 2. If it is easy to layer one on top of the other, do that inside
> the
> > >>>>> runner. Provide shared code so for runners providing the
> lowest-level
> > >>>>> primitive they get all the types for free.
> > >>>>>
> > >>>>> I understand that this is an oversimplification. It still creates
> some
> > >>>>> more work. And APIs are a burden so it is good to introduce as few
> as
> > >>>>> possible for maintenance. But it has performance benefits and also
> unblocks
> > >>>>> "just doing the simple and useful thing now" which I always like
> to do as
> > >>>>> long as it is compatible with future changes. If the APIs are
> fundamental,
> > >>>>> like sets, maps, timestamp ordering, then it is safe to guess that
> they
> > >>>>> will change rarely and be useful forever.
> > >>>>>
> > >>>>> Kenn
> > >>>>>
> > >>>>> On Tue, Jun 16, 2020 at 2:54 PM Luke Cwik <lc...@google.com>
> wrote:
> > >>>>>
> > >>>>>> I would be glad to take a stab at how to provide sorting on top of
> > >>>>>> unsorted multimap state.
> > >>>>>> Based upon your description, you want integer keys representing
> > >>>>>> timestamps and arbitrary user value for the values, is that
> correct?
> > >>>>>> What kinds of operations do you need on the sorted map state in
> order
> > >>>>>> of efficiency requirements?
> > >>>>>> (e.g. Next(x), Previous(x), GetAll(Range[x, y)),
> ClearAll(Range[x, y))
> > >>>>>> What kinds of operations do we expect the underlying unsorted map
> > >>>>>> state to be able to provide?
> > >>>>>> (at a minimum Get(K), Append(K), Clear(K) but what else e.g.
> > >>>>>> enumerate(K)?)
> > >>>>>>
> > >>>>>> I went through a similar exercise of how to provide a list like
> side
> > >>>>>> input view over a multimap[1] side input which efficiently allowed
> > >>>>>> computation of size and provided random access while only having
> access to
> > >>>>>> get(K) and enumerate K's.
> > >>>>>>
> > >>>>>> 1:
> > >>>>>>
> https://github.com/lukecwik/incubator-beam/blob/ec8769f6163ca8a4daecc2fb29708bc1da430917/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollectionViews.java#L568
> > >>>>>>
> > >>>>>> On Tue, Jun 16, 2020 at 8:47 AM Reuven Lax <re...@google.com>
> wrote:
> > >>>>>>
> > >>>>>>> Bringing this subject up again,
> > >>>>>>>
> > >>>>>>> I've spent some time looking into implementing this for the
> Dataflow
> > >>>>>>> runner. I'm unable to find a way to implement the arbitrary
> sorted multimap
> > >>>>>>> efficiently for the case where there are large numbers of unique
> keys.
> > >>>>>>> Since the primary driving use case is timestamp ordering (i.e.
> key is event
> > >>>>>>> timestamp), you would expect to have nearly a new key per
> element. I
> > >>>>>>> considered Luke's suggestion above, but unfortunately it doesn't
> really
> > >>>>>>> solve this issue.
> > >>>>>>>
> > >>>>>>> The primary use case for sorting always seems to be sorting by
> > >>>>>>> timestamp. I want to propose that instead of building the
> fully-general
> > >>>>>>> sorted multimap, we instead focus on a state type where the sort
> key is an
> > >>>>>>> integral type (like a timestamp or an integer). There is still a
> valid use
> > >>>>>>> case for multimap, but we can provide that as an unordered
> state. At least
> > >>>>>>> for Dataflow, it will be much easier
> > >>>>>>>
> > >>>>>>> While my difficulties here may be specific to the Dataflow
> runner,
> > >>>>>>> any such support would have to be built into other runners as
> well, and
> > >>>>>>> limiting to integral sorting likely makes it easier for other
> runners to
> > >>>>>>> implement this. Also, if you look at this
> > >>>>>>> <
> https://github.com/apache/flink/blob/0ab1549f52f1f544e8492757c6b0d562bf50a061/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/runtime/join/TemporalRowtimeJoin.scala#L95>
> Flink
> > >>>>>>> comment pointed out by Aljoscha, for Flink the main use case
> identified was
> > >>>>>>> also timestamp sorting. This will also simplify the API design
> for this
> > >>>>>>> feature: Sorted multimap with arbitrary keys would require us to
> introduce
> > >>>>>>> a way of mapping natural ordering to encoded ordering (i.e. a new
> > >>>>>>> OrderPreservingCoder), but if we limit sort keys to integral
> types, the API
> > >>>>>>> design is simpler as integral types can be represented directly.
> > >>>>>>>
> > >>>>>>> Reuven
> > >>>>>>>
> > >>>>>>> On Sun, Jun 2, 2019 at 7:04 AM Reuven Lax <re...@google.com>
> wrote:
> > >>>>>>>
> > >>>>>>>> This sounds to me like a potential runner strategy. However if a
> > >>>>>>>> runner can natively support sorted maps (e.g. we expect the
> Dataflow runner
> > >>>>>>>> to be able to do so, and I think it would be useful for other
> runners as
> > >>>>>>>> well), then it's probably preferable to allow the runner to use
> its native
> > >>>>>>>> capabilities.
> > >>>>>>>>
> > >>>>>>>> On Fri, May 24, 2019 at 11:05 AM Lukasz Cwik <lc...@google.com>
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> For the API that you proposed, the map key is always "void" and
> > >>>>>>>>> the sort key == user key. So in my example of
> > >>>>>>>>> key: dummy value
> > >>>>>>>>> key.000: token, (0001, value4)
> > >>>>>>>>> key.001: token, (0010, value1), (0011, value2)
> > >>>>>>>>> key.01: token
> > >>>>>>>>> key.1: token, (1011, value3)
> > >>>>>>>>> you would have:
> > >>>>>>>>> "void": dummy value
> > >>>>>>>>> "void".000: token, (0001, value4)
> > >>>>>>>>> "void".001: token, (0010, value1), (0011, value2)
> > >>>>>>>>> "void".01: token
> > >>>>>>>>> "void".1: token, (1011, value3)
> > >>>>>>>>>
> > >>>>>>>>> Iterable<KV<K, V>> entriesUntil(K limit) translates into
> walking
> > >>>>>>>>> the the prefixes until you find a common prefix for K and then
> filter for
> > >>>>>>>>> values where they have a sort key <= K. Using the example
> above, to find
> > >>>>>>>>> entriesUntil(0010) you would:
> > >>>>>>>>> look for key."", miss
> > >>>>>>>>> look for key.0, miss
> > >>>>>>>>> look for key.00, miss
> > >>>>>>>>> look for key.000, hit, sort all contained values using
> secondary
> > >>>>>>>>> key, provide value4 to user
> > >>>>>>>>> look for key.001, hit, notice that 001 is a prefix of 0010 so
> we
> > >>>>>>>>> sort all contained values using secondary key, filter out
> value2 and
> > >>>>>>>>> provide value1
> > >>>>>>>>>
> > >>>>>>>>> void removeUntil(K limit) also translates into walking the
> > >>>>>>>>> prefixes but instead we will clear them when we have a "hit"
> with some
> > >>>>>>>>> special logic for when the sort key is a prefix of the key.
> Used the
> > >>>>>>>>> example, to removeUntil(0010) you would:
> > >>>>>>>>> look for key."", miss
> > >>>>>>>>> look for key.0, miss
> > >>>>>>>>> look for key.00, miss
> > >>>>>>>>> look for key.000, hit, clear
> > >>>>>>>>> look for key.001, hit, notice that 001 is a prefix of 0010 so
> we
> > >>>>>>>>> sort all contained values using secondary key, store in memory
> all values
> > >>>>>>>>> that > 0010, clear and append values stored in memory.
> > >>>>>>>>>
> > >>>>>>>>> On Fri, May 24, 2019 at 10:36 AM Reuven Lax <re...@google.com>
> > >>>>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Can you explain how fetching and deleting ranges of keys would
> > >>>>>>>>>> work with this data structure?
> > >>>>>>>>>>
> > >>>>>>>>>> On Fri, May 24, 2019 at 9:50 AM Lukasz Cwik <lcwik@google.com
> >
> > >>>>>>>>>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> Reuven, for the example, I assume that we never want to store
> > >>>>>>>>>>> more then 2 values at a given sort key prefix, and if we do
> then we will
> > >>>>>>>>>>> create a new longer prefix splitting up the values based
> upon the sort key.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Tuple representation in examples below is (key, sort key,
> value)
> > >>>>>>>>>>> and . is a character outside of the alphabet which can be
> represented by
> > >>>>>>>>>>> using an escaping encoding that wraps the key + sort key
> encoding.
> > >>>>>>>>>>>
> > >>>>>>>>>>> To insert (key, 0010, value1), we lookup "key" + all the
> > >>>>>>>>>>> prefixes of 0010 finding one that is not empty. In this case
> its 0, so we
> > >>>>>>>>>>> append value to the map at key.0 ending up with (we also set
> the key to any
> > >>>>>>>>>>> dummy value to know that it it contains values):
> > >>>>>>>>>>> key: dummy value
> > >>>>>>>>>>> key."": token, (0010, value1)
> > >>>>>>>>>>> Now we insert (key, 0011, value2), we again lookup "key" +
> all
> > >>>>>>>>>>> the prefixes of 0010, finding "", so we append value2 to
> key."" ending up
> > >>>>>>>>>>> with:
> > >>>>>>>>>>> key: dummy value
> > >>>>>>>>>>> key."": token, (0010, value1), (0011, value2)
> > >>>>>>>>>>> Now we insert (key, 1011, value3), we again lookup "key" +
> all
> > >>>>>>>>>>> the prefixes of 1011 finding "" but notice that it is full,
> so we partition
> > >>>>>>>>>>> all the values into two prefixes 0 and 1. We also clear the
> "" prefix
> > >>>>>>>>>>> ending up with:
> > >>>>>>>>>>> key: dummy value
> > >>>>>>>>>>> key.0: token, (0010, value1), (0011, value2)
> > >>>>>>>>>>> key.1: token, (1011, value3)
> > >>>>>>>>>>> Now we insert (key, 0001, value4), we again lookup "key" +
> all
> > >>>>>>>>>>> the prefixes of the value finding 0 but notice that it is
> full, so we
> > >>>>>>>>>>> partition all the values into two prefixes 00 and 01 but
> notice this
> > >>>>>>>>>>> doesn't help us since 00 will be too full so we split 00
> again to 000, 001.
> > >>>>>>>>>>> We also clear the 0 prefix ending up with:
> > >>>>>>>>>>> key: dummy value
> > >>>>>>>>>>> key.000: token, (0001, value4)
> > >>>>>>>>>>> key.001: token, (0010, value1), (0011, value2)
> > >>>>>>>>>>> key.01: token
> > >>>>>>>>>>> key.1: token, (1011, value3)
> > >>>>>>>>>>>
> > >>>>>>>>>>> We are effectively building a trie[1] where we only have
> values
> > >>>>>>>>>>> at the leaves and control how full each leaf can be. There
> are other trie
> > >>>>>>>>>>> representations like a radix tree that may be better.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Looking up the values in sorted order for "key" would go like
> > >>>>>>>>>>> this:
> > >>>>>>>>>>> Is key set, yes
> > >>>>>>>>>>> look for key."", miss
> > >>>>>>>>>>> look for key.0, miss
> > >>>>>>>>>>> look for key.00, miss
> > >>>>>>>>>>> look for key.000, hit, sort all contained values using
> secondary
> > >>>>>>>>>>> key, provide value4 to user
> > >>>>>>>>>>> look for key.001, hit, sort all contained values using
> secondary
> > >>>>>>>>>>> key, provide value1 followed by value2 to user
> > >>>>>>>>>>> look for key.01, hit, empty, return no values to user
> > >>>>>>>>>>> look for key.1, hit, sort all contained values using
> secondary
> > >>>>>>>>>>> key, provide value3 to user
> > >>>>>>>>>>> we have walked the entire prefix space, signal end of
> iterable
> > >>>>>>>>>>>
> > >>>>>>>>>>> Some notes for the above:
> > >>>>>>>>>>> * The dummy value is used to know that the key contains
> values
> > >>>>>>>>>>> and the token is to know whether there are any values deeper
> in the trie so
> > >>>>>>>>>>> when we know when to stop searching.
> > >>>>>>>>>>> * If we can recalculate the sort key from the combination of
> the
> > >>>>>>>>>>> key and value, then we don't need to store it.
> > >>>>>>>>>>> * Keys with lots of values will perform worse then keys with
> > >>>>>>>>>>> less values since we have to look up more keys but they will
> be empty
> > >>>>>>>>>>> reads. The number of misses can be controlled by how many
> elements we are
> > >>>>>>>>>>> willing to store at a given node before we subdivide.
> > >>>>>>>>>>>
> > >>>>>>>>>>> In reality you could build a lot of structures (e.g. red
> black
> > >>>>>>>>>>> tree, binary tree) using the sort key, the issue is the cost
> of
> > >>>>>>>>>>> rebalancing/re-organizing the structure in map form and
> whether it has a
> > >>>>>>>>>>> convenient pre-order traversal for lookups.
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Fri, May 24, 2019 at 8:14 AM Reuven Lax <relax@google.com
> >
> > >>>>>>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Some great comments!
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> *Aljoscha*: absolutely this would have to be implemented by
> > >>>>>>>>>>>> runners to be efficient. We can of course provide a default
> (inefficient)
> > >>>>>>>>>>>> implementation, but ideally runners would provide better
> ones.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> *Jan* Exactly. I think MapState can be dropped or backed by
> > >>>>>>>>>>>> this. E.g.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> *Robert* Great point about standard coders not satisfying
> > >>>>>>>>>>>> this. That's why I suggested that we provide a way to tag
> the coders that
> > >>>>>>>>>>>> do preserve order, and only accept those as key coders
> Alternatively we
> > >>>>>>>>>>>> could present a more limited API - e.g. only allowing a
> hard-coded set of
> > >>>>>>>>>>>> types to be used as keys - but that seems counter to the
> direction Beam
> > >>>>>>>>>>>> usually goes. So users will have two ways .of creating
> multimap state specs:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>    private final StateSpec<MultimapState<Long, String>>
> state =
> > >>>>>>>>>>>> StateSpecs.multimap(VarLongCoder.of(),
> StringUtf8Coder.of());
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> or
> > >>>>>>>>>>>>    private final StateSpec<MultimapState<Long, String>>
> state =
> > >>>>>>>>>>>> StateSpecs.orderedMultimap(VarLongCoder.of(),
> StringUtf8Coder.of());
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> The second one will validate that the key coder preserves
> > >>>>>>>>>>>> order, and fails otherwise (similar to coder determinism
> checking in
> > >>>>>>>>>>>> GroupByKey). (BTW we would also have versions of these
> functions that use
> > >>>>>>>>>>>> coder inference to "guess" the coder, but those will do the
> same checking)
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Also the API I proposed did support random access! We could
> > >>>>>>>>>>>> separate out OrderedBagState again if we think the use
> cases are
> > >>>>>>>>>>>> fundamentally different. I merged the proposal into that of
> MultimapState
> > >>>>>>>>>>>> because there seemed be 99% overlap.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Reuven
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Fri, May 24, 2019 at 6:19 AM Robert Bradshaw <
> > >>>>>>>>>>>> robertwb@google.com> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> On Fri, May 24, 2019 at 5:32 AM Reuven Lax <
> relax@google.com>
> > >>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>> >
> > >>>>>>>>>>>>> > On Thu, May 23, 2019 at 1:53 PM Ahmet Altay <
> > >>>>>>>>>>>>> altay@google.com> wrote:
> > >>>>>>>>>>>>> >>
> > >>>>>>>>>>>>> >>
> > >>>>>>>>>>>>> >>
> > >>>>>>>>>>>>> >> On Thu, May 23, 2019 at 1:38 PM Lukasz Cwik <
> > >>>>>>>>>>>>> lcwik@google.com> wrote:
> > >>>>>>>>>>>>> >>>
> > >>>>>>>>>>>>> >>>
> > >>>>>>>>>>>>> >>>
> > >>>>>>>>>>>>> >>> On Thu, May 23, 2019 at 11:37 AM Rui Wang <
> > >>>>>>>>>>>>> ruwang@google.com> wrote:
> > >>>>>>>>>>>>> >>>>>
> > >>>>>>>>>>>>> >>>>> A few obvious problems with this code:
> > >>>>>>>>>>>>> >>>>>   1. Removing the elements already processed from the
> > >>>>>>>>>>>>> bag requires clearing and rewriting the entire bag. This
> is O(n^2) in the
> > >>>>>>>>>>>>> number of input trades.
> > >>>>>>>>>>>>> >>>>
> > >>>>>>>>>>>>> >>>> why it's not O(2 * n) to clearing and rewriting trade
> > >>>>>>>>>>>>> state?
> > >>>>>>>>>>>>> >>>>
> > >>>>>>>>>>>>> >>>>>
> > >>>>>>>>>>>>> >>>>> public interface SortedMultimapState<K, V> extends
> State
> > >>>>>>>>>>>>> {
> > >>>>>>>>>>>>> >>>>>   // Add a value to the map.
> > >>>>>>>>>>>>> >>>>>   void put(K key, V value);
> > >>>>>>>>>>>>> >>>>>   // Get all values for a given key.
> > >>>>>>>>>>>>> >>>>>   ReadableState<Iterable<V>> get(K key);
> > >>>>>>>>>>>>> >>>>>  // Return all entries in the map.
> > >>>>>>>>>>>>> >>>>>   ReadableState<Iterable<KV<K, V>>> allEntries();
> > >>>>>>>>>>>>> >>>>>   // Return all entries in the map with keys <=
> limit.
> > >>>>>>>>>>>>> returned elements are sorted by the key.
> > >>>>>>>>>>>>> >>>>>   ReadableState<Iterable<KV<K, V>>> entriesUntil(K
> > >>>>>>>>>>>>> limit);
> > >>>>>>>>>>>>> >>>>>
> > >>>>>>>>>>>>> >>>>>  // Remove all values with the given key;
> > >>>>>>>>>>>>> >>>>>   void remove(K key);
> > >>>>>>>>>>>>> >>>>>  // Remove all entries in the map with keys <= limit.
> > >>>>>>>>>>>>> >>>>>   void removeUntil(K limit);
> > >>>>>>>>>>>>> >>>>
> > >>>>>>>>>>>>> >>>> Will removeUntilExcl(K limit) also useful? It will
> remove
> > >>>>>>>>>>>>> all entries in the map with keys < limit.
> > >>>>>>>>>>>>> >>>>
> > >>>>>>>>>>>>> >>>>>
> > >>>>>>>>>>>>> >>>>> Runners will sort based on the encoded value of the
> key.
> > >>>>>>>>>>>>> In order to make this easier for users, I propose that we
> introduce a new
> > >>>>>>>>>>>>> tag on Coders PreservesOrder. A Coder that contains this
> tag guarantees
> > >>>>>>>>>>>>> that the encoded value preserves the same ordering as the
> base Java type.
> > >>>>>>>>>>>>> >>>>
> > >>>>>>>>>>>>> >>>>
> > >>>>>>>>>>>>> >>>> Could you clarify what is  "encoded value preserves
> the
> > >>>>>>>>>>>>> same ordering as the base Java type"?
> > >>>>>>>>>>>>> >>>
> > >>>>>>>>>>>>> >>>
> > >>>>>>>>>>>>> >>> Lets say A and B represent two different instances of
> the
> > >>>>>>>>>>>>> same Java type like a double, then A < B (using the
> languages comparison
> > >>>>>>>>>>>>> operator) iff encode(A) < encode(B) (note the encoded
> versions are compared
> > >>>>>>>>>>>>> lexicographically)
> > >>>>>>>>>>>>> >>
> > >>>>>>>>>>>>> >>
> > >>>>>>>>>>>>> >> Since coders are shared across SDKs, do we expect A < B
> iff
> > >>>>>>>>>>>>> e(A) < e(P) property to hold for all languages we support?
> What happens A,
> > >>>>>>>>>>>>> B sort differently in different languages?
> > >>>>>>>>>>>>> >
> > >>>>>>>>>>>>> >
> > >>>>>>>>>>>>> > That would have to be the property of the coder (which
> means
> > >>>>>>>>>>>>> that this property probably needs to be represented in the
> portability
> > >>>>>>>>>>>>> representation of the coder). I imagine the common use
> cases will be for
> > >>>>>>>>>>>>> simple coders like int, long, string, etc., which are
> likely to sort the
> > >>>>>>>>>>>>> same in most languages.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> The standard coders for both double and integral types do
> not
> > >>>>>>>>>>>>> respect
> > >>>>>>>>>>>>> the natural ordering (consider negative values). KV coders
> > >>>>>>>>>>>>> violate the
> > >>>>>>>>>>>>> "natural" lexicographic ordering on components as well. I
> think
> > >>>>>>>>>>>>> implicitly sorting on encoded value would yield many
> > >>>>>>>>>>>>> surprises. (The
> > >>>>>>>>>>>>> state, of course, could take a order-preserving, bytes
> > >>>>>>>>>>>>> (string?)-producing callable as a parameter of course).
> (As for
> > >>>>>>>>>>>>> naming, I'd probably call this OrderedBagState or something
> > >>>>>>>>>>>>> like
> > >>>>>>>>>>>>> that...rather than Map which tends to imply random access.)
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> >
>

Re: [External] Re: DISCUSS: Sorted MapState API

Posted by Tyson Hamilton <ty...@google.com>.

Hi Reuven,

I noticed that there was an implementation of the in-memory OrderedListState introduced [1]. Where can I find out more regarding the plan and design? Is there a design doc? I'd like to know more details about the implementation to see if it fits my use case. I was hoping it would have a remove(TimestampedValue<T> e) method.

Thanks,
-Tyson


[1]: https://github.com/apache/beam/commit/9d0d0b0c4506b288164b155c5ce3a23d76db3c41


On 2020/08/03 21:41:46, Catlyn Kong <ca...@yelp.com> wrote: 
> Hey folks,
> 
> Sry I'm late to this thread but this might be very helpful for the problem
> we're dealing with. Do we have a design doc or a jira ticket I can follow?
> 
> Cheers,
> Catlyn
> 
> On Thu, Jun 18, 2020 at 1:11 PM Jan Lukavský <je...@seznam.cz> wrote:
> 
> > My questions were just an example. I fully agree there is a fundamental
> > need for a sorted state (of some form, and I also think this links to
> > efficient implementation of retrations) - I was reacting to Kenn's question
> > about BIP. This one would be pretty nice example why it would be good to
> > have such a "process" - not everything can be solved on ML and there are
> > fundamental decisions that might need a closer attention.
> > On 6/18/20 5:28 PM, Reuven Lax wrote:
> >
> > Jan - my proposal is exactly TimeSortedBagState (more accurately -
> > TimeSortedListState), though I went a bit further and also proposed a way
> > to have a dynamic number of tagged TimeSortedBagStates.
> >
> > You are correct that the runner doesn't really have to store the data time
> > sorted - what's actually needed is the ability to fetch and remove
> > timestamp ranges of data (though that does include fetching the entire
> > list); TimeOrderedState is probably a more accurate name then
> > TimeSortedState. I don't think we could get away with operations that only
> > act on the smallest timestamp, however we could limit the API to only being
> > able to fetch and remove prefixes of data (ordered by timestamp). However
> > if we support prefixes, we might as well support arbitrary subranges.
> >
> > On Thu, Jun 18, 2020 at 7:26 AM Jan Lukavský <je...@seznam.cz> wrote:
> >
> >> Big +1 for a BIP, as this might really help clarify all the pros and cons
> >> of all possibilities. There seem to be questions that need answering and
> >> motivating use cases - do we need sorted map state or can we solve our use
> >> cases by something simpler - e.g. the mentioned TimeSortedBagState? Does
> >> that really have to be time-sorted structure, or does it "only" have to
> >> have operations that can efficiently find and remove element with smallest
> >> timestamp (like a PriorityQueue)?
> >>
> >> Jan
> >> On 6/18/20 5:32 AM, Kenneth Knowles wrote:
> >>
> >> Zooming in from generic philosophy to be clear: adding time ordered
> >> buffer to the Fn state API is *not* a shortcut.It has benefits that will
> >> not be achieved by SDK-side implementation on top of either ordered or
> >> unordered multimap. Are those benefits worth expanding the API? I don't
> >> know.
> >>
> >> A change to allow a runner to have a specialized implementation for
> >> time-buffered state would be one or more StateKey types, right? Reuven,
> >> maybe put this and your Java API in a doc? A BIP? Seems like there's at
> >> least the following to explore:
> >>
> >>  - how that Java API would map to an SDK-side implementation on top of
> >> multimap state key
> >>  - how that Java API would map to a new StateKey
> >>  - whether there's actually more than one relevant implementation of that
> >> StateKey
> >>  - whether SDK-side implementation on some other state key would be
> >> performant enough in all SDK languages (present and future)
> >>
> >> Zooming back out to generic philosophy: Proliferation of StateKey
> >> types tuned by runners (which can very easily still share implementation)
> >> is probably better than proliferation of complex SDK-side implementations
> >> with varying completeness and performance.
> >>
> >> Kenn
> >>
> >> On Wed, Jun 17, 2020 at 3:24 PM Reuven Lax <re...@google.com> wrote:
> >>
> >>> It might help for me to describe what I have in mind. I'm still
> >>> proposing that we build multimap, just not a globally-sorted multimap.
> >>>
> >>> My previous proposal was that we provide a Multimap<Key, Value> state
> >>> type, sorted by key. this would have two additional operations -
> >>> multimap.getRange(startKey, endKey) and multimap.deleteRange(startKey,
> >>> endKey). The primary use case was timestamp sorting, but I felt that a
> >>> sorted multimap provided a nice generalization - after all, you can simply
> >>> key the multimap by timestamp to get timestamp sorting.
> >>>
> >>> This approach had some issues immediately that would take some work to
> >>> solve. Since a multimap key can have any type and a runner will only be
> >>> able to sort by encoded type, we would need to introduce a concept of
> >>> order-preserving coders into Beam and plumb that through. Robert pointed
> >>> out that even our existing standard coders for simple integral types don't
> >>> preserve order, so there will likely be surprises here.
> >>>
> >>> My current proposal is for a multimap that is not sorted by key, but
> >>> that can support.ordered values for a single key. Remember that a multimap
> >>> maps K -> Iterable<V>, so this means that each individual Iterable<V> is
> >>> ordered, but the keys have no specific order relative to each other. This
> >>> is not too different from many multimap implementations where the keys are
> >>> unordered, but the list of values for a single key at least has a stable
> >>> order.
> >>>
> >>> The interface would look like this:
> >>>
> >>> public interface MultimapState<K, V> extends State {
> >>>   // Add a value with a default timestamp.
> >>>   void put(K key, V value);
> >>>
> >>>   // Add a timestamped value.
> >>>   void put(K, key, TimestampedValue<V> value);
> >>>
> >>>   // Remove all values for a key.
> >>>   void remove (K key);
> >>>
> >>>   // Remove all values for a key with timestamps within the specified
> >>> range.
> >>>   void removeRange(K key, Instant startTs, Instant endTs);
> >>>
> >>>   // Get an Iterable of values for V. The Iterable will be returned
> >>> sorted by timestamp.
> >>>   ReadableState<Iterable<TimestampedValue<V>>> get(K key);
> >>>
> >>>   // Get an Iterable of values for V in the specified range. The
> >>> Iterable will be returned sorted by timestamp.
> >>>   ReadableState<Iterable<TimestampedValue<V>>> getRange(K key, Instant
> >>> startTs, Instant endTs);
> >>>
> >>>   ReadableState<Iterable<K>> keys();
> >>>   ReadableState<Iterable<TimestampedValue<V>>> values();
> >>>   ReadableState<Iterable<Map.Entry<K, TimestampedValue<V>> entries;
> >>> }
> >>>
> >>> We can of course provide helper functions that allow using MultimapState
> >>> without deailing with TimestampValue for users who only want a multimap and
> >>> don't want sorting.
> >>>
> >>> I think many users will only need a single sorted list - not a full
> >>> multimap. It's worth offering this as well, and we can simply build it on
> >>> top of MultimapState. It will look like an extension of BagState
> >>>
> >>> public interface TimestampSortedListState<T> extends State {
> >>>   void add(TimestampedValue<T> value);
> >>>   Iterable<TimestampedValue<T>> read();
> >>>   Iterable<TimestampedValue<T>> readRange(Instant startTs, Instant
> >>> endTs);
> >>>   void clearRange(Instant startTs, Instant endTs);
> >>> }
> >>>
> >>>
> >>> On Wed, Jun 17, 2020 at 2:47 PM Luke Cwik <lc...@google.com> wrote:
> >>>
> >>>> The portability layer is meant to live across multiple versions of Beam
> >>>> and I don't think it should be treated by doing the simple and useful thing
> >>>> now since I believe it will lead to a proliferation of the API.
> >>>>
> >>>> On Wed, Jun 17, 2020 at 2:30 PM Kenneth Knowles <ke...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> I have thoughts on the subject of whether to have APIs just for the
> >>>>> lowest-level building blocks versus having APIs for higher-level
> >>>>> constructs. Specifically this applies to providing only unsorted multimap
> >>>>> vs what I will call "time-ordered buffer". TL;DR: I'd vote to focus on
> >>>>> time-ordered buffer; if it turns out to be easy to go all the way to sorted
> >>>>> multimap that's nice-to-have; if it turns out to be easy to implement on
> >>>>> top of unsorted map state that should probably be under the hood
> >>>>>
> >>>>> Reasons to build low-level multimap in the runner & fn api and layer
> >>>>> higher-level things in the SDK:
> >>>>>
> >>>>>  - It is less implementation for runners if they have to only provide
> >>>>> fewer lower-level building blocks like multimap state.
> >>>>>  - There are many more runners than SDKs (and will be even more and
> >>>>> more) so this saves overall.
> >>>>>
> >>>>> Reasons to build higher-level constructs directly in the runner and fn
> >>>>> api:
> >>>>>
> >>>>>  - Having multiple higher-level state types may actually be less
> >>>>> implementation than one complex state type, especially if they map to
> >>>>> runner primitives.
> >>>>>  - The runner may have better specialized implementations, especially
> >>>>> for something like a time-ordered buffer.
> >>>>>  - The particular access patterns in an SDK-based implementation may
> >>>>> not be ideal for each runner's underlying implementation of the low-level
> >>>>> building block.
> >>>>>  - There may be excessive gRPC overhead even for optimal access
> >>>>> patterns.
> >>>>>
> >>>>> There are ways to have best of both worlds, like:
> >>>>>
> >>>>> 1. Define multiple state types according to fundamental access
> >>>>> patterns, like we did this before portability.
> >>>>> 2. If it is easy to layer one on top of the other, do that inside the
> >>>>> runner. Provide shared code so for runners providing the lowest-level
> >>>>> primitive they get all the types for free.
> >>>>>
> >>>>> I understand that this is an oversimplification. It still creates some
> >>>>> more work. And APIs are a burden so it is good to introduce as few as
> >>>>> possible for maintenance. But it has performance benefits and also unblocks
> >>>>> "just doing the simple and useful thing now" which I always like to do as
> >>>>> long as it is compatible with future changes. If the APIs are fundamental,
> >>>>> like sets, maps, timestamp ordering, then it is safe to guess that they
> >>>>> will change rarely and be useful forever.
> >>>>>
> >>>>> Kenn
> >>>>>
> >>>>> On Tue, Jun 16, 2020 at 2:54 PM Luke Cwik <lc...@google.com> wrote:
> >>>>>
> >>>>>> I would be glad to take a stab at how to provide sorting on top of
> >>>>>> unsorted multimap state.
> >>>>>> Based upon your description, you want integer keys representing
> >>>>>> timestamps and arbitrary user value for the values, is that correct?
> >>>>>> What kinds of operations do you need on the sorted map state in order
> >>>>>> of efficiency requirements?
> >>>>>> (e.g. Next(x), Previous(x), GetAll(Range[x, y)), ClearAll(Range[x, y))
> >>>>>> What kinds of operations do we expect the underlying unsorted map
> >>>>>> state to be able to provide?
> >>>>>> (at a minimum Get(K), Append(K), Clear(K) but what else e.g.
> >>>>>> enumerate(K)?)
> >>>>>>
> >>>>>> I went through a similar exercise of how to provide a list like side
> >>>>>> input view over a multimap[1] side input which efficiently allowed
> >>>>>> computation of size and provided random access while only having access to
> >>>>>> get(K) and enumerate K's.
> >>>>>>
> >>>>>> 1:
> >>>>>> https://github.com/lukecwik/incubator-beam/blob/ec8769f6163ca8a4daecc2fb29708bc1da430917/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollectionViews.java#L568
> >>>>>>
> >>>>>> On Tue, Jun 16, 2020 at 8:47 AM Reuven Lax <re...@google.com> wrote:
> >>>>>>
> >>>>>>> Bringing this subject up again,
> >>>>>>>
> >>>>>>> I've spent some time looking into implementing this for the Dataflow
> >>>>>>> runner. I'm unable to find a way to implement the arbitrary sorted multimap
> >>>>>>> efficiently for the case where there are large numbers of unique keys.
> >>>>>>> Since the primary driving use case is timestamp ordering (i.e. key is event
> >>>>>>> timestamp), you would expect to have nearly a new key per element. I
> >>>>>>> considered Luke's suggestion above, but unfortunately it doesn't really
> >>>>>>> solve this issue.
> >>>>>>>
> >>>>>>> The primary use case for sorting always seems to be sorting by
> >>>>>>> timestamp. I want to propose that instead of building the fully-general
> >>>>>>> sorted multimap, we instead focus on a state type where the sort key is an
> >>>>>>> integral type (like a timestamp or an integer). There is still a valid use
> >>>>>>> case for multimap, but we can provide that as an unordered state. At least
> >>>>>>> for Dataflow, it will be much easier
> >>>>>>>
> >>>>>>> While my difficulties here may be specific to the Dataflow runner,
> >>>>>>> any such support would have to be built into other runners as well, and
> >>>>>>> limiting to integral sorting likely makes it easier for other runners to
> >>>>>>> implement this. Also, if you look at this
> >>>>>>> <https://github.com/apache/flink/blob/0ab1549f52f1f544e8492757c6b0d562bf50a061/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/runtime/join/TemporalRowtimeJoin.scala#L95> Flink
> >>>>>>> comment pointed out by Aljoscha, for Flink the main use case identified was
> >>>>>>> also timestamp sorting. This will also simplify the API design for this
> >>>>>>> feature: Sorted multimap with arbitrary keys would require us to introduce
> >>>>>>> a way of mapping natural ordering to encoded ordering (i.e. a new
> >>>>>>> OrderPreservingCoder), but if we limit sort keys to integral types, the API
> >>>>>>> design is simpler as integral types can be represented directly.
> >>>>>>>
> >>>>>>> Reuven
> >>>>>>>
> >>>>>>> On Sun, Jun 2, 2019 at 7:04 AM Reuven Lax <re...@google.com> wrote:
> >>>>>>>
> >>>>>>>> This sounds to me like a potential runner strategy. However if a
> >>>>>>>> runner can natively support sorted maps (e.g. we expect the Dataflow runner
> >>>>>>>> to be able to do so, and I think it would be useful for other runners as
> >>>>>>>> well), then it's probably preferable to allow the runner to use its native
> >>>>>>>> capabilities.
> >>>>>>>>
> >>>>>>>> On Fri, May 24, 2019 at 11:05 AM Lukasz Cwik <lc...@google.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> For the API that you proposed, the map key is always "void" and
> >>>>>>>>> the sort key == user key. So in my example of
> >>>>>>>>> key: dummy value
> >>>>>>>>> key.000: token, (0001, value4)
> >>>>>>>>> key.001: token, (0010, value1), (0011, value2)
> >>>>>>>>> key.01: token
> >>>>>>>>> key.1: token, (1011, value3)
> >>>>>>>>> you would have:
> >>>>>>>>> "void": dummy value
> >>>>>>>>> "void".000: token, (0001, value4)
> >>>>>>>>> "void".001: token, (0010, value1), (0011, value2)
> >>>>>>>>> "void".01: token
> >>>>>>>>> "void".1: token, (1011, value3)
> >>>>>>>>>
> >>>>>>>>> Iterable<KV<K, V>> entriesUntil(K limit) translates into walking
> >>>>>>>>> the the prefixes until you find a common prefix for K and then filter for
> >>>>>>>>> values where they have a sort key <= K. Using the example above, to find
> >>>>>>>>> entriesUntil(0010) you would:
> >>>>>>>>> look for key."", miss
> >>>>>>>>> look for key.0, miss
> >>>>>>>>> look for key.00, miss
> >>>>>>>>> look for key.000, hit, sort all contained values using secondary
> >>>>>>>>> key, provide value4 to user
> >>>>>>>>> look for key.001, hit, notice that 001 is a prefix of 0010 so we
> >>>>>>>>> sort all contained values using secondary key, filter out value2 and
> >>>>>>>>> provide value1
> >>>>>>>>>
> >>>>>>>>> void removeUntil(K limit) also translates into walking the
> >>>>>>>>> prefixes but instead we will clear them when we have a "hit" with some
> >>>>>>>>> special logic for when the sort key is a prefix of the key. Used the
> >>>>>>>>> example, to removeUntil(0010) you would:
> >>>>>>>>> look for key."", miss
> >>>>>>>>> look for key.0, miss
> >>>>>>>>> look for key.00, miss
> >>>>>>>>> look for key.000, hit, clear
> >>>>>>>>> look for key.001, hit, notice that 001 is a prefix of 0010 so we
> >>>>>>>>> sort all contained values using secondary key, store in memory all values
> >>>>>>>>> that > 0010, clear and append values stored in memory.
> >>>>>>>>>
> >>>>>>>>> On Fri, May 24, 2019 at 10:36 AM Reuven Lax <re...@google.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Can you explain how fetching and deleting ranges of keys would
> >>>>>>>>>> work with this data structure?
> >>>>>>>>>>
> >>>>>>>>>> On Fri, May 24, 2019 at 9:50 AM Lukasz Cwik <lc...@google.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Reuven, for the example, I assume that we never want to store
> >>>>>>>>>>> more then 2 values at a given sort key prefix, and if we do then we will
> >>>>>>>>>>> create a new longer prefix splitting up the values based upon the sort key.
> >>>>>>>>>>>
> >>>>>>>>>>> Tuple representation in examples below is (key, sort key, value)
> >>>>>>>>>>> and . is a character outside of the alphabet which can be represented by
> >>>>>>>>>>> using an escaping encoding that wraps the key + sort key encoding.
> >>>>>>>>>>>
> >>>>>>>>>>> To insert (key, 0010, value1), we lookup "key" + all the
> >>>>>>>>>>> prefixes of 0010 finding one that is not empty. In this case its 0, so we
> >>>>>>>>>>> append value to the map at key.0 ending up with (we also set the key to any
> >>>>>>>>>>> dummy value to know that it it contains values):
> >>>>>>>>>>> key: dummy value
> >>>>>>>>>>> key."": token, (0010, value1)
> >>>>>>>>>>> Now we insert (key, 0011, value2), we again lookup "key" + all
> >>>>>>>>>>> the prefixes of 0010, finding "", so we append value2 to key."" ending up
> >>>>>>>>>>> with:
> >>>>>>>>>>> key: dummy value
> >>>>>>>>>>> key."": token, (0010, value1), (0011, value2)
> >>>>>>>>>>> Now we insert (key, 1011, value3), we again lookup "key" + all
> >>>>>>>>>>> the prefixes of 1011 finding "" but notice that it is full, so we partition
> >>>>>>>>>>> all the values into two prefixes 0 and 1. We also clear the "" prefix
> >>>>>>>>>>> ending up with:
> >>>>>>>>>>> key: dummy value
> >>>>>>>>>>> key.0: token, (0010, value1), (0011, value2)
> >>>>>>>>>>> key.1: token, (1011, value3)
> >>>>>>>>>>> Now we insert (key, 0001, value4), we again lookup "key" + all
> >>>>>>>>>>> the prefixes of the value finding 0 but notice that it is full, so we
> >>>>>>>>>>> partition all the values into two prefixes 00 and 01 but notice this
> >>>>>>>>>>> doesn't help us since 00 will be too full so we split 00 again to 000, 001.
> >>>>>>>>>>> We also clear the 0 prefix ending up with:
> >>>>>>>>>>> key: dummy value
> >>>>>>>>>>> key.000: token, (0001, value4)
> >>>>>>>>>>> key.001: token, (0010, value1), (0011, value2)
> >>>>>>>>>>> key.01: token
> >>>>>>>>>>> key.1: token, (1011, value3)
> >>>>>>>>>>>
> >>>>>>>>>>> We are effectively building a trie[1] where we only have values
> >>>>>>>>>>> at the leaves and control how full each leaf can be. There are other trie
> >>>>>>>>>>> representations like a radix tree that may be better.
> >>>>>>>>>>>
> >>>>>>>>>>> Looking up the values in sorted order for "key" would go like
> >>>>>>>>>>> this:
> >>>>>>>>>>> Is key set, yes
> >>>>>>>>>>> look for key."", miss
> >>>>>>>>>>> look for key.0, miss
> >>>>>>>>>>> look for key.00, miss
> >>>>>>>>>>> look for key.000, hit, sort all contained values using secondary
> >>>>>>>>>>> key, provide value4 to user
> >>>>>>>>>>> look for key.001, hit, sort all contained values using secondary
> >>>>>>>>>>> key, provide value1 followed by value2 to user
> >>>>>>>>>>> look for key.01, hit, empty, return no values to user
> >>>>>>>>>>> look for key.1, hit, sort all contained values using secondary
> >>>>>>>>>>> key, provide value3 to user
> >>>>>>>>>>> we have walked the entire prefix space, signal end of iterable
> >>>>>>>>>>>
> >>>>>>>>>>> Some notes for the above:
> >>>>>>>>>>> * The dummy value is used to know that the key contains values
> >>>>>>>>>>> and the token is to know whether there are any values deeper in the trie so
> >>>>>>>>>>> when we know when to stop searching.
> >>>>>>>>>>> * If we can recalculate the sort key from the combination of the
> >>>>>>>>>>> key and value, then we don't need to store it.
> >>>>>>>>>>> * Keys with lots of values will perform worse then keys with
> >>>>>>>>>>> less values since we have to look up more keys but they will be empty
> >>>>>>>>>>> reads. The number of misses can be controlled by how many elements we are
> >>>>>>>>>>> willing to store at a given node before we subdivide.
> >>>>>>>>>>>
> >>>>>>>>>>> In reality you could build a lot of structures (e.g. red black
> >>>>>>>>>>> tree, binary tree) using the sort key, the issue is the cost of
> >>>>>>>>>>> rebalancing/re-organizing the structure in map form and whether it has a
> >>>>>>>>>>> convenient pre-order traversal for lookups.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, May 24, 2019 at 8:14 AM Reuven Lax <re...@google.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Some great comments!
> >>>>>>>>>>>>
> >>>>>>>>>>>> *Aljoscha*: absolutely this would have to be implemented by
> >>>>>>>>>>>> runners to be efficient. We can of course provide a default (inefficient)
> >>>>>>>>>>>> implementation, but ideally runners would provide better ones.
> >>>>>>>>>>>>
> >>>>>>>>>>>> *Jan* Exactly. I think MapState can be dropped or backed by
> >>>>>>>>>>>> this. E.g.
> >>>>>>>>>>>>
> >>>>>>>>>>>> *Robert* Great point about standard coders not satisfying
> >>>>>>>>>>>> this. That's why I suggested that we provide a way to tag the coders that
> >>>>>>>>>>>> do preserve order, and only accept those as key coders Alternatively we
> >>>>>>>>>>>> could present a more limited API - e.g. only allowing a hard-coded set of
> >>>>>>>>>>>> types to be used as keys - but that seems counter to the direction Beam
> >>>>>>>>>>>> usually goes. So users will have two ways .of creating multimap state specs:
> >>>>>>>>>>>>
> >>>>>>>>>>>>    private final StateSpec<MultimapState<Long, String>> state =
> >>>>>>>>>>>> StateSpecs.multimap(VarLongCoder.of(), StringUtf8Coder.of());
> >>>>>>>>>>>>
> >>>>>>>>>>>> or
> >>>>>>>>>>>>    private final StateSpec<MultimapState<Long, String>> state =
> >>>>>>>>>>>> StateSpecs.orderedMultimap(VarLongCoder.of(), StringUtf8Coder.of());
> >>>>>>>>>>>>
> >>>>>>>>>>>> The second one will validate that the key coder preserves
> >>>>>>>>>>>> order, and fails otherwise (similar to coder determinism checking in
> >>>>>>>>>>>> GroupByKey). (BTW we would also have versions of these functions that use
> >>>>>>>>>>>> coder inference to "guess" the coder, but those will do the same checking)
> >>>>>>>>>>>>
> >>>>>>>>>>>> Also the API I proposed did support random access! We could
> >>>>>>>>>>>> separate out OrderedBagState again if we think the use cases are
> >>>>>>>>>>>> fundamentally different. I merged the proposal into that of MultimapState
> >>>>>>>>>>>> because there seemed be 99% overlap.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Reuven
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Fri, May 24, 2019 at 6:19 AM Robert Bradshaw <
> >>>>>>>>>>>> robertwb@google.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Fri, May 24, 2019 at 5:32 AM Reuven Lax <re...@google.com>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>> >
> >>>>>>>>>>>>> > On Thu, May 23, 2019 at 1:53 PM Ahmet Altay <
> >>>>>>>>>>>>> altay@google.com> wrote:
> >>>>>>>>>>>>> >>
> >>>>>>>>>>>>> >>
> >>>>>>>>>>>>> >>
> >>>>>>>>>>>>> >> On Thu, May 23, 2019 at 1:38 PM Lukasz Cwik <
> >>>>>>>>>>>>> lcwik@google.com> wrote:
> >>>>>>>>>>>>> >>>
> >>>>>>>>>>>>> >>>
> >>>>>>>>>>>>> >>>
> >>>>>>>>>>>>> >>> On Thu, May 23, 2019 at 11:37 AM Rui Wang <
> >>>>>>>>>>>>> ruwang@google.com> wrote:
> >>>>>>>>>>>>> >>>>>
> >>>>>>>>>>>>> >>>>> A few obvious problems with this code:
> >>>>>>>>>>>>> >>>>>   1. Removing the elements already processed from the
> >>>>>>>>>>>>> bag requires clearing and rewriting the entire bag. This is O(n^2) in the
> >>>>>>>>>>>>> number of input trades.
> >>>>>>>>>>>>> >>>>
> >>>>>>>>>>>>> >>>> why it's not O(2 * n) to clearing and rewriting trade
> >>>>>>>>>>>>> state?
> >>>>>>>>>>>>> >>>>
> >>>>>>>>>>>>> >>>>>
> >>>>>>>>>>>>> >>>>> public interface SortedMultimapState<K, V> extends State
> >>>>>>>>>>>>> {
> >>>>>>>>>>>>> >>>>>   // Add a value to the map.
> >>>>>>>>>>>>> >>>>>   void put(K key, V value);
> >>>>>>>>>>>>> >>>>>   // Get all values for a given key.
> >>>>>>>>>>>>> >>>>>   ReadableState<Iterable<V>> get(K key);
> >>>>>>>>>>>>> >>>>>  // Return all entries in the map.
> >>>>>>>>>>>>> >>>>>   ReadableState<Iterable<KV<K, V>>> allEntries();
> >>>>>>>>>>>>> >>>>>   // Return all entries in the map with keys <= limit.
> >>>>>>>>>>>>> returned elements are sorted by the key.
> >>>>>>>>>>>>> >>>>>   ReadableState<Iterable<KV<K, V>>> entriesUntil(K
> >>>>>>>>>>>>> limit);
> >>>>>>>>>>>>> >>>>>
> >>>>>>>>>>>>> >>>>>  // Remove all values with the given key;
> >>>>>>>>>>>>> >>>>>   void remove(K key);
> >>>>>>>>>>>>> >>>>>  // Remove all entries in the map with keys <= limit.
> >>>>>>>>>>>>> >>>>>   void removeUntil(K limit);
> >>>>>>>>>>>>> >>>>
> >>>>>>>>>>>>> >>>> Will removeUntilExcl(K limit) also useful? It will remove
> >>>>>>>>>>>>> all entries in the map with keys < limit.
> >>>>>>>>>>>>> >>>>
> >>>>>>>>>>>>> >>>>>
> >>>>>>>>>>>>> >>>>> Runners will sort based on the encoded value of the key.
> >>>>>>>>>>>>> In order to make this easier for users, I propose that we introduce a new
> >>>>>>>>>>>>> tag on Coders PreservesOrder. A Coder that contains this tag guarantees
> >>>>>>>>>>>>> that the encoded value preserves the same ordering as the base Java type.
> >>>>>>>>>>>>> >>>>
> >>>>>>>>>>>>> >>>>
> >>>>>>>>>>>>> >>>> Could you clarify what is  "encoded value preserves the
> >>>>>>>>>>>>> same ordering as the base Java type"?
> >>>>>>>>>>>>> >>>
> >>>>>>>>>>>>> >>>
> >>>>>>>>>>>>> >>> Lets say A and B represent two different instances of the
> >>>>>>>>>>>>> same Java type like a double, then A < B (using the languages comparison
> >>>>>>>>>>>>> operator) iff encode(A) < encode(B) (note the encoded versions are compared
> >>>>>>>>>>>>> lexicographically)
> >>>>>>>>>>>>> >>
> >>>>>>>>>>>>> >>
> >>>>>>>>>>>>> >> Since coders are shared across SDKs, do we expect A < B iff
> >>>>>>>>>>>>> e(A) < e(P) property to hold for all languages we support? What happens A,
> >>>>>>>>>>>>> B sort differently in different languages?
> >>>>>>>>>>>>> >
> >>>>>>>>>>>>> >
> >>>>>>>>>>>>> > That would have to be the property of the coder (which means
> >>>>>>>>>>>>> that this property probably needs to be represented in the portability
> >>>>>>>>>>>>> representation of the coder). I imagine the common use cases will be for
> >>>>>>>>>>>>> simple coders like int, long, string, etc., which are likely to sort the
> >>>>>>>>>>>>> same in most languages.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The standard coders for both double and integral types do not
> >>>>>>>>>>>>> respect
> >>>>>>>>>>>>> the natural ordering (consider negative values). KV coders
> >>>>>>>>>>>>> violate the
> >>>>>>>>>>>>> "natural" lexicographic ordering on components as well. I think
> >>>>>>>>>>>>> implicitly sorting on encoded value would yield many
> >>>>>>>>>>>>> surprises. (The
> >>>>>>>>>>>>> state, of course, could take a order-preserving, bytes
> >>>>>>>>>>>>> (string?)-producing callable as a parameter of course). (As for
> >>>>>>>>>>>>> naming, I'd probably call this OrderedBagState or something
> >>>>>>>>>>>>> like
> >>>>>>>>>>>>> that...rather than Map which tends to imply random access.)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
>

Re: [External] Re: DISCUSS: Sorted MapState API

Posted by Catlyn Kong <ca...@yelp.com>.

Hey folks,

Sry I'm late to this thread but this might be very helpful for the problem
we're dealing with. Do we have a design doc or a jira ticket I can follow?

Cheers,
Catlyn

On Thu, Jun 18, 2020 at 1:11 PM Jan Lukavský <je...@seznam.cz> wrote:

> My questions were just an example. I fully agree there is a fundamental
> need for a sorted state (of some form, and I also think this links to
> efficient implementation of retrations) - I was reacting to Kenn's question
> about BIP. This one would be pretty nice example why it would be good to
> have such a "process" - not everything can be solved on ML and there are
> fundamental decisions that might need a closer attention.
> On 6/18/20 5:28 PM, Reuven Lax wrote:
>
> Jan - my proposal is exactly TimeSortedBagState (more accurately -
> TimeSortedListState), though I went a bit further and also proposed a way
> to have a dynamic number of tagged TimeSortedBagStates.
>
> You are correct that the runner doesn't really have to store the data time
> sorted - what's actually needed is the ability to fetch and remove
> timestamp ranges of data (though that does include fetching the entire
> list); TimeOrderedState is probably a more accurate name then
> TimeSortedState. I don't think we could get away with operations that only
> act on the smallest timestamp, however we could limit the API to only being
> able to fetch and remove prefixes of data (ordered by timestamp). However
> if we support prefixes, we might as well support arbitrary subranges.
>
> On Thu, Jun 18, 2020 at 7:26 AM Jan Lukavský <je...@seznam.cz> wrote:
>
>> Big +1 for a BIP, as this might really help clarify all the pros and cons
>> of all possibilities. There seem to be questions that need answering and
>> motivating use cases - do we need sorted map state or can we solve our use
>> cases by something simpler - e.g. the mentioned TimeSortedBagState? Does
>> that really have to be time-sorted structure, or does it "only" have to
>> have operations that can efficiently find and remove element with smallest
>> timestamp (like a PriorityQueue)?
>>
>> Jan
>> On 6/18/20 5:32 AM, Kenneth Knowles wrote:
>>
>> Zooming in from generic philosophy to be clear: adding time ordered
>> buffer to the Fn state API is *not* a shortcut.It has benefits that will
>> not be achieved by SDK-side implementation on top of either ordered or
>> unordered multimap. Are those benefits worth expanding the API? I don't
>> know.
>>
>> A change to allow a runner to have a specialized implementation for
>> time-buffered state would be one or more StateKey types, right? Reuven,
>> maybe put this and your Java API in a doc? A BIP? Seems like there's at
>> least the following to explore:
>>
>>  - how that Java API would map to an SDK-side implementation on top of
>> multimap state key
>>  - how that Java API would map to a new StateKey
>>  - whether there's actually more than one relevant implementation of that
>> StateKey
>>  - whether SDK-side implementation on some other state key would be
>> performant enough in all SDK languages (present and future)
>>
>> Zooming back out to generic philosophy: Proliferation of StateKey
>> types tuned by runners (which can very easily still share implementation)
>> is probably better than proliferation of complex SDK-side implementations
>> with varying completeness and performance.
>>
>> Kenn
>>
>> On Wed, Jun 17, 2020 at 3:24 PM Reuven Lax <re...@google.com> wrote:
>>
>>> It might help for me to describe what I have in mind. I'm still
>>> proposing that we build multimap, just not a globally-sorted multimap.
>>>
>>> My previous proposal was that we provide a Multimap<Key, Value> state
>>> type, sorted by key. this would have two additional operations -
>>> multimap.getRange(startKey, endKey) and multimap.deleteRange(startKey,
>>> endKey). The primary use case was timestamp sorting, but I felt that a
>>> sorted multimap provided a nice generalization - after all, you can simply
>>> key the multimap by timestamp to get timestamp sorting.
>>>
>>> This approach had some issues immediately that would take some work to
>>> solve. Since a multimap key can have any type and a runner will only be
>>> able to sort by encoded type, we would need to introduce a concept of
>>> order-preserving coders into Beam and plumb that through. Robert pointed
>>> out that even our existing standard coders for simple integral types don't
>>> preserve order, so there will likely be surprises here.
>>>
>>> My current proposal is for a multimap that is not sorted by key, but
>>> that can support.ordered values for a single key. Remember that a multimap
>>> maps K -> Iterable<V>, so this means that each individual Iterable<V> is
>>> ordered, but the keys have no specific order relative to each other. This
>>> is not too different from many multimap implementations where the keys are
>>> unordered, but the list of values for a single key at least has a stable
>>> order.
>>>
>>> The interface would look like this:
>>>
>>> public interface MultimapState<K, V> extends State {
>>>   // Add a value with a default timestamp.
>>>   void put(K key, V value);
>>>
>>>   // Add a timestamped value.
>>>   void put(K, key, TimestampedValue<V> value);
>>>
>>>   // Remove all values for a key.
>>>   void remove (K key);
>>>
>>>   // Remove all values for a key with timestamps within the specified
>>> range.
>>>   void removeRange(K key, Instant startTs, Instant endTs);
>>>
>>>   // Get an Iterable of values for V. The Iterable will be returned
>>> sorted by timestamp.
>>>   ReadableState<Iterable<TimestampedValue<V>>> get(K key);
>>>
>>>   // Get an Iterable of values for V in the specified range. The
>>> Iterable will be returned sorted by timestamp.
>>>   ReadableState<Iterable<TimestampedValue<V>>> getRange(K key, Instant
>>> startTs, Instant endTs);
>>>
>>>   ReadableState<Iterable<K>> keys();
>>>   ReadableState<Iterable<TimestampedValue<V>>> values();
>>>   ReadableState<Iterable<Map.Entry<K, TimestampedValue<V>> entries;
>>> }
>>>
>>> We can of course provide helper functions that allow using MultimapState
>>> without deailing with TimestampValue for users who only want a multimap and
>>> don't want sorting.
>>>
>>> I think many users will only need a single sorted list - not a full
>>> multimap. It's worth offering this as well, and we can simply build it on
>>> top of MultimapState. It will look like an extension of BagState
>>>
>>> public interface TimestampSortedListState<T> extends State {
>>>   void add(TimestampedValue<T> value);
>>>   Iterable<TimestampedValue<T>> read();
>>>   Iterable<TimestampedValue<T>> readRange(Instant startTs, Instant
>>> endTs);
>>>   void clearRange(Instant startTs, Instant endTs);
>>> }
>>>
>>>
>>> On Wed, Jun 17, 2020 at 2:47 PM Luke Cwik <lc...@google.com> wrote:
>>>
>>>> The portability layer is meant to live across multiple versions of Beam
>>>> and I don't think it should be treated by doing the simple and useful thing
>>>> now since I believe it will lead to a proliferation of the API.
>>>>
>>>> On Wed, Jun 17, 2020 at 2:30 PM Kenneth Knowles <ke...@apache.org>
>>>> wrote:
>>>>
>>>>> I have thoughts on the subject of whether to have APIs just for the
>>>>> lowest-level building blocks versus having APIs for higher-level
>>>>> constructs. Specifically this applies to providing only unsorted multimap
>>>>> vs what I will call "time-ordered buffer". TL;DR: I'd vote to focus on
>>>>> time-ordered buffer; if it turns out to be easy to go all the way to sorted
>>>>> multimap that's nice-to-have; if it turns out to be easy to implement on
>>>>> top of unsorted map state that should probably be under the hood
>>>>>
>>>>> Reasons to build low-level multimap in the runner & fn api and layer
>>>>> higher-level things in the SDK:
>>>>>
>>>>>  - It is less implementation for runners if they have to only provide
>>>>> fewer lower-level building blocks like multimap state.
>>>>>  - There are many more runners than SDKs (and will be even more and
>>>>> more) so this saves overall.
>>>>>
>>>>> Reasons to build higher-level constructs directly in the runner and fn
>>>>> api:
>>>>>
>>>>>  - Having multiple higher-level state types may actually be less
>>>>> implementation than one complex state type, especially if they map to
>>>>> runner primitives.
>>>>>  - The runner may have better specialized implementations, especially
>>>>> for something like a time-ordered buffer.
>>>>>  - The particular access patterns in an SDK-based implementation may
>>>>> not be ideal for each runner's underlying implementation of the low-level
>>>>> building block.
>>>>>  - There may be excessive gRPC overhead even for optimal access
>>>>> patterns.
>>>>>
>>>>> There are ways to have best of both worlds, like:
>>>>>
>>>>> 1. Define multiple state types according to fundamental access
>>>>> patterns, like we did this before portability.
>>>>> 2. If it is easy to layer one on top of the other, do that inside the
>>>>> runner. Provide shared code so for runners providing the lowest-level
>>>>> primitive they get all the types for free.
>>>>>
>>>>> I understand that this is an oversimplification. It still creates some
>>>>> more work. And APIs are a burden so it is good to introduce as few as
>>>>> possible for maintenance. But it has performance benefits and also unblocks
>>>>> "just doing the simple and useful thing now" which I always like to do as
>>>>> long as it is compatible with future changes. If the APIs are fundamental,
>>>>> like sets, maps, timestamp ordering, then it is safe to guess that they
>>>>> will change rarely and be useful forever.
>>>>>
>>>>> Kenn
>>>>>
>>>>> On Tue, Jun 16, 2020 at 2:54 PM Luke Cwik <lc...@google.com> wrote:
>>>>>
>>>>>> I would be glad to take a stab at how to provide sorting on top of
>>>>>> unsorted multimap state.
>>>>>> Based upon your description, you want integer keys representing
>>>>>> timestamps and arbitrary user value for the values, is that correct?
>>>>>> What kinds of operations do you need on the sorted map state in order
>>>>>> of efficiency requirements?
>>>>>> (e.g. Next(x), Previous(x), GetAll(Range[x, y)), ClearAll(Range[x, y))
>>>>>> What kinds of operations do we expect the underlying unsorted map
>>>>>> state to be able to provide?
>>>>>> (at a minimum Get(K), Append(K), Clear(K) but what else e.g.
>>>>>> enumerate(K)?)
>>>>>>
>>>>>> I went through a similar exercise of how to provide a list like side
>>>>>> input view over a multimap[1] side input which efficiently allowed
>>>>>> computation of size and provided random access while only having access to
>>>>>> get(K) and enumerate K's.
>>>>>>
>>>>>> 1:
>>>>>> https://github.com/lukecwik/incubator-beam/blob/ec8769f6163ca8a4daecc2fb29708bc1da430917/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollectionViews.java#L568
>>>>>>
>>>>>> On Tue, Jun 16, 2020 at 8:47 AM Reuven Lax <re...@google.com> wrote:
>>>>>>
>>>>>>> Bringing this subject up again,
>>>>>>>
>>>>>>> I've spent some time looking into implementing this for the Dataflow
>>>>>>> runner. I'm unable to find a way to implement the arbitrary sorted multimap
>>>>>>> efficiently for the case where there are large numbers of unique keys.
>>>>>>> Since the primary driving use case is timestamp ordering (i.e. key is event
>>>>>>> timestamp), you would expect to have nearly a new key per element. I
>>>>>>> considered Luke's suggestion above, but unfortunately it doesn't really
>>>>>>> solve this issue.
>>>>>>>
>>>>>>> The primary use case for sorting always seems to be sorting by
>>>>>>> timestamp. I want to propose that instead of building the fully-general
>>>>>>> sorted multimap, we instead focus on a state type where the sort key is an
>>>>>>> integral type (like a timestamp or an integer). There is still a valid use
>>>>>>> case for multimap, but we can provide that as an unordered state. At least
>>>>>>> for Dataflow, it will be much easier
>>>>>>>
>>>>>>> While my difficulties here may be specific to the Dataflow runner,
>>>>>>> any such support would have to be built into other runners as well, and
>>>>>>> limiting to integral sorting likely makes it easier for other runners to
>>>>>>> implement this. Also, if you look at this
>>>>>>> <https://github.com/apache/flink/blob/0ab1549f52f1f544e8492757c6b0d562bf50a061/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/runtime/join/TemporalRowtimeJoin.scala#L95> Flink
>>>>>>> comment pointed out by Aljoscha, for Flink the main use case identified was
>>>>>>> also timestamp sorting. This will also simplify the API design for this
>>>>>>> feature: Sorted multimap with arbitrary keys would require us to introduce
>>>>>>> a way of mapping natural ordering to encoded ordering (i.e. a new
>>>>>>> OrderPreservingCoder), but if we limit sort keys to integral types, the API
>>>>>>> design is simpler as integral types can be represented directly.
>>>>>>>
>>>>>>> Reuven
>>>>>>>
>>>>>>> On Sun, Jun 2, 2019 at 7:04 AM Reuven Lax <re...@google.com> wrote:
>>>>>>>
>>>>>>>> This sounds to me like a potential runner strategy. However if a
>>>>>>>> runner can natively support sorted maps (e.g. we expect the Dataflow runner
>>>>>>>> to be able to do so, and I think it would be useful for other runners as
>>>>>>>> well), then it's probably preferable to allow the runner to use its native
>>>>>>>> capabilities.
>>>>>>>>
>>>>>>>> On Fri, May 24, 2019 at 11:05 AM Lukasz Cwik <lc...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> For the API that you proposed, the map key is always "void" and
>>>>>>>>> the sort key == user key. So in my example of
>>>>>>>>> key: dummy value
>>>>>>>>> key.000: token, (0001, value4)
>>>>>>>>> key.001: token, (0010, value1), (0011, value2)
>>>>>>>>> key.01: token
>>>>>>>>> key.1: token, (1011, value3)
>>>>>>>>> you would have:
>>>>>>>>> "void": dummy value
>>>>>>>>> "void".000: token, (0001, value4)
>>>>>>>>> "void".001: token, (0010, value1), (0011, value2)
>>>>>>>>> "void".01: token
>>>>>>>>> "void".1: token, (1011, value3)
>>>>>>>>>
>>>>>>>>> Iterable<KV<K, V>> entriesUntil(K limit) translates into walking
>>>>>>>>> the the prefixes until you find a common prefix for K and then filter for
>>>>>>>>> values where they have a sort key <= K. Using the example above, to find
>>>>>>>>> entriesUntil(0010) you would:
>>>>>>>>> look for key."", miss
>>>>>>>>> look for key.0, miss
>>>>>>>>> look for key.00, miss
>>>>>>>>> look for key.000, hit, sort all contained values using secondary
>>>>>>>>> key, provide value4 to user
>>>>>>>>> look for key.001, hit, notice that 001 is a prefix of 0010 so we
>>>>>>>>> sort all contained values using secondary key, filter out value2 and
>>>>>>>>> provide value1
>>>>>>>>>
>>>>>>>>> void removeUntil(K limit) also translates into walking the
>>>>>>>>> prefixes but instead we will clear them when we have a "hit" with some
>>>>>>>>> special logic for when the sort key is a prefix of the key. Used the
>>>>>>>>> example, to removeUntil(0010) you would:
>>>>>>>>> look for key."", miss
>>>>>>>>> look for key.0, miss
>>>>>>>>> look for key.00, miss
>>>>>>>>> look for key.000, hit, clear
>>>>>>>>> look for key.001, hit, notice that 001 is a prefix of 0010 so we
>>>>>>>>> sort all contained values using secondary key, store in memory all values
>>>>>>>>> that > 0010, clear and append values stored in memory.
>>>>>>>>>
>>>>>>>>> On Fri, May 24, 2019 at 10:36 AM Reuven Lax <re...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Can you explain how fetching and deleting ranges of keys would
>>>>>>>>>> work with this data structure?
>>>>>>>>>>
>>>>>>>>>> On Fri, May 24, 2019 at 9:50 AM Lukasz Cwik <lc...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Reuven, for the example, I assume that we never want to store
>>>>>>>>>>> more then 2 values at a given sort key prefix, and if we do then we will
>>>>>>>>>>> create a new longer prefix splitting up the values based upon the sort key.
>>>>>>>>>>>
>>>>>>>>>>> Tuple representation in examples below is (key, sort key, value)
>>>>>>>>>>> and . is a character outside of the alphabet which can be represented by
>>>>>>>>>>> using an escaping encoding that wraps the key + sort key encoding.
>>>>>>>>>>>
>>>>>>>>>>> To insert (key, 0010, value1), we lookup "key" + all the
>>>>>>>>>>> prefixes of 0010 finding one that is not empty. In this case its 0, so we
>>>>>>>>>>> append value to the map at key.0 ending up with (we also set the key to any
>>>>>>>>>>> dummy value to know that it it contains values):
>>>>>>>>>>> key: dummy value
>>>>>>>>>>> key."": token, (0010, value1)
>>>>>>>>>>> Now we insert (key, 0011, value2), we again lookup "key" + all
>>>>>>>>>>> the prefixes of 0010, finding "", so we append value2 to key."" ending up
>>>>>>>>>>> with:
>>>>>>>>>>> key: dummy value
>>>>>>>>>>> key."": token, (0010, value1), (0011, value2)
>>>>>>>>>>> Now we insert (key, 1011, value3), we again lookup "key" + all
>>>>>>>>>>> the prefixes of 1011 finding "" but notice that it is full, so we partition
>>>>>>>>>>> all the values into two prefixes 0 and 1. We also clear the "" prefix
>>>>>>>>>>> ending up with:
>>>>>>>>>>> key: dummy value
>>>>>>>>>>> key.0: token, (0010, value1), (0011, value2)
>>>>>>>>>>> key.1: token, (1011, value3)
>>>>>>>>>>> Now we insert (key, 0001, value4), we again lookup "key" + all
>>>>>>>>>>> the prefixes of the value finding 0 but notice that it is full, so we
>>>>>>>>>>> partition all the values into two prefixes 00 and 01 but notice this
>>>>>>>>>>> doesn't help us since 00 will be too full so we split 00 again to 000, 001.
>>>>>>>>>>> We also clear the 0 prefix ending up with:
>>>>>>>>>>> key: dummy value
>>>>>>>>>>> key.000: token, (0001, value4)
>>>>>>>>>>> key.001: token, (0010, value1), (0011, value2)
>>>>>>>>>>> key.01: token
>>>>>>>>>>> key.1: token, (1011, value3)
>>>>>>>>>>>
>>>>>>>>>>> We are effectively building a trie[1] where we only have values
>>>>>>>>>>> at the leaves and control how full each leaf can be. There are other trie
>>>>>>>>>>> representations like a radix tree that may be better.
>>>>>>>>>>>
>>>>>>>>>>> Looking up the values in sorted order for "key" would go like
>>>>>>>>>>> this:
>>>>>>>>>>> Is key set, yes
>>>>>>>>>>> look for key."", miss
>>>>>>>>>>> look for key.0, miss
>>>>>>>>>>> look for key.00, miss
>>>>>>>>>>> look for key.000, hit, sort all contained values using secondary
>>>>>>>>>>> key, provide value4 to user
>>>>>>>>>>> look for key.001, hit, sort all contained values using secondary
>>>>>>>>>>> key, provide value1 followed by value2 to user
>>>>>>>>>>> look for key.01, hit, empty, return no values to user
>>>>>>>>>>> look for key.1, hit, sort all contained values using secondary
>>>>>>>>>>> key, provide value3 to user
>>>>>>>>>>> we have walked the entire prefix space, signal end of iterable
>>>>>>>>>>>
>>>>>>>>>>> Some notes for the above:
>>>>>>>>>>> * The dummy value is used to know that the key contains values
>>>>>>>>>>> and the token is to know whether there are any values deeper in the trie so
>>>>>>>>>>> when we know when to stop searching.
>>>>>>>>>>> * If we can recalculate the sort key from the combination of the
>>>>>>>>>>> key and value, then we don't need to store it.
>>>>>>>>>>> * Keys with lots of values will perform worse then keys with
>>>>>>>>>>> less values since we have to look up more keys but they will be empty
>>>>>>>>>>> reads. The number of misses can be controlled by how many elements we are
>>>>>>>>>>> willing to store at a given node before we subdivide.
>>>>>>>>>>>
>>>>>>>>>>> In reality you could build a lot of structures (e.g. red black
>>>>>>>>>>> tree, binary tree) using the sort key, the issue is the cost of
>>>>>>>>>>> rebalancing/re-organizing the structure in map form and whether it has a
>>>>>>>>>>> convenient pre-order traversal for lookups.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, May 24, 2019 at 8:14 AM Reuven Lax <re...@google.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Some great comments!
>>>>>>>>>>>>
>>>>>>>>>>>> *Aljoscha*: absolutely this would have to be implemented by
>>>>>>>>>>>> runners to be efficient. We can of course provide a default (inefficient)
>>>>>>>>>>>> implementation, but ideally runners would provide better ones.
>>>>>>>>>>>>
>>>>>>>>>>>> *Jan* Exactly. I think MapState can be dropped or backed by
>>>>>>>>>>>> this. E.g.
>>>>>>>>>>>>
>>>>>>>>>>>> *Robert* Great point about standard coders not satisfying
>>>>>>>>>>>> this. That's why I suggested that we provide a way to tag the coders that
>>>>>>>>>>>> do preserve order, and only accept those as key coders Alternatively we
>>>>>>>>>>>> could present a more limited API - e.g. only allowing a hard-coded set of
>>>>>>>>>>>> types to be used as keys - but that seems counter to the direction Beam
>>>>>>>>>>>> usually goes. So users will have two ways .of creating multimap state specs:
>>>>>>>>>>>>
>>>>>>>>>>>>    private final StateSpec<MultimapState<Long, String>> state =
>>>>>>>>>>>> StateSpecs.multimap(VarLongCoder.of(), StringUtf8Coder.of());
>>>>>>>>>>>>
>>>>>>>>>>>> or
>>>>>>>>>>>>    private final StateSpec<MultimapState<Long, String>> state =
>>>>>>>>>>>> StateSpecs.orderedMultimap(VarLongCoder.of(), StringUtf8Coder.of());
>>>>>>>>>>>>
>>>>>>>>>>>> The second one will validate that the key coder preserves
>>>>>>>>>>>> order, and fails otherwise (similar to coder determinism checking in
>>>>>>>>>>>> GroupByKey). (BTW we would also have versions of these functions that use
>>>>>>>>>>>> coder inference to "guess" the coder, but those will do the same checking)
>>>>>>>>>>>>
>>>>>>>>>>>> Also the API I proposed did support random access! We could
>>>>>>>>>>>> separate out OrderedBagState again if we think the use cases are
>>>>>>>>>>>> fundamentally different. I merged the proposal into that of MultimapState
>>>>>>>>>>>> because there seemed be 99% overlap.
>>>>>>>>>>>>
>>>>>>>>>>>> Reuven
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, May 24, 2019 at 6:19 AM Robert Bradshaw <
>>>>>>>>>>>> robertwb@google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, May 24, 2019 at 5:32 AM Reuven Lax <re...@google.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > On Thu, May 23, 2019 at 1:53 PM Ahmet Altay <
>>>>>>>>>>>>> altay@google.com> wrote:
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> On Thu, May 23, 2019 at 1:38 PM Lukasz Cwik <
>>>>>>>>>>>>> lcwik@google.com> wrote:
>>>>>>>>>>>>> >>>
>>>>>>>>>>>>> >>>
>>>>>>>>>>>>> >>>
>>>>>>>>>>>>> >>> On Thu, May 23, 2019 at 11:37 AM Rui Wang <
>>>>>>>>>>>>> ruwang@google.com> wrote:
>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>> >>>>> A few obvious problems with this code:
>>>>>>>>>>>>> >>>>>   1. Removing the elements already processed from the
>>>>>>>>>>>>> bag requires clearing and rewriting the entire bag. This is O(n^2) in the
>>>>>>>>>>>>> number of input trades.
>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>> >>>> why it's not O(2 * n) to clearing and rewriting trade
>>>>>>>>>>>>> state?
>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>> >>>>> public interface SortedMultimapState<K, V> extends State
>>>>>>>>>>>>> {
>>>>>>>>>>>>> >>>>>   // Add a value to the map.
>>>>>>>>>>>>> >>>>>   void put(K key, V value);
>>>>>>>>>>>>> >>>>>   // Get all values for a given key.
>>>>>>>>>>>>> >>>>>   ReadableState<Iterable<V>> get(K key);
>>>>>>>>>>>>> >>>>>  // Return all entries in the map.
>>>>>>>>>>>>> >>>>>   ReadableState<Iterable<KV<K, V>>> allEntries();
>>>>>>>>>>>>> >>>>>   // Return all entries in the map with keys <= limit.
>>>>>>>>>>>>> returned elements are sorted by the key.
>>>>>>>>>>>>> >>>>>   ReadableState<Iterable<KV<K, V>>> entriesUntil(K
>>>>>>>>>>>>> limit);
>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>> >>>>>  // Remove all values with the given key;
>>>>>>>>>>>>> >>>>>   void remove(K key);
>>>>>>>>>>>>> >>>>>  // Remove all entries in the map with keys <= limit.
>>>>>>>>>>>>> >>>>>   void removeUntil(K limit);
>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>> >>>> Will removeUntilExcl(K limit) also useful? It will remove
>>>>>>>>>>>>> all entries in the map with keys < limit.
>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>> >>>>> Runners will sort based on the encoded value of the key.
>>>>>>>>>>>>> In order to make this easier for users, I propose that we introduce a new
>>>>>>>>>>>>> tag on Coders PreservesOrder. A Coder that contains this tag guarantees
>>>>>>>>>>>>> that the encoded value preserves the same ordering as the base Java type.
>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>> >>>> Could you clarify what is  "encoded value preserves the
>>>>>>>>>>>>> same ordering as the base Java type"?
>>>>>>>>>>>>> >>>
>>>>>>>>>>>>> >>>
>>>>>>>>>>>>> >>> Lets say A and B represent two different instances of the
>>>>>>>>>>>>> same Java type like a double, then A < B (using the languages comparison
>>>>>>>>>>>>> operator) iff encode(A) < encode(B) (note the encoded versions are compared
>>>>>>>>>>>>> lexicographically)
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> Since coders are shared across SDKs, do we expect A < B iff
>>>>>>>>>>>>> e(A) < e(P) property to hold for all languages we support? What happens A,
>>>>>>>>>>>>> B sort differently in different languages?
>>>>>>>>>>>>> >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > That would have to be the property of the coder (which means
>>>>>>>>>>>>> that this property probably needs to be represented in the portability
>>>>>>>>>>>>> representation of the coder). I imagine the common use cases will be for
>>>>>>>>>>>>> simple coders like int, long, string, etc., which are likely to sort the
>>>>>>>>>>>>> same in most languages.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The standard coders for both double and integral types do not
>>>>>>>>>>>>> respect
>>>>>>>>>>>>> the natural ordering (consider negative values). KV coders
>>>>>>>>>>>>> violate the
>>>>>>>>>>>>> "natural" lexicographic ordering on components as well. I think
>>>>>>>>>>>>> implicitly sorting on encoded value would yield many
>>>>>>>>>>>>> surprises. (The
>>>>>>>>>>>>> state, of course, could take a order-preserving, bytes
>>>>>>>>>>>>> (string?)-producing callable as a parameter of course). (As for
>>>>>>>>>>>>> naming, I'd probably call this OrderedBagState or something
>>>>>>>>>>>>> like
>>>>>>>>>>>>> that...rather than Map which tends to imply random access.)
>>>>>>>>>>>>>
>>>>>>>>>>>>

Re: DISCUSS: Sorted MapState API

Posted by Jan Lukavský <je...@seznam.cz>.

My questions were just an example. I fully agree there is a fundamental 
need for a sorted state (of some form, and I also think this links to 
efficient implementation of retrations) - I was reacting to Kenn's 
question about BIP. This one would be pretty nice example why it would 
be good to have such a "process" - not everything can be solved on ML 
and there are fundamental decisions that might need a closer attention.

On 6/18/20 5:28 PM, Reuven Lax wrote:
> Jan - my proposal is exactly TimeSortedBagState (more accurately - 
> TimeSortedListState), though I went a bit further and also proposed a 
> way to have a dynamic number of tagged TimeSortedBagStates.
>
> You are correct that the runner doesn't really have to store the data 
> time sorted - what's actually needed is the ability to fetch and 
> remove timestamp ranges of data (though that does include fetching the 
> entire list); TimeOrderedState is probably a more accurate name then 
> TimeSortedState. I don't think we could get away with operations that 
> only act on the smallest timestamp, however we could limit the API to 
> only being able to fetch and remove prefixes of data (ordered by 
> timestamp). However if we support prefixes, we might as well support 
> arbitrary subranges.
>
> On Thu, Jun 18, 2020 at 7:26 AM Jan Lukavský <je.ik@seznam.cz 
> <ma...@seznam.cz>> wrote:
>
>     Big +1 for a BIP, as this might really help clarify all the pros
>     and cons of all possibilities. There seem to be questions that
>     need answering and motivating use cases - do we need sorted map
>     state or can we solve our use cases by something simpler - e.g.
>     the mentioned TimeSortedBagState? Does that really have to be
>     time-sorted structure, or does it "only" have to have operations
>     that can efficiently find and remove element with smallest
>     timestamp (like a PriorityQueue)?
>
>     Jan
>
>     On 6/18/20 5:32 AM, Kenneth Knowles wrote:
>>     Zooming in from generic philosophy to be clear: adding time
>>     ordered buffer to the Fn state API is *not* a shortcut.It has
>>     benefits that will not be achieved by SDK-side implementation on
>>     top of either ordered or unordered multimap. Are those benefits
>>     worth expanding the API? I don't know.
>>
>>     A change to allow a runner to have a specialized implementation
>>     for time-buffered state would be one or more StateKey types,
>>     right? Reuven, maybe put this and your Java API in a doc? A BIP?
>>     Seems like there's at least the following to explore:
>>
>>      - how that Java API would map to an SDK-side implementation on
>>     top of multimap state key
>>      - how that Java API would map to a new StateKey
>>      - whether there's actually more than one relevant implementation
>>     of that StateKey
>>      - whether SDK-side implementation on some other state key would
>>     be performant enough in all SDK languages (present and future)
>>
>>     Zooming back out to generic philosophy: Proliferation of StateKey
>>     types tuned by runners (which can very easily still share
>>     implementation) is probably better than proliferation of complex
>>     SDK-side implementations with varying completeness and performance.
>>
>>     Kenn
>>
>>     On Wed, Jun 17, 2020 at 3:24 PM Reuven Lax <relax@google.com
>>     <ma...@google.com>> wrote:
>>
>>         It might help for me to describe what I have in mind. I'm
>>         still proposing that we build multimap, just not a
>>         globally-sorted multimap.
>>
>>         My previous proposal was that we provide a Multimap<Key,
>>         Value> state type, sorted by key. this would have two
>>         additional operations - multimap.getRange(startKey, endKey)
>>         and multimap.deleteRange(startKey, endKey). The primary use
>>         case was timestamp sorting, but I felt that a sorted multimap
>>         provided a nice generalization - after all, you can simply
>>         key the multimap by timestamp to get timestamp sorting.
>>
>>         This approach had some issues immediately that would take
>>         some work to solve. Since a multimap key can have any type
>>         and a runner will only be able to sort by encoded type, we
>>         would need to introduce a concept of order-preserving coders
>>         into Beam and plumb that through. Robert pointed out that
>>         even our existing standard coders for simple integral types
>>         don't preserve order, so there will likely be surprises here.
>>
>>         My current proposal is for a multimap that is not sorted by
>>         key, but that can support.ordered values for a single key.
>>         Remember that a multimap maps K -> Iterable<V>, so this means
>>         that each individual Iterable<V> is ordered, but the keys
>>         have no specific order relative to each other. This is not
>>         too different from many multimap implementations where the
>>         keys are unordered, but the list of values for a single key
>>         at least has a stable order.
>>
>>         The interface would look like this:
>>
>>         public interface MultimapState<K, V> extends State {
>>           // Add a value with a default timestamp.
>>           void put(K key, V value);
>>
>>           // Add a timestamped value.
>>           void put(K, key, TimestampedValue<V> value);
>>
>>           // Remove all values for a key.
>>           void remove (K key);
>>
>>           // Remove all values for a key with timestamps within the
>>         specified range.
>>           void removeRange(K key, Instant startTs, Instant endTs);
>>
>>           // Get an Iterable of values for V. The Iterable will be
>>         returned sorted by timestamp.
>>         ReadableState<Iterable<TimestampedValue<V>>> get(K key);
>>
>>           // Get an Iterable of values for V in the specified range.
>>         The Iterable will be returned sorted by timestamp.
>>         ReadableState<Iterable<TimestampedValue<V>>> getRange(K key,
>>         Instant startTs, Instant endTs);
>>
>>           ReadableState<Iterable<K>> keys();
>>         ReadableState<Iterable<TimestampedValue<V>>> values();
>>           ReadableState<Iterable<Map.Entry<K, TimestampedValue<V>>
>>         entries;
>>         }
>>
>>         We can of course provide helper functions that allow using
>>         MultimapState without deailing with TimestampValue for users
>>         who only want a multimap and don't want sorting.
>>
>>         I think many users will only need a single sorted list - not
>>         a full multimap. It's worth offering this as well, and we can
>>         simply build it on top of MultimapState. It will look like an
>>         extension of BagState
>>
>>         public interface TimestampSortedListState<T> extends State {
>>           void add(TimestampedValue<T> value);
>>           Iterable<TimestampedValue<T>> read();
>>           Iterable<TimestampedValue<T>> readRange(Instant startTs,
>>         Instant endTs);
>>           void clearRange(Instant startTs, Instant endTs);
>>         }
>>
>>
>>         On Wed, Jun 17, 2020 at 2:47 PM Luke Cwik <lcwik@google.com
>>         <ma...@google.com>> wrote:
>>
>>             The portability layer is meant to live across multiple
>>             versions of Beam and I don't think it should be treated
>>             by doing the simple and useful thing now since I believe
>>             it will lead to a proliferation of the API.
>>
>>             On Wed, Jun 17, 2020 at 2:30 PM Kenneth Knowles
>>             <kenn@apache.org <ma...@apache.org>> wrote:
>>
>>                 I have thoughts on the subject of whether to have
>>                 APIs just for the lowest-level building blocks versus
>>                 having APIs for higher-level constructs. Specifically
>>                 this applies to providing only unsorted multimap vs
>>                 what I will call "time-ordered buffer". TL;DR: I'd
>>                 vote to focus on time-ordered buffer; if it turns out
>>                 to be easy to go all the way to sorted multimap
>>                 that's nice-to-have; if it turns out to be easy to
>>                 implement on top of unsorted map state that should
>>                 probably be under the hood
>>
>>                 Reasons to build low-level multimap in the runner &
>>                 fn api and layer higher-level things in the SDK:
>>
>>                  - It is less implementation for runners if they have
>>                 to only provide fewer lower-level building blocks
>>                 like multimap state.
>>                  - There are many more runners than SDKs (and will be
>>                 even more and more) so this saves overall.
>>
>>                 Reasons to build higher-level constructs directly in
>>                 the runner and fn api:
>>
>>                  - Having multiple higher-level state types may
>>                 actually be less implementation than one complex
>>                 state type, especially if they map to runner primitives.
>>                  - The runner may have better specialized
>>                 implementations, especially for something like a
>>                 time-ordered buffer.
>>                  - The particular access patterns in an SDK-based
>>                 implementation may not be ideal for each runner's
>>                 underlying implementation of the low-level building
>>                 block.
>>                  - There may be excessive gRPC overhead even for
>>                 optimal access patterns.
>>
>>                 There are ways to have best of both worlds, like:
>>
>>                 1. Define multiple state types according to
>>                 fundamental access patterns, like we did this before
>>                 portability.
>>                 2. If it is easy to layer one on top of the other, do
>>                 that inside the runner. Provide shared code so for
>>                 runners providing the lowest-level primitive they get
>>                 all the types for free.
>>
>>                 I understand that this is an oversimplification. It
>>                 still creates some more work. And APIs are a burden
>>                 so it is good to introduce as few as possible for
>>                 maintenance. But it has performance benefits and also
>>                 unblocks "just doing the simple and useful thing now"
>>                 which I always like to do as long as it is compatible
>>                 with future changes. If the APIs are fundamental,
>>                 like sets, maps, timestamp ordering, then it is safe
>>                 to guess that they will change rarely and be useful
>>                 forever.
>>
>>                 Kenn
>>
>>                 On Tue, Jun 16, 2020 at 2:54 PM Luke Cwik
>>                 <lcwik@google.com <ma...@google.com>> wrote:
>>
>>                     I would be glad to take a stab at how to provide
>>                     sorting on top of unsorted multimap state.
>>                     Based upon your description, you want integer
>>                     keys representing timestamps and arbitrary user
>>                     value for the values, is that correct?
>>                     What kinds of operations do you need on the
>>                     sorted map state in order of efficiency requirements?
>>                     (e.g. Next(x), Previous(x), GetAll(Range[x, y)),
>>                     ClearAll(Range[x, y))
>>                     What kinds of operations do we expect the
>>                     underlying unsorted map state to be able to provide?
>>                     (at a minimum Get(K), Append(K), Clear(K) but
>>                     what else e.g. enumerate(K)?)
>>
>>                     I went through a similar exercise of how to
>>                     provide a list like side input view over a
>>                     multimap[1] side input which efficiently allowed
>>                     computation of size and provided random access
>>                     while only having access to get(K) and enumerate K's.
>>
>>                     1:
>>                     https://github.com/lukecwik/incubator-beam/blob/ec8769f6163ca8a4daecc2fb29708bc1da430917/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollectionViews.java#L568
>>
>>                     On Tue, Jun 16, 2020 at 8:47 AM Reuven Lax
>>                     <relax@google.com <ma...@google.com>> wrote:
>>
>>                         Bringing this subject up again,
>>
>>                         I've spent some time looking into
>>                         implementing this for the Dataflow runner.
>>                         I'm unable to find a way to implement the
>>                         arbitrary sorted multimap efficiently for the
>>                         case where there are large numbers of unique
>>                         keys. Since the primary driving use case is
>>                         timestamp ordering (i.e. key is event
>>                         timestamp), you would expect to have nearly a
>>                         new key per element. I considered Luke's
>>                         suggestion above, but unfortunately it
>>                         doesn't really solve this issue.
>>
>>                         The primary use case for sorting always seems
>>                         to be sorting by timestamp. I want to propose
>>                         that instead of building the fully-general
>>                         sorted multimap, we instead focus on a state
>>                         type where the sort key is an integral type
>>                         (like a timestamp or an integer). There is
>>                         still a valid use case for multimap, but we
>>                         can provide that as an unordered state. At
>>                         least for Dataflow, it will be much easier
>>
>>                         While my difficulties here may be specific to
>>                         the Dataflow runner, any such support would
>>                         have to be built into other runners as well,
>>                         and limiting to integral sorting likely makes
>>                         it easier for other runners to implement
>>                         this. Also, if you look at this
>>                         <https://github.com/apache/flink/blob/0ab1549f52f1f544e8492757c6b0d562bf50a061/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/runtime/join/TemporalRowtimeJoin.scala#L95> Flink
>>                         comment pointed out by Aljoscha, for Flink
>>                         the main use case identified was also
>>                         timestamp sorting. This will also simplify
>>                         the API design for this feature: Sorted
>>                         multimap with arbitrary keys would require us
>>                         to introduce a way of mapping natural
>>                         ordering to encoded ordering (i.e. a new
>>                         OrderPreservingCoder), but if we limit sort
>>                         keys to integral types, the API design is
>>                         simpler as integral types can be represented
>>                         directly.
>>
>>                         Reuven
>>
>>                         On Sun, Jun 2, 2019 at 7:04 AM Reuven Lax
>>                         <relax@google.com <ma...@google.com>>
>>                         wrote:
>>
>>                             This sounds to me like a potential runner
>>                             strategy. However if a runner can
>>                             natively support sorted maps (e.g. we
>>                             expect the Dataflow runner to be able to
>>                             do so, and I think it would be useful for
>>                             other runners as well), then it's
>>                             probably preferable to allow the runner
>>                             to use its native capabilities.
>>
>>                             On Fri, May 24, 2019 at 11:05 AM Lukasz
>>                             Cwik <lcwik@google.com
>>                             <ma...@google.com>> wrote:
>>
>>                                 For the API that you proposed, the
>>                                 map key is always "void" and the sort
>>                                 key == user key. So in my example of
>>                                 key: dummy value
>>                                 key.000: token, (0001, value4)
>>                                 key.001: token, (0010, value1),
>>                                 (0011, value2)
>>                                 key.01: token
>>                                 key.1: token, (1011, value3)
>>                                 you would have:
>>                                 "void": dummy value
>>                                 "void".000: token, (0001, value4)
>>                                 "void".001: token, (0010, value1),
>>                                 (0011, value2)
>>                                 "void".01: token
>>                                 "void".1: token, (1011, value3)
>>
>>                                 Iterable<KV<K, V>> entriesUntil(K
>>                                 limit) translates into walking the
>>                                 the prefixes until you find a common
>>                                 prefix for K and then filter for
>>                                 values where they have a sort key <=
>>                                 K. Using the example above, to find
>>                                 entriesUntil(0010) you would:
>>                                 look for key."", miss
>>                                 look for key.0, miss
>>                                 look for key.00, miss
>>                                 look for key.000, hit, sort all
>>                                 contained values using secondary key,
>>                                 provide value4 to user
>>                                 look for key.001, hit, notice that
>>                                 001 is a prefix of 0010 so we sort
>>                                 all contained values using secondary
>>                                 key, filter out value2 and provide value1
>>
>>                                 void removeUntil(K limit) also
>>                                 translates into walking the prefixes
>>                                 but instead we will clear them when
>>                                 we have a "hit" with some special
>>                                 logic for when the sort key is a
>>                                 prefix of the key. Used the example,
>>                                 to removeUntil(0010) you would:
>>                                 look for key."", miss
>>                                 look for key.0, miss
>>                                 look for key.00, miss
>>                                 look for key.000, hit, clear
>>                                 look for key.001, hit, notice that
>>                                 001 is a prefix of 0010 so we sort
>>                                 all contained values using secondary
>>                                 key, store in memory all values that
>>                                 > 0010, clear and append values
>>                                 stored in memory.
>>
>>                                 On Fri, May 24, 2019 at 10:36 AM
>>                                 Reuven Lax <relax@google.com
>>                                 <ma...@google.com>> wrote:
>>
>>                                     Can you explain how fetching and
>>                                     deleting ranges of keys would
>>                                     work with this data structure?
>>
>>                                     On Fri, May 24, 2019 at 9:50 AM
>>                                     Lukasz Cwik <lcwik@google.com
>>                                     <ma...@google.com>> wrote:
>>
>>                                         Reuven, for the example, I
>>                                         assume that we never want to
>>                                         store more then 2 values at a
>>                                         given sort key prefix, and if
>>                                         we do then we will create a
>>                                         new longer prefix splitting
>>                                         up the values based upon the
>>                                         sort key.
>>
>>                                         Tuple representation in
>>                                         examples below is (key, sort
>>                                         key, value) and . is a
>>                                         character outside of the
>>                                         alphabet which can be
>>                                         represented by using an
>>                                         escaping encoding that wraps
>>                                         the key + sort key encoding.
>>
>>                                         To insert (key, 0010,
>>                                         value1), we lookup "key" +
>>                                         all the prefixes of 0010
>>                                         finding one that is not
>>                                         empty. In this case its 0, so
>>                                         we append value to the map at
>>                                         key.0 ending up with (we also
>>                                         set the key to any dummy
>>                                         value to know that it it
>>                                         contains values):
>>                                         key: dummy value
>>                                         key."": token, (0010, value1)
>>                                         Now we insert (key, 0011,
>>                                         value2), we again lookup
>>                                         "key" + all the prefixes of
>>                                         0010, finding "", so we
>>                                         append value2 to key.""
>>                                         ending up with:
>>                                         key: dummy value
>>                                         key."": token, (0010,
>>                                         value1), (0011, value2)
>>                                         Now we insert (key, 1011,
>>                                         value3), we again lookup
>>                                         "key" + all the prefixes of
>>                                         1011 finding "" but notice
>>                                         that it is full, so we
>>                                         partition all the values into
>>                                         two prefixes 0 and 1. We also
>>                                         clear the "" prefix ending up
>>                                         with:
>>                                         key: dummy value
>>                                         key.0: token, (0010, value1),
>>                                         (0011, value2)
>>                                         key.1: token, (1011, value3)
>>                                         Now we insert (key, 0001,
>>                                         value4), we again lookup
>>                                         "key" + all the prefixes of
>>                                         the value finding 0 but
>>                                         notice that it is full, so we
>>                                         partition all the values into
>>                                         two prefixes 00 and 01 but
>>                                         notice this doesn't help us
>>                                         since 00 will be too full so
>>                                         we split 00 again to 000,
>>                                         001. We also clear the 0
>>                                         prefix ending up with:
>>                                         key: dummy value
>>                                         key.000: token, (0001, value4)
>>                                         key.001: token, (0010,
>>                                         value1), (0011, value2)
>>                                         key.01: token
>>                                         key.1: token, (1011, value3)
>>
>>                                         We are effectively building a
>>                                         trie[1] where we only have
>>                                         values at the leaves and
>>                                         control how full each leaf
>>                                         can be. There are other trie
>>                                         representations like a radix
>>                                         tree that may be better.
>>
>>                                         Looking up the values in
>>                                         sorted order for "key" would
>>                                         go like this:
>>                                         Is key set, yes
>>                                         look for key."", miss
>>                                         look for key.0, miss
>>                                         look for key.00, miss
>>                                         look for key.000, hit, sort
>>                                         all contained values using
>>                                         secondary key, provide value4
>>                                         to user
>>                                         look for key.001, hit, sort
>>                                         all contained values using
>>                                         secondary key, provide value1
>>                                         followed by value2 to user
>>                                         look for key.01, hit, empty,
>>                                         return no values to user
>>                                         look for key.1, hit, sort all
>>                                         contained values using
>>                                         secondary key, provide value3
>>                                         to user
>>                                         we have walked the entire
>>                                         prefix space, signal end of
>>                                         iterable
>>
>>                                         Some notes for the above:
>>                                         * The dummy value is used to
>>                                         know that the key contains
>>                                         values and the token is to
>>                                         know whether there are any
>>                                         values deeper in the trie so
>>                                         when we know when to stop
>>                                         searching.
>>                                         * If we can recalculate the
>>                                         sort key from the combination
>>                                         of the key and value, then we
>>                                         don't need to store it.
>>                                         * Keys with lots of values
>>                                         will perform worse then keys
>>                                         with less values since we
>>                                         have to look up more keys but
>>                                         they will be empty reads. The
>>                                         number of misses can be
>>                                         controlled by how many
>>                                         elements we are willing to
>>                                         store at a given node before
>>                                         we subdivide.
>>
>>                                         In reality you could build a
>>                                         lot of structures (e.g. red
>>                                         black tree, binary tree)
>>                                         using the sort key, the issue
>>                                         is the cost of
>>                                         rebalancing/re-organizing the
>>                                         structure in map form and
>>                                         whether it has a convenient
>>                                         pre-order traversal for lookups.
>>
>>
>>
>>                                         On Fri, May 24, 2019 at 8:14
>>                                         AM Reuven Lax
>>                                         <relax@google.com
>>                                         <ma...@google.com>> wrote:
>>
>>                                             Some great comments!
>>
>>                                             *Aljoscha*: absolutely
>>                                             this would have to be
>>                                             implemented by runners to
>>                                             be efficient. We can of
>>                                             course provide a default
>>                                             (inefficient)
>>                                             implementation, but
>>                                             ideally runners would
>>                                             provide better ones.
>>
>>                                             *Jan* Exactly. I think
>>                                             MapState can be dropped
>>                                             or backed by this. E.g.
>>
>>                                             *Robert* Great point
>>                                             about standard coders not
>>                                             satisfying this. That's
>>                                             why I suggested that we
>>                                             provide a way to tag the
>>                                             coders that do preserve
>>                                             order, and only accept
>>                                             those as key coders
>>                                             Alternatively we could
>>                                             present a more limited
>>                                             API - e.g. only allowing
>>                                             a hard-coded set of types
>>                                             to be used as keys - but
>>                                             that seems counter to the
>>                                             direction Beam usually
>>                                             goes. So users will have
>>                                             two ways .of creating
>>                                             multimap state specs:
>>
>>                                                private final
>>                                             StateSpec<MultimapState<Long,
>>                                             String>> state =
>>                                             StateSpecs.multimap(VarLongCoder.of(),
>>                                             StringUtf8Coder.of());
>>
>>                                             or
>>                                              private final
>>                                             StateSpec<MultimapState<Long,
>>                                             String>> state =
>>                                             StateSpecs.orderedMultimap(VarLongCoder.of(),
>>                                             StringUtf8Coder.of());
>>
>>                                             The second one will
>>                                             validate that the key
>>                                             coder preserves order,
>>                                             and fails otherwise
>>                                             (similar to coder
>>                                             determinism checking in
>>                                             GroupByKey). (BTW we
>>                                             would also have versions
>>                                             of these functions that
>>                                             use coder inference to
>>                                             "guess" the coder, but
>>                                             those will do the same
>>                                             checking)
>>
>>                                             Also the API I proposed
>>                                             did support
>>                                             random access! We could
>>                                             separate out
>>                                             OrderedBagState again if
>>                                             we think the use cases
>>                                             are fundamentally
>>                                             different. I merged the
>>                                             proposal into that of
>>                                             MultimapState because
>>                                             there seemed be 99% overlap.
>>
>>                                             Reuven
>>
>>                                             On Fri, May 24, 2019 at
>>                                             6:19 AM Robert Bradshaw
>>                                             <robertwb@google.com
>>                                             <ma...@google.com>>
>>                                             wrote:
>>
>>                                                 On Fri, May 24, 2019
>>                                                 at 5:32 AM Reuven Lax
>>                                                 <relax@google.com
>>                                                 <ma...@google.com>>
>>                                                 wrote:
>>                                                 >
>>                                                 > On Thu, May 23,
>>                                                 2019 at 1:53 PM Ahmet
>>                                                 Altay
>>                                                 <altay@google.com
>>                                                 <ma...@google.com>>
>>                                                 wrote:
>>                                                 >>
>>                                                 >>
>>                                                 >>
>>                                                 >> On Thu, May 23,
>>                                                 2019 at 1:38 PM
>>                                                 Lukasz Cwik
>>                                                 <lcwik@google.com
>>                                                 <ma...@google.com>>
>>                                                 wrote:
>>                                                 >>>
>>                                                 >>>
>>                                                 >>>
>>                                                 >>> On Thu, May 23,
>>                                                 2019 at 11:37 AM Rui
>>                                                 Wang
>>                                                 <ruwang@google.com
>>                                                 <ma...@google.com>>
>>                                                 wrote:
>>                                                 >>>>>
>>                                                 >>>>> A few obvious
>>                                                 problems with this code:
>>                                                 >>>>>   1. Removing
>>                                                 the elements already
>>                                                 processed from the
>>                                                 bag requires clearing
>>                                                 and rewriting the
>>                                                 entire bag. This is
>>                                                 O(n^2) in the number
>>                                                 of input trades.
>>                                                 >>>>
>>                                                 >>>> why it's not O(2
>>                                                 * n) to clearing and
>>                                                 rewriting trade state?
>>                                                 >>>>
>>                                                 >>>>>
>>                                                 >>>>> public
>>                                                 interface
>>                                                 SortedMultimapState<K,
>>                                                 V> extends State {
>>                                                 >>>>>   // Add a
>>                                                 value to the map.
>>                                                 >>>>>   void put(K
>>                                                 key, V value);
>>                                                 >>>>>   // Get all
>>                                                 values for a given key.
>>                                                 >>>>>
>>                                                  ReadableState<Iterable<V>>
>>                                                 get(K key);
>>                                                 >>>>>  // Return all
>>                                                 entries in the map.
>>                                                 >>>>>
>>                                                  ReadableState<Iterable<KV<K,
>>                                                 V>>> allEntries();
>>                                                 >>>>>   // Return all
>>                                                 entries in the map
>>                                                 with keys <= limit.
>>                                                 returned elements are
>>                                                 sorted by the key.
>>                                                 >>>>>
>>                                                  ReadableState<Iterable<KV<K,
>>                                                 V>>> entriesUntil(K
>>                                                 limit);
>>                                                 >>>>>
>>                                                 >>>>>  // Remove all
>>                                                 values with the given
>>                                                 key;
>>                                                 >>>>>   void remove(K
>>                                                 key);
>>                                                 >>>>>  // Remove all
>>                                                 entries in the map
>>                                                 with keys <= limit.
>>                                                 >>>>>   void
>>                                                 removeUntil(K limit);
>>                                                 >>>>
>>                                                 >>>> Will
>>                                                 removeUntilExcl(K
>>                                                 limit) also useful?
>>                                                 It will remove all
>>                                                 entries in the map
>>                                                 with keys < limit.
>>                                                 >>>>
>>                                                 >>>>>
>>                                                 >>>>> Runners will
>>                                                 sort based on the
>>                                                 encoded value of the
>>                                                 key. In order to make
>>                                                 this easier for
>>                                                 users, I propose that
>>                                                 we introduce a new
>>                                                 tag on Coders
>>                                                 PreservesOrder. A
>>                                                 Coder that contains
>>                                                 this tag guarantees
>>                                                 that the encoded
>>                                                 value preserves the
>>                                                 same ordering as the
>>                                                 base Java type.
>>                                                 >>>>
>>                                                 >>>>
>>                                                 >>>> Could you
>>                                                 clarify what is 
>>                                                 "encoded value
>>                                                 preserves the same
>>                                                 ordering as the base
>>                                                 Java type"?
>>                                                 >>>
>>                                                 >>>
>>                                                 >>> Lets say A and B
>>                                                 represent two
>>                                                 different instances
>>                                                 of the same Java type
>>                                                 like a double, then A
>>                                                 < B (using the
>>                                                 languages comparison
>>                                                 operator) iff
>>                                                 encode(A) < encode(B)
>>                                                 (note the encoded
>>                                                 versions are compared
>>                                                 lexicographically)
>>                                                 >>
>>                                                 >>
>>                                                 >> Since coders are
>>                                                 shared across SDKs,
>>                                                 do we expect A < B
>>                                                 iff e(A) < e(P)
>>                                                 property to hold for
>>                                                 all languages we
>>                                                 support? What happens
>>                                                 A, B sort differently
>>                                                 in different languages?
>>                                                 >
>>                                                 >
>>                                                 > That would have to
>>                                                 be the property of
>>                                                 the coder (which
>>                                                 means that this
>>                                                 property probably
>>                                                 needs to be
>>                                                 represented in the
>>                                                 portability
>>                                                 representation of the
>>                                                 coder). I imagine the
>>                                                 common use cases will
>>                                                 be for simple coders
>>                                                 like int, long,
>>                                                 string, etc., which
>>                                                 are likely to sort
>>                                                 the same in most
>>                                                 languages.
>>
>>                                                 The standard coders
>>                                                 for both double and
>>                                                 integral types do not
>>                                                 respect
>>                                                 the natural ordering
>>                                                 (consider negative
>>                                                 values). KV coders
>>                                                 violate the
>>                                                 "natural"
>>                                                 lexicographic
>>                                                 ordering on
>>                                                 components as well. I
>>                                                 think
>>                                                 implicitly sorting on
>>                                                 encoded value would
>>                                                 yield many surprises.
>>                                                 (The
>>                                                 state, of course,
>>                                                 could take a
>>                                                 order-preserving, bytes
>>                                                 (string?)-producing
>>                                                 callable as a
>>                                                 parameter of course).
>>                                                 (As for
>>                                                 naming, I'd probably
>>                                                 call this
>>                                                 OrderedBagState or
>>                                                 something like
>>                                                 that...rather than
>>                                                 Map which tends to
>>                                                 imply random access.)
>>

Re: DISCUSS: Sorted MapState API

Posted by Reuven Lax <re...@google.com>.

Jan - my proposal is exactly TimeSortedBagState (more accurately -
TimeSortedListState), though I went a bit further and also proposed a way
to have a dynamic number of tagged TimeSortedBagStates.

You are correct that the runner doesn't really have to store the data time
sorted - what's actually needed is the ability to fetch and remove
timestamp ranges of data (though that does include fetching the entire
list); TimeOrderedState is probably a more accurate name then
TimeSortedState. I don't think we could get away with operations that only
act on the smallest timestamp, however we could limit the API to only being
able to fetch and remove prefixes of data (ordered by timestamp). However
if we support prefixes, we might as well support arbitrary subranges.

On Thu, Jun 18, 2020 at 7:26 AM Jan Lukavský <je...@seznam.cz> wrote:

> Big +1 for a BIP, as this might really help clarify all the pros and cons
> of all possibilities. There seem to be questions that need answering and
> motivating use cases - do we need sorted map state or can we solve our use
> cases by something simpler - e.g. the mentioned TimeSortedBagState? Does
> that really have to be time-sorted structure, or does it "only" have to
> have operations that can efficiently find and remove element with smallest
> timestamp (like a PriorityQueue)?
>
> Jan
> On 6/18/20 5:32 AM, Kenneth Knowles wrote:
>
> Zooming in from generic philosophy to be clear: adding time ordered buffer
> to the Fn state API is *not* a shortcut.It has benefits that will not be
> achieved by SDK-side implementation on top of either ordered or unordered
> multimap. Are those benefits worth expanding the API? I don't know.
>
> A change to allow a runner to have a specialized implementation for
> time-buffered state would be one or more StateKey types, right? Reuven,
> maybe put this and your Java API in a doc? A BIP? Seems like there's at
> least the following to explore:
>
>  - how that Java API would map to an SDK-side implementation on top of
> multimap state key
>  - how that Java API would map to a new StateKey
>  - whether there's actually more than one relevant implementation of that
> StateKey
>  - whether SDK-side implementation on some other state key would be
> performant enough in all SDK languages (present and future)
>
> Zooming back out to generic philosophy: Proliferation of StateKey
> types tuned by runners (which can very easily still share implementation)
> is probably better than proliferation of complex SDK-side implementations
> with varying completeness and performance.
>
> Kenn
>
> On Wed, Jun 17, 2020 at 3:24 PM Reuven Lax <re...@google.com> wrote:
>
>> It might help for me to describe what I have in mind. I'm still proposing
>> that we build multimap, just not a globally-sorted multimap.
>>
>> My previous proposal was that we provide a Multimap<Key, Value> state
>> type, sorted by key. this would have two additional operations -
>> multimap.getRange(startKey, endKey) and multimap.deleteRange(startKey,
>> endKey). The primary use case was timestamp sorting, but I felt that a
>> sorted multimap provided a nice generalization - after all, you can simply
>> key the multimap by timestamp to get timestamp sorting.
>>
>> This approach had some issues immediately that would take some work to
>> solve. Since a multimap key can have any type and a runner will only be
>> able to sort by encoded type, we would need to introduce a concept of
>> order-preserving coders into Beam and plumb that through. Robert pointed
>> out that even our existing standard coders for simple integral types don't
>> preserve order, so there will likely be surprises here.
>>
>> My current proposal is for a multimap that is not sorted by key, but that
>> can support.ordered values for a single key. Remember that a multimap maps
>> K -> Iterable<V>, so this means that each individual Iterable<V> is
>> ordered, but the keys have no specific order relative to each other. This
>> is not too different from many multimap implementations where the keys are
>> unordered, but the list of values for a single key at least has a stable
>> order.
>>
>> The interface would look like this:
>>
>> public interface MultimapState<K, V> extends State {
>>   // Add a value with a default timestamp.
>>   void put(K key, V value);
>>
>>   // Add a timestamped value.
>>   void put(K, key, TimestampedValue<V> value);
>>
>>   // Remove all values for a key.
>>   void remove (K key);
>>
>>   // Remove all values for a key with timestamps within the specified
>> range.
>>   void removeRange(K key, Instant startTs, Instant endTs);
>>
>>   // Get an Iterable of values for V. The Iterable will be returned
>> sorted by timestamp.
>>   ReadableState<Iterable<TimestampedValue<V>>> get(K key);
>>
>>   // Get an Iterable of values for V in the specified range. The Iterable
>> will be returned sorted by timestamp.
>>   ReadableState<Iterable<TimestampedValue<V>>> getRange(K key, Instant
>> startTs, Instant endTs);
>>
>>   ReadableState<Iterable<K>> keys();
>>   ReadableState<Iterable<TimestampedValue<V>>> values();
>>   ReadableState<Iterable<Map.Entry<K, TimestampedValue<V>> entries;
>> }
>>
>> We can of course provide helper functions that allow using MultimapState
>> without deailing with TimestampValue for users who only want a multimap and
>> don't want sorting.
>>
>> I think many users will only need a single sorted list - not a full
>> multimap. It's worth offering this as well, and we can simply build it on
>> top of MultimapState. It will look like an extension of BagState
>>
>> public interface TimestampSortedListState<T> extends State {
>>   void add(TimestampedValue<T> value);
>>   Iterable<TimestampedValue<T>> read();
>>   Iterable<TimestampedValue<T>> readRange(Instant startTs, Instant endTs);
>>   void clearRange(Instant startTs, Instant endTs);
>> }
>>
>>
>> On Wed, Jun 17, 2020 at 2:47 PM Luke Cwik <lc...@google.com> wrote:
>>
>>> The portability layer is meant to live across multiple versions of Beam
>>> and I don't think it should be treated by doing the simple and useful thing
>>> now since I believe it will lead to a proliferation of the API.
>>>
>>> On Wed, Jun 17, 2020 at 2:30 PM Kenneth Knowles <ke...@apache.org> wrote:
>>>
>>>> I have thoughts on the subject of whether to have APIs just for the
>>>> lowest-level building blocks versus having APIs for higher-level
>>>> constructs. Specifically this applies to providing only unsorted multimap
>>>> vs what I will call "time-ordered buffer". TL;DR: I'd vote to focus on
>>>> time-ordered buffer; if it turns out to be easy to go all the way to sorted
>>>> multimap that's nice-to-have; if it turns out to be easy to implement on
>>>> top of unsorted map state that should probably be under the hood
>>>>
>>>> Reasons to build low-level multimap in the runner & fn api and layer
>>>> higher-level things in the SDK:
>>>>
>>>>  - It is less implementation for runners if they have to only provide
>>>> fewer lower-level building blocks like multimap state.
>>>>  - There are many more runners than SDKs (and will be even more and
>>>> more) so this saves overall.
>>>>
>>>> Reasons to build higher-level constructs directly in the runner and fn
>>>> api:
>>>>
>>>>  - Having multiple higher-level state types may actually be less
>>>> implementation than one complex state type, especially if they map to
>>>> runner primitives.
>>>>  - The runner may have better specialized implementations, especially
>>>> for something like a time-ordered buffer.
>>>>  - The particular access patterns in an SDK-based implementation may
>>>> not be ideal for each runner's underlying implementation of the low-level
>>>> building block.
>>>>  - There may be excessive gRPC overhead even for optimal access
>>>> patterns.
>>>>
>>>> There are ways to have best of both worlds, like:
>>>>
>>>> 1. Define multiple state types according to fundamental access
>>>> patterns, like we did this before portability.
>>>> 2. If it is easy to layer one on top of the other, do that inside the
>>>> runner. Provide shared code so for runners providing the lowest-level
>>>> primitive they get all the types for free.
>>>>
>>>> I understand that this is an oversimplification. It still creates some
>>>> more work. And APIs are a burden so it is good to introduce as few as
>>>> possible for maintenance. But it has performance benefits and also unblocks
>>>> "just doing the simple and useful thing now" which I always like to do as
>>>> long as it is compatible with future changes. If the APIs are fundamental,
>>>> like sets, maps, timestamp ordering, then it is safe to guess that they
>>>> will change rarely and be useful forever.
>>>>
>>>> Kenn
>>>>
>>>> On Tue, Jun 16, 2020 at 2:54 PM Luke Cwik <lc...@google.com> wrote:
>>>>
>>>>> I would be glad to take a stab at how to provide sorting on top of
>>>>> unsorted multimap state.
>>>>> Based upon your description, you want integer keys representing
>>>>> timestamps and arbitrary user value for the values, is that correct?
>>>>> What kinds of operations do you need on the sorted map state in order
>>>>> of efficiency requirements?
>>>>> (e.g. Next(x), Previous(x), GetAll(Range[x, y)), ClearAll(Range[x, y))
>>>>> What kinds of operations do we expect the underlying unsorted map
>>>>> state to be able to provide?
>>>>> (at a minimum Get(K), Append(K), Clear(K) but what else e.g.
>>>>> enumerate(K)?)
>>>>>
>>>>> I went through a similar exercise of how to provide a list like side
>>>>> input view over a multimap[1] side input which efficiently allowed
>>>>> computation of size and provided random access while only having access to
>>>>> get(K) and enumerate K's.
>>>>>
>>>>> 1:
>>>>> https://github.com/lukecwik/incubator-beam/blob/ec8769f6163ca8a4daecc2fb29708bc1da430917/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollectionViews.java#L568
>>>>>
>>>>> On Tue, Jun 16, 2020 at 8:47 AM Reuven Lax <re...@google.com> wrote:
>>>>>
>>>>>> Bringing this subject up again,
>>>>>>
>>>>>> I've spent some time looking into implementing this for the Dataflow
>>>>>> runner. I'm unable to find a way to implement the arbitrary sorted multimap
>>>>>> efficiently for the case where there are large numbers of unique keys.
>>>>>> Since the primary driving use case is timestamp ordering (i.e. key is event
>>>>>> timestamp), you would expect to have nearly a new key per element. I
>>>>>> considered Luke's suggestion above, but unfortunately it doesn't really
>>>>>> solve this issue.
>>>>>>
>>>>>> The primary use case for sorting always seems to be sorting by
>>>>>> timestamp. I want to propose that instead of building the fully-general
>>>>>> sorted multimap, we instead focus on a state type where the sort key is an
>>>>>> integral type (like a timestamp or an integer). There is still a valid use
>>>>>> case for multimap, but we can provide that as an unordered state. At least
>>>>>> for Dataflow, it will be much easier
>>>>>>
>>>>>> While my difficulties here may be specific to the Dataflow runner,
>>>>>> any such support would have to be built into other runners as well, and
>>>>>> limiting to integral sorting likely makes it easier for other runners to
>>>>>> implement this. Also, if you look at this
>>>>>> <https://github.com/apache/flink/blob/0ab1549f52f1f544e8492757c6b0d562bf50a061/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/runtime/join/TemporalRowtimeJoin.scala#L95> Flink
>>>>>> comment pointed out by Aljoscha, for Flink the main use case identified was
>>>>>> also timestamp sorting. This will also simplify the API design for this
>>>>>> feature: Sorted multimap with arbitrary keys would require us to introduce
>>>>>> a way of mapping natural ordering to encoded ordering (i.e. a new
>>>>>> OrderPreservingCoder), but if we limit sort keys to integral types, the API
>>>>>> design is simpler as integral types can be represented directly.
>>>>>>
>>>>>> Reuven
>>>>>>
>>>>>> On Sun, Jun 2, 2019 at 7:04 AM Reuven Lax <re...@google.com> wrote:
>>>>>>
>>>>>>> This sounds to me like a potential runner strategy. However if a
>>>>>>> runner can natively support sorted maps (e.g. we expect the Dataflow runner
>>>>>>> to be able to do so, and I think it would be useful for other runners as
>>>>>>> well), then it's probably preferable to allow the runner to use its native
>>>>>>> capabilities.
>>>>>>>
>>>>>>> On Fri, May 24, 2019 at 11:05 AM Lukasz Cwik <lc...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> For the API that you proposed, the map key is always "void" and the
>>>>>>>> sort key == user key. So in my example of
>>>>>>>> key: dummy value
>>>>>>>> key.000: token, (0001, value4)
>>>>>>>> key.001: token, (0010, value1), (0011, value2)
>>>>>>>> key.01: token
>>>>>>>> key.1: token, (1011, value3)
>>>>>>>> you would have:
>>>>>>>> "void": dummy value
>>>>>>>> "void".000: token, (0001, value4)
>>>>>>>> "void".001: token, (0010, value1), (0011, value2)
>>>>>>>> "void".01: token
>>>>>>>> "void".1: token, (1011, value3)
>>>>>>>>
>>>>>>>> Iterable<KV<K, V>> entriesUntil(K limit) translates into walking
>>>>>>>> the the prefixes until you find a common prefix for K and then filter for
>>>>>>>> values where they have a sort key <= K. Using the example above, to find
>>>>>>>> entriesUntil(0010) you would:
>>>>>>>> look for key."", miss
>>>>>>>> look for key.0, miss
>>>>>>>> look for key.00, miss
>>>>>>>> look for key.000, hit, sort all contained values using secondary
>>>>>>>> key, provide value4 to user
>>>>>>>> look for key.001, hit, notice that 001 is a prefix of 0010 so we
>>>>>>>> sort all contained values using secondary key, filter out value2 and
>>>>>>>> provide value1
>>>>>>>>
>>>>>>>> void removeUntil(K limit) also translates into walking the prefixes
>>>>>>>> but instead we will clear them when we have a "hit" with some special logic
>>>>>>>> for when the sort key is a prefix of the key. Used the example, to
>>>>>>>> removeUntil(0010) you would:
>>>>>>>> look for key."", miss
>>>>>>>> look for key.0, miss
>>>>>>>> look for key.00, miss
>>>>>>>> look for key.000, hit, clear
>>>>>>>> look for key.001, hit, notice that 001 is a prefix of 0010 so we
>>>>>>>> sort all contained values using secondary key, store in memory all values
>>>>>>>> that > 0010, clear and append values stored in memory.
>>>>>>>>
>>>>>>>> On Fri, May 24, 2019 at 10:36 AM Reuven Lax <re...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Can you explain how fetching and deleting ranges of keys would
>>>>>>>>> work with this data structure?
>>>>>>>>>
>>>>>>>>> On Fri, May 24, 2019 at 9:50 AM Lukasz Cwik <lc...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Reuven, for the example, I assume that we never want to store
>>>>>>>>>> more then 2 values at a given sort key prefix, and if we do then we will
>>>>>>>>>> create a new longer prefix splitting up the values based upon the sort key.
>>>>>>>>>>
>>>>>>>>>> Tuple representation in examples below is (key, sort key, value)
>>>>>>>>>> and . is a character outside of the alphabet which can be represented by
>>>>>>>>>> using an escaping encoding that wraps the key + sort key encoding.
>>>>>>>>>>
>>>>>>>>>> To insert (key, 0010, value1), we lookup "key" + all the prefixes
>>>>>>>>>> of 0010 finding one that is not empty. In this case its 0, so we append
>>>>>>>>>> value to the map at key.0 ending up with (we also set the key to any dummy
>>>>>>>>>> value to know that it it contains values):
>>>>>>>>>> key: dummy value
>>>>>>>>>> key."": token, (0010, value1)
>>>>>>>>>> Now we insert (key, 0011, value2), we again lookup "key" + all
>>>>>>>>>> the prefixes of 0010, finding "", so we append value2 to key."" ending up
>>>>>>>>>> with:
>>>>>>>>>> key: dummy value
>>>>>>>>>> key."": token, (0010, value1), (0011, value2)
>>>>>>>>>> Now we insert (key, 1011, value3), we again lookup "key" + all
>>>>>>>>>> the prefixes of 1011 finding "" but notice that it is full, so we partition
>>>>>>>>>> all the values into two prefixes 0 and 1. We also clear the "" prefix
>>>>>>>>>> ending up with:
>>>>>>>>>> key: dummy value
>>>>>>>>>> key.0: token, (0010, value1), (0011, value2)
>>>>>>>>>> key.1: token, (1011, value3)
>>>>>>>>>> Now we insert (key, 0001, value4), we again lookup "key" + all
>>>>>>>>>> the prefixes of the value finding 0 but notice that it is full, so we
>>>>>>>>>> partition all the values into two prefixes 00 and 01 but notice this
>>>>>>>>>> doesn't help us since 00 will be too full so we split 00 again to 000, 001.
>>>>>>>>>> We also clear the 0 prefix ending up with:
>>>>>>>>>> key: dummy value
>>>>>>>>>> key.000: token, (0001, value4)
>>>>>>>>>> key.001: token, (0010, value1), (0011, value2)
>>>>>>>>>> key.01: token
>>>>>>>>>> key.1: token, (1011, value3)
>>>>>>>>>>
>>>>>>>>>> We are effectively building a trie[1] where we only have values
>>>>>>>>>> at the leaves and control how full each leaf can be. There are other trie
>>>>>>>>>> representations like a radix tree that may be better.
>>>>>>>>>>
>>>>>>>>>> Looking up the values in sorted order for "key" would go like
>>>>>>>>>> this:
>>>>>>>>>> Is key set, yes
>>>>>>>>>> look for key."", miss
>>>>>>>>>> look for key.0, miss
>>>>>>>>>> look for key.00, miss
>>>>>>>>>> look for key.000, hit, sort all contained values using secondary
>>>>>>>>>> key, provide value4 to user
>>>>>>>>>> look for key.001, hit, sort all contained values using secondary
>>>>>>>>>> key, provide value1 followed by value2 to user
>>>>>>>>>> look for key.01, hit, empty, return no values to user
>>>>>>>>>> look for key.1, hit, sort all contained values using secondary
>>>>>>>>>> key, provide value3 to user
>>>>>>>>>> we have walked the entire prefix space, signal end of iterable
>>>>>>>>>>
>>>>>>>>>> Some notes for the above:
>>>>>>>>>> * The dummy value is used to know that the key contains values
>>>>>>>>>> and the token is to know whether there are any values deeper in the trie so
>>>>>>>>>> when we know when to stop searching.
>>>>>>>>>> * If we can recalculate the sort key from the combination of the
>>>>>>>>>> key and value, then we don't need to store it.
>>>>>>>>>> * Keys with lots of values will perform worse then keys with less
>>>>>>>>>> values since we have to look up more keys but they will be empty reads. The
>>>>>>>>>> number of misses can be controlled by how many elements we are willing to
>>>>>>>>>> store at a given node before we subdivide.
>>>>>>>>>>
>>>>>>>>>> In reality you could build a lot of structures (e.g. red black
>>>>>>>>>> tree, binary tree) using the sort key, the issue is the cost of
>>>>>>>>>> rebalancing/re-organizing the structure in map form and whether it has a
>>>>>>>>>> convenient pre-order traversal for lookups.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, May 24, 2019 at 8:14 AM Reuven Lax <re...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Some great comments!
>>>>>>>>>>>
>>>>>>>>>>> *Aljoscha*: absolutely this would have to be implemented by
>>>>>>>>>>> runners to be efficient. We can of course provide a default (inefficient)
>>>>>>>>>>> implementation, but ideally runners would provide better ones.
>>>>>>>>>>>
>>>>>>>>>>> *Jan* Exactly. I think MapState can be dropped or backed by
>>>>>>>>>>> this. E.g.
>>>>>>>>>>>
>>>>>>>>>>> *Robert* Great point about standard coders not satisfying this.
>>>>>>>>>>> That's why I suggested that we provide a way to tag the coders that do
>>>>>>>>>>> preserve order, and only accept those as key coders Alternatively we could
>>>>>>>>>>> present a more limited API - e.g. only allowing a hard-coded set of types
>>>>>>>>>>> to be used as keys - but that seems counter to the direction Beam usually
>>>>>>>>>>> goes. So users will have two ways .of creating multimap state specs:
>>>>>>>>>>>
>>>>>>>>>>>    private final StateSpec<MultimapState<Long, String>> state =
>>>>>>>>>>> StateSpecs.multimap(VarLongCoder.of(), StringUtf8Coder.of());
>>>>>>>>>>>
>>>>>>>>>>> or
>>>>>>>>>>>    private final StateSpec<MultimapState<Long, String>> state =
>>>>>>>>>>> StateSpecs.orderedMultimap(VarLongCoder.of(), StringUtf8Coder.of());
>>>>>>>>>>>
>>>>>>>>>>> The second one will validate that the key coder preserves order,
>>>>>>>>>>> and fails otherwise (similar to coder determinism checking in GroupByKey).
>>>>>>>>>>> (BTW we would also have versions of these functions that use coder
>>>>>>>>>>> inference to "guess" the coder, but those will do the same checking)
>>>>>>>>>>>
>>>>>>>>>>> Also the API I proposed did support random access! We could
>>>>>>>>>>> separate out OrderedBagState again if we think the use cases are
>>>>>>>>>>> fundamentally different. I merged the proposal into that of MultimapState
>>>>>>>>>>> because there seemed be 99% overlap.
>>>>>>>>>>>
>>>>>>>>>>> Reuven
>>>>>>>>>>>
>>>>>>>>>>> On Fri, May 24, 2019 at 6:19 AM Robert Bradshaw <
>>>>>>>>>>> robertwb@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Fri, May 24, 2019 at 5:32 AM Reuven Lax <re...@google.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> >
>>>>>>>>>>>> > On Thu, May 23, 2019 at 1:53 PM Ahmet Altay <al...@google.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> >>
>>>>>>>>>>>> >>
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> On Thu, May 23, 2019 at 1:38 PM Lukasz Cwik <
>>>>>>>>>>>> lcwik@google.com> wrote:
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> On Thu, May 23, 2019 at 11:37 AM Rui Wang <
>>>>>>>>>>>> ruwang@google.com> wrote:
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> A few obvious problems with this code:
>>>>>>>>>>>> >>>>>   1. Removing the elements already processed from the bag
>>>>>>>>>>>> requires clearing and rewriting the entire bag. This is O(n^2) in the
>>>>>>>>>>>> number of input trades.
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> why it's not O(2 * n) to clearing and rewriting trade
>>>>>>>>>>>> state?
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> public interface SortedMultimapState<K, V> extends State {
>>>>>>>>>>>> >>>>>   // Add a value to the map.
>>>>>>>>>>>> >>>>>   void put(K key, V value);
>>>>>>>>>>>> >>>>>   // Get all values for a given key.
>>>>>>>>>>>> >>>>>   ReadableState<Iterable<V>> get(K key);
>>>>>>>>>>>> >>>>>  // Return all entries in the map.
>>>>>>>>>>>> >>>>>   ReadableState<Iterable<KV<K, V>>> allEntries();
>>>>>>>>>>>> >>>>>   // Return all entries in the map with keys <= limit.
>>>>>>>>>>>> returned elements are sorted by the key.
>>>>>>>>>>>> >>>>>   ReadableState<Iterable<KV<K, V>>> entriesUntil(K limit);
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>>  // Remove all values with the given key;
>>>>>>>>>>>> >>>>>   void remove(K key);
>>>>>>>>>>>> >>>>>  // Remove all entries in the map with keys <= limit.
>>>>>>>>>>>> >>>>>   void removeUntil(K limit);
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> Will removeUntilExcl(K limit) also useful? It will remove
>>>>>>>>>>>> all entries in the map with keys < limit.
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> Runners will sort based on the encoded value of the key.
>>>>>>>>>>>> In order to make this easier for users, I propose that we introduce a new
>>>>>>>>>>>> tag on Coders PreservesOrder. A Coder that contains this tag guarantees
>>>>>>>>>>>> that the encoded value preserves the same ordering as the base Java type.
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> Could you clarify what is  "encoded value preserves the
>>>>>>>>>>>> same ordering as the base Java type"?
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> Lets say A and B represent two different instances of the
>>>>>>>>>>>> same Java type like a double, then A < B (using the languages comparison
>>>>>>>>>>>> operator) iff encode(A) < encode(B) (note the encoded versions are compared
>>>>>>>>>>>> lexicographically)
>>>>>>>>>>>> >>
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Since coders are shared across SDKs, do we expect A < B iff
>>>>>>>>>>>> e(A) < e(P) property to hold for all languages we support? What happens A,
>>>>>>>>>>>> B sort differently in different languages?
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>> > That would have to be the property of the coder (which means
>>>>>>>>>>>> that this property probably needs to be represented in the portability
>>>>>>>>>>>> representation of the coder). I imagine the common use cases will be for
>>>>>>>>>>>> simple coders like int, long, string, etc., which are likely to sort the
>>>>>>>>>>>> same in most languages.
>>>>>>>>>>>>
>>>>>>>>>>>> The standard coders for both double and integral types do not
>>>>>>>>>>>> respect
>>>>>>>>>>>> the natural ordering (consider negative values). KV coders
>>>>>>>>>>>> violate the
>>>>>>>>>>>> "natural" lexicographic ordering on components as well. I think
>>>>>>>>>>>> implicitly sorting on encoded value would yield many surprises.
>>>>>>>>>>>> (The
>>>>>>>>>>>> state, of course, could take a order-preserving, bytes
>>>>>>>>>>>> (string?)-producing callable as a parameter of course). (As for
>>>>>>>>>>>> naming, I'd probably call this OrderedBagState or something like
>>>>>>>>>>>> that...rather than Map which tends to imply random access.)
>>>>>>>>>>>>
>>>>>>>>>>>

Re: DISCUSS: Sorted MapState API

Posted by Jan Lukavský <je...@seznam.cz>.

Big +1 for a BIP, as this might really help clarify all the pros and 
cons of all possibilities. There seem to be questions that need 
answering and motivating use cases - do we need sorted map state or can 
we solve our use cases by something simpler - e.g. the mentioned 
TimeSortedBagState? Does that really have to be time-sorted structure, 
or does it "only" have to have operations that can efficiently find and 
remove element with smallest timestamp (like a PriorityQueue)?

Jan

On 6/18/20 5:32 AM, Kenneth Knowles wrote:
> Zooming in from generic philosophy to be clear: adding time ordered 
> buffer to the Fn state API is *not* a shortcut.It has benefits that 
> will not be achieved by SDK-side implementation on top of either 
> ordered or unordered multimap. Are those benefits worth expanding 
> the API? I don't know.
>
> A change to allow a runner to have a specialized implementation for 
> time-buffered state would be one or more StateKey types, right? 
> Reuven, maybe put this and your Java API in a doc? A BIP? Seems like 
> there's at least the following to explore:
>
>  - how that Java API would map to an SDK-side implementation on top of 
> multimap state key
>  - how that Java API would map to a new StateKey
>  - whether there's actually more than one relevant implementation of 
> that StateKey
>  - whether SDK-side implementation on some other state key would be 
> performant enough in all SDK languages (present and future)
>
> Zooming back out to generic philosophy: Proliferation of StateKey 
> types tuned by runners (which can very easily still share 
> implementation) is probably better than proliferation of complex 
> SDK-side implementations with varying completeness and performance.
>
> Kenn
>
> On Wed, Jun 17, 2020 at 3:24 PM Reuven Lax <relax@google.com 
> <ma...@google.com>> wrote:
>
>     It might help for me to describe what I have in mind. I'm still
>     proposing that we build multimap, just not a globally-sorted
>     multimap.
>
>     My previous proposal was that we provide a Multimap<Key, Value>
>     state type, sorted by key. this would have two additional
>     operations - multimap.getRange(startKey, endKey) and
>     multimap.deleteRange(startKey, endKey). The primary use case was
>     timestamp sorting, but I felt that a sorted multimap provided a
>     nice generalization - after all, you can simply key the multimap
>     by timestamp to get timestamp sorting.
>
>     This approach had some issues immediately that would take some
>     work to solve. Since a multimap key can have any type and a runner
>     will only be able to sort by encoded type, we would need to
>     introduce a concept of order-preserving coders into Beam and
>     plumb that through. Robert pointed out that even our existing
>     standard coders for simple integral types don't preserve order, so
>     there will likely be surprises here.
>
>     My current proposal is for a multimap that is not sorted by key,
>     but that can support.ordered values for a single key. Remember
>     that a multimap maps K -> Iterable<V>, so this means that each
>     individual Iterable<V> is ordered, but the keys have no specific
>     order relative to each other. This is not too different from many
>     multimap implementations where the keys are unordered, but the
>     list of values for a single key at least has a stable order.
>
>     The interface would look like this:
>
>     public interface MultimapState<K, V> extends State {
>       // Add a value with a default timestamp.
>       void put(K key, V value);
>
>       // Add a timestamped value.
>       void put(K, key, TimestampedValue<V> value);
>
>       // Remove all values for a key.
>       void remove (K key);
>
>       // Remove all values for a key with timestamps within the
>     specified range.
>       void removeRange(K key, Instant startTs, Instant endTs);
>
>       // Get an Iterable of values for V. The Iterable will be
>     returned sorted by timestamp.
>     ReadableState<Iterable<TimestampedValue<V>>> get(K key);
>
>       // Get an Iterable of values for V in the specified range. The
>     Iterable will be returned sorted by timestamp.
>     ReadableState<Iterable<TimestampedValue<V>>> getRange(K key,
>     Instant startTs, Instant endTs);
>
>       ReadableState<Iterable<K>> keys();
>     ReadableState<Iterable<TimestampedValue<V>>> values();
>       ReadableState<Iterable<Map.Entry<K, TimestampedValue<V>> entries;
>     }
>
>     We can of course provide helper functions that allow using
>     MultimapState without deailing with TimestampValue for users who
>     only want a multimap and don't want sorting.
>
>     I think many users will only need a single sorted list - not a
>     full multimap. It's worth offering this as well, and we can simply
>     build it on top of MultimapState. It will look like an extension
>     of BagState
>
>     public interface TimestampSortedListState<T> extends State {
>       void add(TimestampedValue<T> value);
>       Iterable<TimestampedValue<T>> read();
>       Iterable<TimestampedValue<T>> readRange(Instant startTs, Instant
>     endTs);
>       void clearRange(Instant startTs, Instant endTs);
>     }
>
>
>     On Wed, Jun 17, 2020 at 2:47 PM Luke Cwik <lcwik@google.com
>     <ma...@google.com>> wrote:
>
>         The portability layer is meant to live across multiple
>         versions of Beam and I don't think it should be treated by
>         doing the simple and useful thing now since I believe it will
>         lead to a proliferation of the API.
>
>         On Wed, Jun 17, 2020 at 2:30 PM Kenneth Knowles
>         <kenn@apache.org <ma...@apache.org>> wrote:
>
>             I have thoughts on the subject of whether to have APIs
>             just for the lowest-level building blocks versus having
>             APIs for higher-level constructs. Specifically this
>             applies to providing only unsorted multimap vs what I will
>             call "time-ordered buffer". TL;DR: I'd vote to focus on
>             time-ordered buffer; if it turns out to be easy to go all
>             the way to sorted multimap that's nice-to-have; if it
>             turns out to be easy to implement on top of unsorted map
>             state that should probably be under the hood
>
>             Reasons to build low-level multimap in the runner & fn api
>             and layer higher-level things in the SDK:
>
>              - It is less implementation for runners if they have to
>             only provide fewer lower-level building blocks like
>             multimap state.
>              - There are many more runners than SDKs (and will be even
>             more and more) so this saves overall.
>
>             Reasons to build higher-level constructs directly in the
>             runner and fn api:
>
>              - Having multiple higher-level state types may actually
>             be less implementation than one complex state type,
>             especially if they map to runner primitives.
>              - The runner may have better specialized implementations,
>             especially for something like a time-ordered buffer.
>              - The particular access patterns in an SDK-based
>             implementation may not be ideal for each runner's
>             underlying implementation of the low-level building block.
>              - There may be excessive gRPC overhead even for optimal
>             access patterns.
>
>             There are ways to have best of both worlds, like:
>
>             1. Define multiple state types according to fundamental
>             access patterns, like we did this before portability.
>             2. If it is easy to layer one on top of the other, do that
>             inside the runner. Provide shared code so for runners
>             providing the lowest-level primitive they get all the
>             types for free.
>
>             I understand that this is an oversimplification. It still
>             creates some more work. And APIs are a burden so it is
>             good to introduce as few as possible for maintenance. But
>             it has performance benefits and also unblocks "just doing
>             the simple and useful thing now" which I always like to do
>             as long as it is compatible with future changes. If the
>             APIs are fundamental, like sets, maps, timestamp ordering,
>             then it is safe to guess that they will change rarely and
>             be useful forever.
>
>             Kenn
>
>             On Tue, Jun 16, 2020 at 2:54 PM Luke Cwik
>             <lcwik@google.com <ma...@google.com>> wrote:
>
>                 I would be glad to take a stab at how to provide
>                 sorting on top of unsorted multimap state.
>                 Based upon your description, you want integer keys
>                 representing timestamps and arbitrary user value for
>                 the values, is that correct?
>                 What kinds of operations do you need on the sorted map
>                 state in order of efficiency requirements?
>                 (e.g. Next(x), Previous(x), GetAll(Range[x, y)),
>                 ClearAll(Range[x, y))
>                 What kinds of operations do we expect the underlying
>                 unsorted map state to be able to provide?
>                 (at a minimum Get(K), Append(K), Clear(K) but what
>                 else e.g. enumerate(K)?)
>
>                 I went through a similar exercise of how to provide a
>                 list like side input view over a multimap[1] side
>                 input which efficiently allowed computation of size
>                 and provided random access while only having access to
>                 get(K) and enumerate K's.
>
>                 1:
>                 https://github.com/lukecwik/incubator-beam/blob/ec8769f6163ca8a4daecc2fb29708bc1da430917/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollectionViews.java#L568
>
>                 On Tue, Jun 16, 2020 at 8:47 AM Reuven Lax
>                 <relax@google.com <ma...@google.com>> wrote:
>
>                     Bringing this subject up again,
>
>                     I've spent some time looking into implementing
>                     this for the Dataflow runner. I'm unable to find a
>                     way to implement the arbitrary sorted multimap
>                     efficiently for the case where there are large
>                     numbers of unique keys. Since the primary driving
>                     use case is timestamp ordering (i.e. key is event
>                     timestamp), you would expect to have nearly a new
>                     key per element. I considered Luke's suggestion
>                     above, but unfortunately it doesn't really solve
>                     this issue.
>
>                     The primary use case for sorting always seems to
>                     be sorting by timestamp. I want to propose that
>                     instead of building the fully-general sorted
>                     multimap, we instead focus on a state type where
>                     the sort key is an integral type (like a timestamp
>                     or an integer). There is still a valid use case
>                     for multimap, but we can provide that as an
>                     unordered state. At least for Dataflow, it will be
>                     much easier
>
>                     While my difficulties here may be specific to the
>                     Dataflow runner, any such support would have to be
>                     built into other runners as well, and limiting to
>                     integral sorting likely makes it easier for other
>                     runners to implement this. Also, if you look at
>                     this
>                     <https://github.com/apache/flink/blob/0ab1549f52f1f544e8492757c6b0d562bf50a061/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/runtime/join/TemporalRowtimeJoin.scala#L95> Flink
>                     comment pointed out by Aljoscha, for Flink the
>                     main use case identified was also timestamp
>                     sorting. This will also simplify the API design
>                     for this feature: Sorted multimap with arbitrary
>                     keys would require us to introduce a way of
>                     mapping natural ordering to encoded ordering (i.e.
>                     a new OrderPreservingCoder), but if we limit sort
>                     keys to integral types, the API design is simpler
>                     as integral types can be represented directly.
>
>                     Reuven
>
>                     On Sun, Jun 2, 2019 at 7:04 AM Reuven Lax
>                     <relax@google.com <ma...@google.com>> wrote:
>
>                         This sounds to me like a potential runner
>                         strategy. However if a runner can natively
>                         support sorted maps (e.g. we expect the
>                         Dataflow runner to be able to do so, and I
>                         think it would be useful for other runners as
>                         well), then it's probably preferable to allow
>                         the runner to use its native capabilities.
>
>                         On Fri, May 24, 2019 at 11:05 AM Lukasz Cwik
>                         <lcwik@google.com <ma...@google.com>>
>                         wrote:
>
>                             For the API that you proposed, the map key
>                             is always "void" and the sort key == user
>                             key. So in my example of
>                             key: dummy value
>                             key.000: token, (0001, value4)
>                             key.001: token, (0010, value1), (0011, value2)
>                             key.01: token
>                             key.1: token, (1011, value3)
>                             you would have:
>                             "void": dummy value
>                             "void".000: token, (0001, value4)
>                             "void".001: token, (0010, value1), (0011,
>                             value2)
>                             "void".01: token
>                             "void".1: token, (1011, value3)
>
>                             Iterable<KV<K, V>> entriesUntil(K limit)
>                             translates into walking the the prefixes
>                             until you find a common prefix for K and
>                             then filter for values where they have a
>                             sort key <= K. Using the example above, to
>                             find entriesUntil(0010) you would:
>                             look for key."", miss
>                             look for key.0, miss
>                             look for key.00, miss
>                             look for key.000, hit, sort all contained
>                             values using secondary key, provide value4
>                             to user
>                             look for key.001, hit, notice that 001 is
>                             a prefix of 0010 so we sort all contained
>                             values using secondary key, filter out
>                             value2 and provide value1
>
>                             void removeUntil(K limit) also translates
>                             into walking the prefixes but instead we
>                             will clear them when we have a "hit" with
>                             some special logic for when the sort key
>                             is a prefix of the key. Used the example,
>                             to removeUntil(0010) you would:
>                             look for key."", miss
>                             look for key.0, miss
>                             look for key.00, miss
>                             look for key.000, hit, clear
>                             look for key.001, hit, notice that 001 is
>                             a prefix of 0010 so we sort all contained
>                             values using secondary key, store in
>                             memory all values that > 0010, clear and
>                             append values stored in memory.
>
>                             On Fri, May 24, 2019 at 10:36 AM Reuven
>                             Lax <relax@google.com
>                             <ma...@google.com>> wrote:
>
>                                 Can you explain how fetching and
>                                 deleting ranges of keys would work
>                                 with this data structure?
>
>                                 On Fri, May 24, 2019 at 9:50 AM Lukasz
>                                 Cwik <lcwik@google.com
>                                 <ma...@google.com>> wrote:
>
>                                     Reuven, for the example, I assume
>                                     that we never want to store more
>                                     then 2 values at a given sort key
>                                     prefix, and if we do then we will
>                                     create a new longer prefix
>                                     splitting up the values based upon
>                                     the sort key.
>
>                                     Tuple representation in examples
>                                     below is (key, sort key, value)
>                                     and . is a character outside of
>                                     the alphabet which can be
>                                     represented by using an escaping
>                                     encoding that wraps the key + sort
>                                     key encoding.
>
>                                     To insert (key, 0010, value1), we
>                                     lookup "key" + all the prefixes of
>                                     0010 finding one that is not
>                                     empty. In this case its 0, so we
>                                     append value to the map at key.0
>                                     ending up with (we also set the
>                                     key to any dummy value to know
>                                     that it it contains values):
>                                     key: dummy value
>                                     key."": token, (0010, value1)
>                                     Now we insert (key, 0011, value2),
>                                     we again lookup "key" + all the
>                                     prefixes of 0010, finding "", so
>                                     we append value2 to key."" ending
>                                     up with:
>                                     key: dummy value
>                                     key."": token, (0010, value1),
>                                     (0011, value2)
>                                     Now we insert (key, 1011, value3),
>                                     we again lookup "key" + all the
>                                     prefixes of 1011 finding "" but
>                                     notice that it is full, so we
>                                     partition all the values into two
>                                     prefixes 0 and 1. We also clear
>                                     the "" prefix ending up with:
>                                     key: dummy value
>                                     key.0: token, (0010, value1),
>                                     (0011, value2)
>                                     key.1: token, (1011, value3)
>                                     Now we insert (key, 0001, value4),
>                                     we again lookup "key" + all the
>                                     prefixes of the value finding 0
>                                     but notice that it is full, so we
>                                     partition all the values into two
>                                     prefixes 00 and 01 but notice this
>                                     doesn't help us since 00 will be
>                                     too full so we split 00 again to
>                                     000, 001. We also clear the 0
>                                     prefix ending up with:
>                                     key: dummy value
>                                     key.000: token, (0001, value4)
>                                     key.001: token, (0010, value1),
>                                     (0011, value2)
>                                     key.01: token
>                                     key.1: token, (1011, value3)
>
>                                     We are effectively building a
>                                     trie[1] where we only have values
>                                     at the leaves and control how full
>                                     each leaf can be. There are other
>                                     trie representations like a radix
>                                     tree that may be better.
>
>                                     Looking up the values in sorted
>                                     order for "key" would go like this:
>                                     Is key set, yes
>                                     look for key."", miss
>                                     look for key.0, miss
>                                     look for key.00, miss
>                                     look for key.000, hit, sort all
>                                     contained values using secondary
>                                     key, provide value4 to user
>                                     look for key.001, hit, sort all
>                                     contained values using secondary
>                                     key, provide value1 followed by
>                                     value2 to user
>                                     look for key.01, hit, empty,
>                                     return no values to user
>                                     look for key.1, hit, sort all
>                                     contained values using secondary
>                                     key, provide value3 to user
>                                     we have walked the entire prefix
>                                     space, signal end of iterable
>
>                                     Some notes for the above:
>                                     * The dummy value is used to know
>                                     that the key contains values and
>                                     the token is to know whether there
>                                     are any values deeper in the trie
>                                     so when we know when to stop
>                                     searching.
>                                     * If we can recalculate the sort
>                                     key from the combination of the
>                                     key and value, then we don't need
>                                     to store it.
>                                     * Keys with lots of values will
>                                     perform worse then keys with less
>                                     values since we have to look up
>                                     more keys but they will be empty
>                                     reads. The number of misses can be
>                                     controlled by how many elements we
>                                     are willing to store at a given
>                                     node before we subdivide.
>
>                                     In reality you could build a lot
>                                     of structures (e.g. red black
>                                     tree, binary tree) using the sort
>                                     key, the issue is the cost of
>                                     rebalancing/re-organizing the
>                                     structure in map form and whether
>                                     it has a convenient pre-order
>                                     traversal for lookups.
>
>
>
>                                     On Fri, May 24, 2019 at 8:14 AM
>                                     Reuven Lax <relax@google.com
>                                     <ma...@google.com>> wrote:
>
>                                         Some great comments!
>
>                                         *Aljoscha*: absolutely this
>                                         would have to be implemented
>                                         by runners to be efficient. We
>                                         can of course provide a
>                                         default (inefficient)
>                                         implementation, but ideally
>                                         runners would provide better ones.
>
>                                         *Jan* Exactly. I think
>                                         MapState can be dropped or
>                                         backed by this. E.g.
>
>                                         *Robert* Great point about
>                                         standard coders not satisfying
>                                         this. That's why I suggested
>                                         that we provide a way to tag
>                                         the coders that do preserve
>                                         order, and only accept those
>                                         as key coders Alternatively we
>                                         could present a more limited
>                                         API - e.g. only allowing a
>                                         hard-coded set of types to be
>                                         used as keys - but that seems
>                                         counter to the direction Beam
>                                         usually goes. So users will
>                                         have two ways .of creating
>                                         multimap state specs:
>
>                                            private final
>                                         StateSpec<MultimapState<Long,
>                                         String>> state =
>                                         StateSpecs.multimap(VarLongCoder.of(),
>                                         StringUtf8Coder.of());
>
>                                         or
>                                            private final
>                                         StateSpec<MultimapState<Long,
>                                         String>> state =
>                                         StateSpecs.orderedMultimap(VarLongCoder.of(),
>                                         StringUtf8Coder.of());
>
>                                         The second one will validate
>                                         that the key coder preserves
>                                         order, and fails otherwise
>                                         (similar to coder determinism
>                                         checking in GroupByKey). (BTW
>                                         we would also have versions of
>                                         these functions that use coder
>                                         inference to "guess" the
>                                         coder, but those will do the
>                                         same checking)
>
>                                         Also the API I proposed did
>                                         support random access! We
>                                         could separate out
>                                         OrderedBagState again if we
>                                         think the use cases are
>                                         fundamentally different. I
>                                         merged the proposal into that
>                                         of MultimapState because there
>                                         seemed be 99% overlap.
>
>                                         Reuven
>
>                                         On Fri, May 24, 2019 at 6:19
>                                         AM Robert Bradshaw
>                                         <robertwb@google.com
>                                         <ma...@google.com>>
>                                         wrote:
>
>                                             On Fri, May 24, 2019 at
>                                             5:32 AM Reuven Lax
>                                             <relax@google.com
>                                             <ma...@google.com>>
>                                             wrote:
>                                             >
>                                             > On Thu, May 23, 2019 at
>                                             1:53 PM Ahmet Altay
>                                             <altay@google.com
>                                             <ma...@google.com>>
>                                             wrote:
>                                             >>
>                                             >>
>                                             >>
>                                             >> On Thu, May 23, 2019 at
>                                             1:38 PM Lukasz Cwik
>                                             <lcwik@google.com
>                                             <ma...@google.com>>
>                                             wrote:
>                                             >>>
>                                             >>>
>                                             >>>
>                                             >>> On Thu, May 23, 2019
>                                             at 11:37 AM Rui Wang
>                                             <ruwang@google.com
>                                             <ma...@google.com>>
>                                             wrote:
>                                             >>>>>
>                                             >>>>> A few obvious
>                                             problems with this code:
>                                             >>>>>  1. Removing the
>                                             elements already processed
>                                             from the bag requires
>                                             clearing and rewriting the
>                                             entire bag. This is O(n^2)
>                                             in the number of input trades.
>                                             >>>>
>                                             >>>> why it's not O(2 * n)
>                                             to clearing and rewriting
>                                             trade state?
>                                             >>>>
>                                             >>>>>
>                                             >>>>> public interface
>                                             SortedMultimapState<K, V>
>                                             extends State {
>                                             >>>>>  // Add a value to
>                                             the map.
>                                             >>>>>  void put(K key, V
>                                             value);
>                                             >>>>>  // Get all values
>                                             for a given key.
>                                             >>>>>
>                                              ReadableState<Iterable<V>>
>                                             get(K key);
>                                             >>>>> // Return all
>                                             entries in the map.
>                                             >>>>>
>                                              ReadableState<Iterable<KV<K,
>                                             V>>> allEntries();
>                                             >>>>>  // Return all
>                                             entries in the map with
>                                             keys <= limit. returned
>                                             elements are sorted by the
>                                             key.
>                                             >>>>>
>                                              ReadableState<Iterable<KV<K,
>                                             V>>> entriesUntil(K limit);
>                                             >>>>>
>                                             >>>>> // Remove all values
>                                             with the given key;
>                                             >>>>>  void remove(K key);
>                                             >>>>> // Remove all
>                                             entries in the map with
>                                             keys <= limit.
>                                             >>>>>  void removeUntil(K
>                                             limit);
>                                             >>>>
>                                             >>>> Will
>                                             removeUntilExcl(K limit)
>                                             also useful? It will
>                                             remove all entries in the
>                                             map with keys < limit.
>                                             >>>>
>                                             >>>>>
>                                             >>>>> Runners will sort
>                                             based on the encoded value
>                                             of the key. In order to
>                                             make this easier for
>                                             users, I propose that we
>                                             introduce a new tag on
>                                             Coders PreservesOrder. A
>                                             Coder that contains this
>                                             tag guarantees that the
>                                             encoded value preserves
>                                             the same ordering as the
>                                             base Java type.
>                                             >>>>
>                                             >>>>
>                                             >>>> Could you clarify
>                                             what is "encoded value
>                                             preserves the same
>                                             ordering as the base Java
>                                             type"?
>                                             >>>
>                                             >>>
>                                             >>> Lets say A and B
>                                             represent two different
>                                             instances of the same Java
>                                             type like a double, then A
>                                             < B (using the languages
>                                             comparison operator) iff
>                                             encode(A) < encode(B)
>                                             (note the encoded versions
>                                             are compared
>                                             lexicographically)
>                                             >>
>                                             >>
>                                             >> Since coders are shared
>                                             across SDKs, do we expect
>                                             A < B iff e(A) < e(P)
>                                             property to hold for all
>                                             languages we support? What
>                                             happens A, B sort
>                                             differently in different
>                                             languages?
>                                             >
>                                             >
>                                             > That would have to be
>                                             the property of the coder
>                                             (which means that this
>                                             property probably needs to
>                                             be represented in the
>                                             portability representation
>                                             of the coder). I imagine
>                                             the common use cases will
>                                             be for simple coders like
>                                             int, long, string, etc.,
>                                             which are likely to sort
>                                             the same in most languages.
>
>                                             The standard coders for
>                                             both double and integral
>                                             types do not respect
>                                             the natural ordering
>                                             (consider negative
>                                             values). KV coders violate the
>                                             "natural" lexicographic
>                                             ordering on components as
>                                             well. I think
>                                             implicitly sorting on
>                                             encoded value would yield
>                                             many surprises. (The
>                                             state, of course, could
>                                             take a order-preserving, bytes
>                                             (string?)-producing
>                                             callable as a parameter of
>                                             course). (As for
>                                             naming, I'd probably call
>                                             this OrderedBagState or
>                                             something like
>                                             that...rather than Map
>                                             which tends to imply
>                                             random access.)
>

Re: DISCUSS: Sorted MapState API

Posted by Kenneth Knowles <ke...@apache.org>.

Zooming in from generic philosophy to be clear: adding time ordered buffer
to the Fn state API is *not* a shortcut.It has benefits that will not be
achieved by SDK-side implementation on top of either ordered or unordered
multimap. Are those benefits worth expanding the API? I don't know.

A change to allow a runner to have a specialized implementation for
time-buffered state would be one or more StateKey types, right? Reuven,
maybe put this and your Java API in a doc? A BIP? Seems like there's at
least the following to explore:

 - how that Java API would map to an SDK-side implementation on top of
multimap state key
 - how that Java API would map to a new StateKey
 - whether there's actually more than one relevant implementation of that
StateKey
 - whether SDK-side implementation on some other state key would be
performant enough in all SDK languages (present and future)

Zooming back out to generic philosophy: Proliferation of StateKey
types tuned by runners (which can very easily still share implementation)
is probably better than proliferation of complex SDK-side implementations
with varying completeness and performance.

Kenn

On Wed, Jun 17, 2020 at 3:24 PM Reuven Lax <re...@google.com> wrote:

> It might help for me to describe what I have in mind. I'm still proposing
> that we build multimap, just not a globally-sorted multimap.
>
> My previous proposal was that we provide a Multimap<Key, Value> state
> type, sorted by key. this would have two additional operations -
> multimap.getRange(startKey, endKey) and multimap.deleteRange(startKey,
> endKey). The primary use case was timestamp sorting, but I felt that a
> sorted multimap provided a nice generalization - after all, you can simply
> key the multimap by timestamp to get timestamp sorting.
>
> This approach had some issues immediately that would take some work to
> solve. Since a multimap key can have any type and a runner will only be
> able to sort by encoded type, we would need to introduce a concept of
> order-preserving coders into Beam and plumb that through. Robert pointed
> out that even our existing standard coders for simple integral types don't
> preserve order, so there will likely be surprises here.
>
> My current proposal is for a multimap that is not sorted by key, but that
> can support.ordered values for a single key. Remember that a multimap maps
> K -> Iterable<V>, so this means that each individual Iterable<V> is
> ordered, but the keys have no specific order relative to each other. This
> is not too different from many multimap implementations where the keys are
> unordered, but the list of values for a single key at least has a stable
> order.
>
> The interface would look like this:
>
> public interface MultimapState<K, V> extends State {
>   // Add a value with a default timestamp.
>   void put(K key, V value);
>
>   // Add a timestamped value.
>   void put(K, key, TimestampedValue<V> value);
>
>   // Remove all values for a key.
>   void remove (K key);
>
>   // Remove all values for a key with timestamps within the specified
> range.
>   void removeRange(K key, Instant startTs, Instant endTs);
>
>   // Get an Iterable of values for V. The Iterable will be returned sorted
> by timestamp.
>   ReadableState<Iterable<TimestampedValue<V>>> get(K key);
>
>   // Get an Iterable of values for V in the specified range. The Iterable
> will be returned sorted by timestamp.
>   ReadableState<Iterable<TimestampedValue<V>>> getRange(K key, Instant
> startTs, Instant endTs);
>
>   ReadableState<Iterable<K>> keys();
>   ReadableState<Iterable<TimestampedValue<V>>> values();
>   ReadableState<Iterable<Map.Entry<K, TimestampedValue<V>> entries;
> }
>
> We can of course provide helper functions that allow using MultimapState
> without deailing with TimestampValue for users who only want a multimap and
> don't want sorting.
>
> I think many users will only need a single sorted list - not a full
> multimap. It's worth offering this as well, and we can simply build it on
> top of MultimapState. It will look like an extension of BagState
>
> public interface TimestampSortedListState<T> extends State {
>   void add(TimestampedValue<T> value);
>   Iterable<TimestampedValue<T>> read();
>   Iterable<TimestampedValue<T>> readRange(Instant startTs, Instant endTs);
>   void clearRange(Instant startTs, Instant endTs);
> }
>
>
> On Wed, Jun 17, 2020 at 2:47 PM Luke Cwik <lc...@google.com> wrote:
>
>> The portability layer is meant to live across multiple versions of Beam
>> and I don't think it should be treated by doing the simple and useful thing
>> now since I believe it will lead to a proliferation of the API.
>>
>> On Wed, Jun 17, 2020 at 2:30 PM Kenneth Knowles <ke...@apache.org> wrote:
>>
>>> I have thoughts on the subject of whether to have APIs just for the
>>> lowest-level building blocks versus having APIs for higher-level
>>> constructs. Specifically this applies to providing only unsorted multimap
>>> vs what I will call "time-ordered buffer". TL;DR: I'd vote to focus on
>>> time-ordered buffer; if it turns out to be easy to go all the way to sorted
>>> multimap that's nice-to-have; if it turns out to be easy to implement on
>>> top of unsorted map state that should probably be under the hood
>>>
>>> Reasons to build low-level multimap in the runner & fn api and layer
>>> higher-level things in the SDK:
>>>
>>>  - It is less implementation for runners if they have to only provide
>>> fewer lower-level building blocks like multimap state.
>>>  - There are many more runners than SDKs (and will be even more and
>>> more) so this saves overall.
>>>
>>> Reasons to build higher-level constructs directly in the runner and fn
>>> api:
>>>
>>>  - Having multiple higher-level state types may actually be less
>>> implementation than one complex state type, especially if they map to
>>> runner primitives.
>>>  - The runner may have better specialized implementations, especially
>>> for something like a time-ordered buffer.
>>>  - The particular access patterns in an SDK-based implementation may not
>>> be ideal for each runner's underlying implementation of the low-level
>>> building block.
>>>  - There may be excessive gRPC overhead even for optimal access patterns.
>>>
>>> There are ways to have best of both worlds, like:
>>>
>>> 1. Define multiple state types according to fundamental access patterns,
>>> like we did this before portability.
>>> 2. If it is easy to layer one on top of the other, do that inside the
>>> runner. Provide shared code so for runners providing the lowest-level
>>> primitive they get all the types for free.
>>>
>>> I understand that this is an oversimplification. It still creates some
>>> more work. And APIs are a burden so it is good to introduce as few as
>>> possible for maintenance. But it has performance benefits and also unblocks
>>> "just doing the simple and useful thing now" which I always like to do as
>>> long as it is compatible with future changes. If the APIs are fundamental,
>>> like sets, maps, timestamp ordering, then it is safe to guess that they
>>> will change rarely and be useful forever.
>>>
>>> Kenn
>>>
>>> On Tue, Jun 16, 2020 at 2:54 PM Luke Cwik <lc...@google.com> wrote:
>>>
>>>> I would be glad to take a stab at how to provide sorting on top of
>>>> unsorted multimap state.
>>>> Based upon your description, you want integer keys representing
>>>> timestamps and arbitrary user value for the values, is that correct?
>>>> What kinds of operations do you need on the sorted map state in order
>>>> of efficiency requirements?
>>>> (e.g. Next(x), Previous(x), GetAll(Range[x, y)), ClearAll(Range[x, y))
>>>> What kinds of operations do we expect the underlying unsorted map state
>>>> to be able to provide?
>>>> (at a minimum Get(K), Append(K), Clear(K) but what else e.g.
>>>> enumerate(K)?)
>>>>
>>>> I went through a similar exercise of how to provide a list like side
>>>> input view over a multimap[1] side input which efficiently allowed
>>>> computation of size and provided random access while only having access to
>>>> get(K) and enumerate K's.
>>>>
>>>> 1:
>>>> https://github.com/lukecwik/incubator-beam/blob/ec8769f6163ca8a4daecc2fb29708bc1da430917/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollectionViews.java#L568
>>>>
>>>> On Tue, Jun 16, 2020 at 8:47 AM Reuven Lax <re...@google.com> wrote:
>>>>
>>>>> Bringing this subject up again,
>>>>>
>>>>> I've spent some time looking into implementing this for the Dataflow
>>>>> runner. I'm unable to find a way to implement the arbitrary sorted multimap
>>>>> efficiently for the case where there are large numbers of unique keys.
>>>>> Since the primary driving use case is timestamp ordering (i.e. key is event
>>>>> timestamp), you would expect to have nearly a new key per element. I
>>>>> considered Luke's suggestion above, but unfortunately it doesn't really
>>>>> solve this issue.
>>>>>
>>>>> The primary use case for sorting always seems to be sorting by
>>>>> timestamp. I want to propose that instead of building the fully-general
>>>>> sorted multimap, we instead focus on a state type where the sort key is an
>>>>> integral type (like a timestamp or an integer). There is still a valid use
>>>>> case for multimap, but we can provide that as an unordered state. At least
>>>>> for Dataflow, it will be much easier
>>>>>
>>>>> While my difficulties here may be specific to the Dataflow runner, any
>>>>> such support would have to be built into other runners as well, and
>>>>> limiting to integral sorting likely makes it easier for other runners to
>>>>> implement this. Also, if you look at this
>>>>> <https://github.com/apache/flink/blob/0ab1549f52f1f544e8492757c6b0d562bf50a061/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/runtime/join/TemporalRowtimeJoin.scala#L95> Flink
>>>>> comment pointed out by Aljoscha, for Flink the main use case identified was
>>>>> also timestamp sorting. This will also simplify the API design for this
>>>>> feature: Sorted multimap with arbitrary keys would require us to introduce
>>>>> a way of mapping natural ordering to encoded ordering (i.e. a new
>>>>> OrderPreservingCoder), but if we limit sort keys to integral types, the API
>>>>> design is simpler as integral types can be represented directly.
>>>>>
>>>>> Reuven
>>>>>
>>>>> On Sun, Jun 2, 2019 at 7:04 AM Reuven Lax <re...@google.com> wrote:
>>>>>
>>>>>> This sounds to me like a potential runner strategy. However if a
>>>>>> runner can natively support sorted maps (e.g. we expect the Dataflow runner
>>>>>> to be able to do so, and I think it would be useful for other runners as
>>>>>> well), then it's probably preferable to allow the runner to use its native
>>>>>> capabilities.
>>>>>>
>>>>>> On Fri, May 24, 2019 at 11:05 AM Lukasz Cwik <lc...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> For the API that you proposed, the map key is always "void" and the
>>>>>>> sort key == user key. So in my example of
>>>>>>> key: dummy value
>>>>>>> key.000: token, (0001, value4)
>>>>>>> key.001: token, (0010, value1), (0011, value2)
>>>>>>> key.01: token
>>>>>>> key.1: token, (1011, value3)
>>>>>>> you would have:
>>>>>>> "void": dummy value
>>>>>>> "void".000: token, (0001, value4)
>>>>>>> "void".001: token, (0010, value1), (0011, value2)
>>>>>>> "void".01: token
>>>>>>> "void".1: token, (1011, value3)
>>>>>>>
>>>>>>> Iterable<KV<K, V>> entriesUntil(K limit) translates into walking the
>>>>>>> the prefixes until you find a common prefix for K and then filter for
>>>>>>> values where they have a sort key <= K. Using the example above, to find
>>>>>>> entriesUntil(0010) you would:
>>>>>>> look for key."", miss
>>>>>>> look for key.0, miss
>>>>>>> look for key.00, miss
>>>>>>> look for key.000, hit, sort all contained values using secondary
>>>>>>> key, provide value4 to user
>>>>>>> look for key.001, hit, notice that 001 is a prefix of 0010 so we
>>>>>>> sort all contained values using secondary key, filter out value2 and
>>>>>>> provide value1
>>>>>>>
>>>>>>> void removeUntil(K limit) also translates into walking the prefixes
>>>>>>> but instead we will clear them when we have a "hit" with some special logic
>>>>>>> for when the sort key is a prefix of the key. Used the example, to
>>>>>>> removeUntil(0010) you would:
>>>>>>> look for key."", miss
>>>>>>> look for key.0, miss
>>>>>>> look for key.00, miss
>>>>>>> look for key.000, hit, clear
>>>>>>> look for key.001, hit, notice that 001 is a prefix of 0010 so we
>>>>>>> sort all contained values using secondary key, store in memory all values
>>>>>>> that > 0010, clear and append values stored in memory.
>>>>>>>
>>>>>>> On Fri, May 24, 2019 at 10:36 AM Reuven Lax <re...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Can you explain how fetching and deleting ranges of keys would work
>>>>>>>> with this data structure?
>>>>>>>>
>>>>>>>> On Fri, May 24, 2019 at 9:50 AM Lukasz Cwik <lc...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Reuven, for the example, I assume that we never want to store more
>>>>>>>>> then 2 values at a given sort key prefix, and if we do then we will create
>>>>>>>>> a new longer prefix splitting up the values based upon the sort key.
>>>>>>>>>
>>>>>>>>> Tuple representation in examples below is (key, sort key, value)
>>>>>>>>> and . is a character outside of the alphabet which can be represented by
>>>>>>>>> using an escaping encoding that wraps the key + sort key encoding.
>>>>>>>>>
>>>>>>>>> To insert (key, 0010, value1), we lookup "key" + all the prefixes
>>>>>>>>> of 0010 finding one that is not empty. In this case its 0, so we append
>>>>>>>>> value to the map at key.0 ending up with (we also set the key to any dummy
>>>>>>>>> value to know that it it contains values):
>>>>>>>>> key: dummy value
>>>>>>>>> key."": token, (0010, value1)
>>>>>>>>> Now we insert (key, 0011, value2), we again lookup "key" + all the
>>>>>>>>> prefixes of 0010, finding "", so we append value2 to key."" ending up with:
>>>>>>>>> key: dummy value
>>>>>>>>> key."": token, (0010, value1), (0011, value2)
>>>>>>>>> Now we insert (key, 1011, value3), we again lookup "key" + all the
>>>>>>>>> prefixes of 1011 finding "" but notice that it is full, so we partition all
>>>>>>>>> the values into two prefixes 0 and 1. We also clear the "" prefix ending up
>>>>>>>>> with:
>>>>>>>>> key: dummy value
>>>>>>>>> key.0: token, (0010, value1), (0011, value2)
>>>>>>>>> key.1: token, (1011, value3)
>>>>>>>>> Now we insert (key, 0001, value4), we again lookup "key" + all the
>>>>>>>>> prefixes of the value finding 0 but notice that it is full, so we partition
>>>>>>>>> all the values into two prefixes 00 and 01 but notice this doesn't help us
>>>>>>>>> since 00 will be too full so we split 00 again to 000, 001. We also clear
>>>>>>>>> the 0 prefix ending up with:
>>>>>>>>> key: dummy value
>>>>>>>>> key.000: token, (0001, value4)
>>>>>>>>> key.001: token, (0010, value1), (0011, value2)
>>>>>>>>> key.01: token
>>>>>>>>> key.1: token, (1011, value3)
>>>>>>>>>
>>>>>>>>> We are effectively building a trie[1] where we only have values at
>>>>>>>>> the leaves and control how full each leaf can be. There are other trie
>>>>>>>>> representations like a radix tree that may be better.
>>>>>>>>>
>>>>>>>>> Looking up the values in sorted order for "key" would go like this:
>>>>>>>>> Is key set, yes
>>>>>>>>> look for key."", miss
>>>>>>>>> look for key.0, miss
>>>>>>>>> look for key.00, miss
>>>>>>>>> look for key.000, hit, sort all contained values using secondary
>>>>>>>>> key, provide value4 to user
>>>>>>>>> look for key.001, hit, sort all contained values using secondary
>>>>>>>>> key, provide value1 followed by value2 to user
>>>>>>>>> look for key.01, hit, empty, return no values to user
>>>>>>>>> look for key.1, hit, sort all contained values using secondary
>>>>>>>>> key, provide value3 to user
>>>>>>>>> we have walked the entire prefix space, signal end of iterable
>>>>>>>>>
>>>>>>>>> Some notes for the above:
>>>>>>>>> * The dummy value is used to know that the key contains values and
>>>>>>>>> the token is to know whether there are any values deeper in the trie so
>>>>>>>>> when we know when to stop searching.
>>>>>>>>> * If we can recalculate the sort key from the combination of the
>>>>>>>>> key and value, then we don't need to store it.
>>>>>>>>> * Keys with lots of values will perform worse then keys with less
>>>>>>>>> values since we have to look up more keys but they will be empty reads. The
>>>>>>>>> number of misses can be controlled by how many elements we are willing to
>>>>>>>>> store at a given node before we subdivide.
>>>>>>>>>
>>>>>>>>> In reality you could build a lot of structures (e.g. red black
>>>>>>>>> tree, binary tree) using the sort key, the issue is the cost of
>>>>>>>>> rebalancing/re-organizing the structure in map form and whether it has a
>>>>>>>>> convenient pre-order traversal for lookups.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, May 24, 2019 at 8:14 AM Reuven Lax <re...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Some great comments!
>>>>>>>>>>
>>>>>>>>>> *Aljoscha*: absolutely this would have to be implemented by
>>>>>>>>>> runners to be efficient. We can of course provide a default (inefficient)
>>>>>>>>>> implementation, but ideally runners would provide better ones.
>>>>>>>>>>
>>>>>>>>>> *Jan* Exactly. I think MapState can be dropped or backed by
>>>>>>>>>> this. E.g.
>>>>>>>>>>
>>>>>>>>>> *Robert* Great point about standard coders not satisfying this.
>>>>>>>>>> That's why I suggested that we provide a way to tag the coders that do
>>>>>>>>>> preserve order, and only accept those as key coders Alternatively we could
>>>>>>>>>> present a more limited API - e.g. only allowing a hard-coded set of types
>>>>>>>>>> to be used as keys - but that seems counter to the direction Beam usually
>>>>>>>>>> goes. So users will have two ways .of creating multimap state specs:
>>>>>>>>>>
>>>>>>>>>>    private final StateSpec<MultimapState<Long, String>> state =
>>>>>>>>>> StateSpecs.multimap(VarLongCoder.of(), StringUtf8Coder.of());
>>>>>>>>>>
>>>>>>>>>> or
>>>>>>>>>>    private final StateSpec<MultimapState<Long, String>> state =
>>>>>>>>>> StateSpecs.orderedMultimap(VarLongCoder.of(), StringUtf8Coder.of());
>>>>>>>>>>
>>>>>>>>>> The second one will validate that the key coder preserves order,
>>>>>>>>>> and fails otherwise (similar to coder determinism checking in GroupByKey).
>>>>>>>>>> (BTW we would also have versions of these functions that use coder
>>>>>>>>>> inference to "guess" the coder, but those will do the same checking)
>>>>>>>>>>
>>>>>>>>>> Also the API I proposed did support random access! We could
>>>>>>>>>> separate out OrderedBagState again if we think the use cases are
>>>>>>>>>> fundamentally different. I merged the proposal into that of MultimapState
>>>>>>>>>> because there seemed be 99% overlap.
>>>>>>>>>>
>>>>>>>>>> Reuven
>>>>>>>>>>
>>>>>>>>>> On Fri, May 24, 2019 at 6:19 AM Robert Bradshaw <
>>>>>>>>>> robertwb@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Fri, May 24, 2019 at 5:32 AM Reuven Lax <re...@google.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> >
>>>>>>>>>>> > On Thu, May 23, 2019 at 1:53 PM Ahmet Altay <al...@google.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >> On Thu, May 23, 2019 at 1:38 PM Lukasz Cwik <lc...@google.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> >>>
>>>>>>>>>>> >>>
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> On Thu, May 23, 2019 at 11:37 AM Rui Wang <ru...@google.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> A few obvious problems with this code:
>>>>>>>>>>> >>>>>   1. Removing the elements already processed from the bag
>>>>>>>>>>> requires clearing and rewriting the entire bag. This is O(n^2) in the
>>>>>>>>>>> number of input trades.
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> why it's not O(2 * n) to clearing and rewriting trade state?
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> public interface SortedMultimapState<K, V> extends State {
>>>>>>>>>>> >>>>>   // Add a value to the map.
>>>>>>>>>>> >>>>>   void put(K key, V value);
>>>>>>>>>>> >>>>>   // Get all values for a given key.
>>>>>>>>>>> >>>>>   ReadableState<Iterable<V>> get(K key);
>>>>>>>>>>> >>>>>  // Return all entries in the map.
>>>>>>>>>>> >>>>>   ReadableState<Iterable<KV<K, V>>> allEntries();
>>>>>>>>>>> >>>>>   // Return all entries in the map with keys <= limit.
>>>>>>>>>>> returned elements are sorted by the key.
>>>>>>>>>>> >>>>>   ReadableState<Iterable<KV<K, V>>> entriesUntil(K limit);
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>>  // Remove all values with the given key;
>>>>>>>>>>> >>>>>   void remove(K key);
>>>>>>>>>>> >>>>>  // Remove all entries in the map with keys <= limit.
>>>>>>>>>>> >>>>>   void removeUntil(K limit);
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> Will removeUntilExcl(K limit) also useful? It will remove
>>>>>>>>>>> all entries in the map with keys < limit.
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> Runners will sort based on the encoded value of the key.
>>>>>>>>>>> In order to make this easier for users, I propose that we introduce a new
>>>>>>>>>>> tag on Coders PreservesOrder. A Coder that contains this tag guarantees
>>>>>>>>>>> that the encoded value preserves the same ordering as the base Java type.
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> Could you clarify what is  "encoded value preserves the
>>>>>>>>>>> same ordering as the base Java type"?
>>>>>>>>>>> >>>
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Lets say A and B represent two different instances of the
>>>>>>>>>>> same Java type like a double, then A < B (using the languages comparison
>>>>>>>>>>> operator) iff encode(A) < encode(B) (note the encoded versions are compared
>>>>>>>>>>> lexicographically)
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >> Since coders are shared across SDKs, do we expect A < B iff
>>>>>>>>>>> e(A) < e(P) property to hold for all languages we support? What happens A,
>>>>>>>>>>> B sort differently in different languages?
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > That would have to be the property of the coder (which means
>>>>>>>>>>> that this property probably needs to be represented in the portability
>>>>>>>>>>> representation of the coder). I imagine the common use cases will be for
>>>>>>>>>>> simple coders like int, long, string, etc., which are likely to sort the
>>>>>>>>>>> same in most languages.
>>>>>>>>>>>
>>>>>>>>>>> The standard coders for both double and integral types do not
>>>>>>>>>>> respect
>>>>>>>>>>> the natural ordering (consider negative values). KV coders
>>>>>>>>>>> violate the
>>>>>>>>>>> "natural" lexicographic ordering on components as well. I think
>>>>>>>>>>> implicitly sorting on encoded value would yield many surprises.
>>>>>>>>>>> (The
>>>>>>>>>>> state, of course, could take a order-preserving, bytes
>>>>>>>>>>> (string?)-producing callable as a parameter of course). (As for
>>>>>>>>>>> naming, I'd probably call this OrderedBagState or something like
>>>>>>>>>>> that...rather than Map which tends to imply random access.)
>>>>>>>>>>>
>>>>>>>>>>

Re: DISCUSS: Sorted MapState API

Posted by Reuven Lax <re...@google.com>.

It might help for me to describe what I have in mind. I'm still proposing
that we build multimap, just not a globally-sorted multimap.

My previous proposal was that we provide a Multimap<Key, Value> state type,
sorted by key. this would have two additional operations -
multimap.getRange(startKey, endKey) and multimap.deleteRange(startKey,
endKey). The primary use case was timestamp sorting, but I felt that a
sorted multimap provided a nice generalization - after all, you can simply
key the multimap by timestamp to get timestamp sorting.

This approach had some issues immediately that would take some work to
solve. Since a multimap key can have any type and a runner will only be
able to sort by encoded type, we would need to introduce a concept of
order-preserving coders into Beam and plumb that through. Robert pointed
out that even our existing standard coders for simple integral types don't
preserve order, so there will likely be surprises here.

My current proposal is for a multimap that is not sorted by key, but that
can support.ordered values for a single key. Remember that a multimap maps
K -> Iterable<V>, so this means that each individual Iterable<V> is
ordered, but the keys have no specific order relative to each other. This
is not too different from many multimap implementations where the keys are
unordered, but the list of values for a single key at least has a stable
order.

The interface would look like this:

public interface MultimapState<K, V> extends State {
  // Add a value with a default timestamp.
  void put(K key, V value);

  // Add a timestamped value.
  void put(K, key, TimestampedValue<V> value);

  // Remove all values for a key.
  void remove (K key);

  // Remove all values for a key with timestamps within the specified range.
  void removeRange(K key, Instant startTs, Instant endTs);

  // Get an Iterable of values for V. The Iterable will be returned sorted
by timestamp.
  ReadableState<Iterable<TimestampedValue<V>>> get(K key);

  // Get an Iterable of values for V in the specified range. The Iterable
will be returned sorted by timestamp.
  ReadableState<Iterable<TimestampedValue<V>>> getRange(K key, Instant
startTs, Instant endTs);

  ReadableState<Iterable<K>> keys();
  ReadableState<Iterable<TimestampedValue<V>>> values();
  ReadableState<Iterable<Map.Entry<K, TimestampedValue<V>> entries;
}

We can of course provide helper functions that allow using MultimapState
without deailing with TimestampValue for users who only want a multimap and
don't want sorting.

I think many users will only need a single sorted list - not a full
multimap. It's worth offering this as well, and we can simply build it on
top of MultimapState. It will look like an extension of BagState

public interface TimestampSortedListState<T> extends State {
  void add(TimestampedValue<T> value);
  Iterable<TimestampedValue<T>> read();
  Iterable<TimestampedValue<T>> readRange(Instant startTs, Instant endTs);
  void clearRange(Instant startTs, Instant endTs);
}


On Wed, Jun 17, 2020 at 2:47 PM Luke Cwik <lc...@google.com> wrote:

> The portability layer is meant to live across multiple versions of Beam
> and I don't think it should be treated by doing the simple and useful thing
> now since I believe it will lead to a proliferation of the API.
>
> On Wed, Jun 17, 2020 at 2:30 PM Kenneth Knowles <ke...@apache.org> wrote:
>
>> I have thoughts on the subject of whether to have APIs just for the
>> lowest-level building blocks versus having APIs for higher-level
>> constructs. Specifically this applies to providing only unsorted multimap
>> vs what I will call "time-ordered buffer". TL;DR: I'd vote to focus on
>> time-ordered buffer; if it turns out to be easy to go all the way to sorted
>> multimap that's nice-to-have; if it turns out to be easy to implement on
>> top of unsorted map state that should probably be under the hood
>>
>> Reasons to build low-level multimap in the runner & fn api and layer
>> higher-level things in the SDK:
>>
>>  - It is less implementation for runners if they have to only provide
>> fewer lower-level building blocks like multimap state.
>>  - There are many more runners than SDKs (and will be even more and more)
>> so this saves overall.
>>
>> Reasons to build higher-level constructs directly in the runner and fn
>> api:
>>
>>  - Having multiple higher-level state types may actually be less
>> implementation than one complex state type, especially if they map to
>> runner primitives.
>>  - The runner may have better specialized implementations, especially for
>> something like a time-ordered buffer.
>>  - The particular access patterns in an SDK-based implementation may not
>> be ideal for each runner's underlying implementation of the low-level
>> building block.
>>  - There may be excessive gRPC overhead even for optimal access patterns.
>>
>> There are ways to have best of both worlds, like:
>>
>> 1. Define multiple state types according to fundamental access patterns,
>> like we did this before portability.
>> 2. If it is easy to layer one on top of the other, do that inside the
>> runner. Provide shared code so for runners providing the lowest-level
>> primitive they get all the types for free.
>>
>> I understand that this is an oversimplification. It still creates some
>> more work. And APIs are a burden so it is good to introduce as few as
>> possible for maintenance. But it has performance benefits and also unblocks
>> "just doing the simple and useful thing now" which I always like to do as
>> long as it is compatible with future changes. If the APIs are fundamental,
>> like sets, maps, timestamp ordering, then it is safe to guess that they
>> will change rarely and be useful forever.
>>
>> Kenn
>>
>> On Tue, Jun 16, 2020 at 2:54 PM Luke Cwik <lc...@google.com> wrote:
>>
>>> I would be glad to take a stab at how to provide sorting on top of
>>> unsorted multimap state.
>>> Based upon your description, you want integer keys representing
>>> timestamps and arbitrary user value for the values, is that correct?
>>> What kinds of operations do you need on the sorted map state in order of
>>> efficiency requirements?
>>> (e.g. Next(x), Previous(x), GetAll(Range[x, y)), ClearAll(Range[x, y))
>>> What kinds of operations do we expect the underlying unsorted map state
>>> to be able to provide?
>>> (at a minimum Get(K), Append(K), Clear(K) but what else e.g.
>>> enumerate(K)?)
>>>
>>> I went through a similar exercise of how to provide a list like side
>>> input view over a multimap[1] side input which efficiently allowed
>>> computation of size and provided random access while only having access to
>>> get(K) and enumerate K's.
>>>
>>> 1:
>>> https://github.com/lukecwik/incubator-beam/blob/ec8769f6163ca8a4daecc2fb29708bc1da430917/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollectionViews.java#L568
>>>
>>> On Tue, Jun 16, 2020 at 8:47 AM Reuven Lax <re...@google.com> wrote:
>>>
>>>> Bringing this subject up again,
>>>>
>>>> I've spent some time looking into implementing this for the Dataflow
>>>> runner. I'm unable to find a way to implement the arbitrary sorted multimap
>>>> efficiently for the case where there are large numbers of unique keys.
>>>> Since the primary driving use case is timestamp ordering (i.e. key is event
>>>> timestamp), you would expect to have nearly a new key per element. I
>>>> considered Luke's suggestion above, but unfortunately it doesn't really
>>>> solve this issue.
>>>>
>>>> The primary use case for sorting always seems to be sorting by
>>>> timestamp. I want to propose that instead of building the fully-general
>>>> sorted multimap, we instead focus on a state type where the sort key is an
>>>> integral type (like a timestamp or an integer). There is still a valid use
>>>> case for multimap, but we can provide that as an unordered state. At least
>>>> for Dataflow, it will be much easier
>>>>
>>>> While my difficulties here may be specific to the Dataflow runner, any
>>>> such support would have to be built into other runners as well, and
>>>> limiting to integral sorting likely makes it easier for other runners to
>>>> implement this. Also, if you look at this
>>>> <https://github.com/apache/flink/blob/0ab1549f52f1f544e8492757c6b0d562bf50a061/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/runtime/join/TemporalRowtimeJoin.scala#L95> Flink
>>>> comment pointed out by Aljoscha, for Flink the main use case identified was
>>>> also timestamp sorting. This will also simplify the API design for this
>>>> feature: Sorted multimap with arbitrary keys would require us to introduce
>>>> a way of mapping natural ordering to encoded ordering (i.e. a new
>>>> OrderPreservingCoder), but if we limit sort keys to integral types, the API
>>>> design is simpler as integral types can be represented directly.
>>>>
>>>> Reuven
>>>>
>>>> On Sun, Jun 2, 2019 at 7:04 AM Reuven Lax <re...@google.com> wrote:
>>>>
>>>>> This sounds to me like a potential runner strategy. However if a
>>>>> runner can natively support sorted maps (e.g. we expect the Dataflow runner
>>>>> to be able to do so, and I think it would be useful for other runners as
>>>>> well), then it's probably preferable to allow the runner to use its native
>>>>> capabilities.
>>>>>
>>>>> On Fri, May 24, 2019 at 11:05 AM Lukasz Cwik <lc...@google.com> wrote:
>>>>>
>>>>>> For the API that you proposed, the map key is always "void" and the
>>>>>> sort key == user key. So in my example of
>>>>>> key: dummy value
>>>>>> key.000: token, (0001, value4)
>>>>>> key.001: token, (0010, value1), (0011, value2)
>>>>>> key.01: token
>>>>>> key.1: token, (1011, value3)
>>>>>> you would have:
>>>>>> "void": dummy value
>>>>>> "void".000: token, (0001, value4)
>>>>>> "void".001: token, (0010, value1), (0011, value2)
>>>>>> "void".01: token
>>>>>> "void".1: token, (1011, value3)
>>>>>>
>>>>>> Iterable<KV<K, V>> entriesUntil(K limit) translates into walking the
>>>>>> the prefixes until you find a common prefix for K and then filter for
>>>>>> values where they have a sort key <= K. Using the example above, to find
>>>>>> entriesUntil(0010) you would:
>>>>>> look for key."", miss
>>>>>> look for key.0, miss
>>>>>> look for key.00, miss
>>>>>> look for key.000, hit, sort all contained values using secondary key,
>>>>>> provide value4 to user
>>>>>> look for key.001, hit, notice that 001 is a prefix of 0010 so we sort
>>>>>> all contained values using secondary key, filter out value2 and provide
>>>>>> value1
>>>>>>
>>>>>> void removeUntil(K limit) also translates into walking the prefixes
>>>>>> but instead we will clear them when we have a "hit" with some special logic
>>>>>> for when the sort key is a prefix of the key. Used the example, to
>>>>>> removeUntil(0010) you would:
>>>>>> look for key."", miss
>>>>>> look for key.0, miss
>>>>>> look for key.00, miss
>>>>>> look for key.000, hit, clear
>>>>>> look for key.001, hit, notice that 001 is a prefix of 0010 so we sort
>>>>>> all contained values using secondary key, store in memory all values that >
>>>>>> 0010, clear and append values stored in memory.
>>>>>>
>>>>>> On Fri, May 24, 2019 at 10:36 AM Reuven Lax <re...@google.com> wrote:
>>>>>>
>>>>>>> Can you explain how fetching and deleting ranges of keys would work
>>>>>>> with this data structure?
>>>>>>>
>>>>>>> On Fri, May 24, 2019 at 9:50 AM Lukasz Cwik <lc...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Reuven, for the example, I assume that we never want to store more
>>>>>>>> then 2 values at a given sort key prefix, and if we do then we will create
>>>>>>>> a new longer prefix splitting up the values based upon the sort key.
>>>>>>>>
>>>>>>>> Tuple representation in examples below is (key, sort key, value)
>>>>>>>> and . is a character outside of the alphabet which can be represented by
>>>>>>>> using an escaping encoding that wraps the key + sort key encoding.
>>>>>>>>
>>>>>>>> To insert (key, 0010, value1), we lookup "key" + all the prefixes
>>>>>>>> of 0010 finding one that is not empty. In this case its 0, so we append
>>>>>>>> value to the map at key.0 ending up with (we also set the key to any dummy
>>>>>>>> value to know that it it contains values):
>>>>>>>> key: dummy value
>>>>>>>> key."": token, (0010, value1)
>>>>>>>> Now we insert (key, 0011, value2), we again lookup "key" + all the
>>>>>>>> prefixes of 0010, finding "", so we append value2 to key."" ending up with:
>>>>>>>> key: dummy value
>>>>>>>> key."": token, (0010, value1), (0011, value2)
>>>>>>>> Now we insert (key, 1011, value3), we again lookup "key" + all the
>>>>>>>> prefixes of 1011 finding "" but notice that it is full, so we partition all
>>>>>>>> the values into two prefixes 0 and 1. We also clear the "" prefix ending up
>>>>>>>> with:
>>>>>>>> key: dummy value
>>>>>>>> key.0: token, (0010, value1), (0011, value2)
>>>>>>>> key.1: token, (1011, value3)
>>>>>>>> Now we insert (key, 0001, value4), we again lookup "key" + all the
>>>>>>>> prefixes of the value finding 0 but notice that it is full, so we partition
>>>>>>>> all the values into two prefixes 00 and 01 but notice this doesn't help us
>>>>>>>> since 00 will be too full so we split 00 again to 000, 001. We also clear
>>>>>>>> the 0 prefix ending up with:
>>>>>>>> key: dummy value
>>>>>>>> key.000: token, (0001, value4)
>>>>>>>> key.001: token, (0010, value1), (0011, value2)
>>>>>>>> key.01: token
>>>>>>>> key.1: token, (1011, value3)
>>>>>>>>
>>>>>>>> We are effectively building a trie[1] where we only have values at
>>>>>>>> the leaves and control how full each leaf can be. There are other trie
>>>>>>>> representations like a radix tree that may be better.
>>>>>>>>
>>>>>>>> Looking up the values in sorted order for "key" would go like this:
>>>>>>>> Is key set, yes
>>>>>>>> look for key."", miss
>>>>>>>> look for key.0, miss
>>>>>>>> look for key.00, miss
>>>>>>>> look for key.000, hit, sort all contained values using secondary
>>>>>>>> key, provide value4 to user
>>>>>>>> look for key.001, hit, sort all contained values using secondary
>>>>>>>> key, provide value1 followed by value2 to user
>>>>>>>> look for key.01, hit, empty, return no values to user
>>>>>>>> look for key.1, hit, sort all contained values using secondary key,
>>>>>>>> provide value3 to user
>>>>>>>> we have walked the entire prefix space, signal end of iterable
>>>>>>>>
>>>>>>>> Some notes for the above:
>>>>>>>> * The dummy value is used to know that the key contains values and
>>>>>>>> the token is to know whether there are any values deeper in the trie so
>>>>>>>> when we know when to stop searching.
>>>>>>>> * If we can recalculate the sort key from the combination of the
>>>>>>>> key and value, then we don't need to store it.
>>>>>>>> * Keys with lots of values will perform worse then keys with less
>>>>>>>> values since we have to look up more keys but they will be empty reads. The
>>>>>>>> number of misses can be controlled by how many elements we are willing to
>>>>>>>> store at a given node before we subdivide.
>>>>>>>>
>>>>>>>> In reality you could build a lot of structures (e.g. red black
>>>>>>>> tree, binary tree) using the sort key, the issue is the cost of
>>>>>>>> rebalancing/re-organizing the structure in map form and whether it has a
>>>>>>>> convenient pre-order traversal for lookups.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, May 24, 2019 at 8:14 AM Reuven Lax <re...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Some great comments!
>>>>>>>>>
>>>>>>>>> *Aljoscha*: absolutely this would have to be implemented by
>>>>>>>>> runners to be efficient. We can of course provide a default (inefficient)
>>>>>>>>> implementation, but ideally runners would provide better ones.
>>>>>>>>>
>>>>>>>>> *Jan* Exactly. I think MapState can be dropped or backed by this.
>>>>>>>>> E.g.
>>>>>>>>>
>>>>>>>>> *Robert* Great point about standard coders not satisfying this.
>>>>>>>>> That's why I suggested that we provide a way to tag the coders that do
>>>>>>>>> preserve order, and only accept those as key coders Alternatively we could
>>>>>>>>> present a more limited API - e.g. only allowing a hard-coded set of types
>>>>>>>>> to be used as keys - but that seems counter to the direction Beam usually
>>>>>>>>> goes. So users will have two ways .of creating multimap state specs:
>>>>>>>>>
>>>>>>>>>    private final StateSpec<MultimapState<Long, String>> state =
>>>>>>>>> StateSpecs.multimap(VarLongCoder.of(), StringUtf8Coder.of());
>>>>>>>>>
>>>>>>>>> or
>>>>>>>>>    private final StateSpec<MultimapState<Long, String>> state =
>>>>>>>>> StateSpecs.orderedMultimap(VarLongCoder.of(), StringUtf8Coder.of());
>>>>>>>>>
>>>>>>>>> The second one will validate that the key coder preserves order,
>>>>>>>>> and fails otherwise (similar to coder determinism checking in GroupByKey).
>>>>>>>>> (BTW we would also have versions of these functions that use coder
>>>>>>>>> inference to "guess" the coder, but those will do the same checking)
>>>>>>>>>
>>>>>>>>> Also the API I proposed did support random access! We could
>>>>>>>>> separate out OrderedBagState again if we think the use cases are
>>>>>>>>> fundamentally different. I merged the proposal into that of MultimapState
>>>>>>>>> because there seemed be 99% overlap.
>>>>>>>>>
>>>>>>>>> Reuven
>>>>>>>>>
>>>>>>>>> On Fri, May 24, 2019 at 6:19 AM Robert Bradshaw <
>>>>>>>>> robertwb@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> On Fri, May 24, 2019 at 5:32 AM Reuven Lax <re...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>> >
>>>>>>>>>> > On Thu, May 23, 2019 at 1:53 PM Ahmet Altay <al...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >> On Thu, May 23, 2019 at 1:38 PM Lukasz Cwik <lc...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>> >>>
>>>>>>>>>> >>>
>>>>>>>>>> >>>
>>>>>>>>>> >>> On Thu, May 23, 2019 at 11:37 AM Rui Wang <ru...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> A few obvious problems with this code:
>>>>>>>>>> >>>>>   1. Removing the elements already processed from the bag
>>>>>>>>>> requires clearing and rewriting the entire bag. This is O(n^2) in the
>>>>>>>>>> number of input trades.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> why it's not O(2 * n) to clearing and rewriting trade state?
>>>>>>>>>> >>>>
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> public interface SortedMultimapState<K, V> extends State {
>>>>>>>>>> >>>>>   // Add a value to the map.
>>>>>>>>>> >>>>>   void put(K key, V value);
>>>>>>>>>> >>>>>   // Get all values for a given key.
>>>>>>>>>> >>>>>   ReadableState<Iterable<V>> get(K key);
>>>>>>>>>> >>>>>  // Return all entries in the map.
>>>>>>>>>> >>>>>   ReadableState<Iterable<KV<K, V>>> allEntries();
>>>>>>>>>> >>>>>   // Return all entries in the map with keys <= limit.
>>>>>>>>>> returned elements are sorted by the key.
>>>>>>>>>> >>>>>   ReadableState<Iterable<KV<K, V>>> entriesUntil(K limit);
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>>  // Remove all values with the given key;
>>>>>>>>>> >>>>>   void remove(K key);
>>>>>>>>>> >>>>>  // Remove all entries in the map with keys <= limit.
>>>>>>>>>> >>>>>   void removeUntil(K limit);
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> Will removeUntilExcl(K limit) also useful? It will remove
>>>>>>>>>> all entries in the map with keys < limit.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> Runners will sort based on the encoded value of the key. In
>>>>>>>>>> order to make this easier for users, I propose that we introduce a new tag
>>>>>>>>>> on Coders PreservesOrder. A Coder that contains this tag guarantees that
>>>>>>>>>> the encoded value preserves the same ordering as the base Java type.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> Could you clarify what is  "encoded value preserves the same
>>>>>>>>>> ordering as the base Java type"?
>>>>>>>>>> >>>
>>>>>>>>>> >>>
>>>>>>>>>> >>> Lets say A and B represent two different instances of the
>>>>>>>>>> same Java type like a double, then A < B (using the languages comparison
>>>>>>>>>> operator) iff encode(A) < encode(B) (note the encoded versions are compared
>>>>>>>>>> lexicographically)
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >> Since coders are shared across SDKs, do we expect A < B iff
>>>>>>>>>> e(A) < e(P) property to hold for all languages we support? What happens A,
>>>>>>>>>> B sort differently in different languages?
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > That would have to be the property of the coder (which means
>>>>>>>>>> that this property probably needs to be represented in the portability
>>>>>>>>>> representation of the coder). I imagine the common use cases will be for
>>>>>>>>>> simple coders like int, long, string, etc., which are likely to sort the
>>>>>>>>>> same in most languages.
>>>>>>>>>>
>>>>>>>>>> The standard coders for both double and integral types do not
>>>>>>>>>> respect
>>>>>>>>>> the natural ordering (consider negative values). KV coders
>>>>>>>>>> violate the
>>>>>>>>>> "natural" lexicographic ordering on components as well. I think
>>>>>>>>>> implicitly sorting on encoded value would yield many surprises.
>>>>>>>>>> (The
>>>>>>>>>> state, of course, could take a order-preserving, bytes
>>>>>>>>>> (string?)-producing callable as a parameter of course). (As for
>>>>>>>>>> naming, I'd probably call this OrderedBagState or something like
>>>>>>>>>> that...rather than Map which tends to imply random access.)
>>>>>>>>>>
>>>>>>>>>

Re: DISCUSS: Sorted MapState API

Posted by Luke Cwik <lc...@google.com>.

The portability layer is meant to live across multiple versions of Beam and
I don't think it should be treated by doing the simple and useful thing now
since I believe it will lead to a proliferation of the API.

On Wed, Jun 17, 2020 at 2:30 PM Kenneth Knowles <ke...@apache.org> wrote:

> I have thoughts on the subject of whether to have APIs just for the
> lowest-level building blocks versus having APIs for higher-level
> constructs. Specifically this applies to providing only unsorted multimap
> vs what I will call "time-ordered buffer". TL;DR: I'd vote to focus on
> time-ordered buffer; if it turns out to be easy to go all the way to sorted
> multimap that's nice-to-have; if it turns out to be easy to implement on
> top of unsorted map state that should probably be under the hood
>
> Reasons to build low-level multimap in the runner & fn api and layer
> higher-level things in the SDK:
>
>  - It is less implementation for runners if they have to only provide
> fewer lower-level building blocks like multimap state.
>  - There are many more runners than SDKs (and will be even more and more)
> so this saves overall.
>
> Reasons to build higher-level constructs directly in the runner and fn api:
>
>  - Having multiple higher-level state types may actually be less
> implementation than one complex state type, especially if they map to
> runner primitives.
>  - The runner may have better specialized implementations, especially for
> something like a time-ordered buffer.
>  - The particular access patterns in an SDK-based implementation may not
> be ideal for each runner's underlying implementation of the low-level
> building block.
>  - There may be excessive gRPC overhead even for optimal access patterns.
>
> There are ways to have best of both worlds, like:
>
> 1. Define multiple state types according to fundamental access patterns,
> like we did this before portability.
> 2. If it is easy to layer one on top of the other, do that inside the
> runner. Provide shared code so for runners providing the lowest-level
> primitive they get all the types for free.
>
> I understand that this is an oversimplification. It still creates some
> more work. And APIs are a burden so it is good to introduce as few as
> possible for maintenance. But it has performance benefits and also unblocks
> "just doing the simple and useful thing now" which I always like to do as
> long as it is compatible with future changes. If the APIs are fundamental,
> like sets, maps, timestamp ordering, then it is safe to guess that they
> will change rarely and be useful forever.
>
> Kenn
>
> On Tue, Jun 16, 2020 at 2:54 PM Luke Cwik <lc...@google.com> wrote:
>
>> I would be glad to take a stab at how to provide sorting on top of
>> unsorted multimap state.
>> Based upon your description, you want integer keys representing
>> timestamps and arbitrary user value for the values, is that correct?
>> What kinds of operations do you need on the sorted map state in order of
>> efficiency requirements?
>> (e.g. Next(x), Previous(x), GetAll(Range[x, y)), ClearAll(Range[x, y))
>> What kinds of operations do we expect the underlying unsorted map state
>> to be able to provide?
>> (at a minimum Get(K), Append(K), Clear(K) but what else e.g.
>> enumerate(K)?)
>>
>> I went through a similar exercise of how to provide a list like side
>> input view over a multimap[1] side input which efficiently allowed
>> computation of size and provided random access while only having access to
>> get(K) and enumerate K's.
>>
>> 1:
>> https://github.com/lukecwik/incubator-beam/blob/ec8769f6163ca8a4daecc2fb29708bc1da430917/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollectionViews.java#L568
>>
>> On Tue, Jun 16, 2020 at 8:47 AM Reuven Lax <re...@google.com> wrote:
>>
>>> Bringing this subject up again,
>>>
>>> I've spent some time looking into implementing this for the Dataflow
>>> runner. I'm unable to find a way to implement the arbitrary sorted multimap
>>> efficiently for the case where there are large numbers of unique keys.
>>> Since the primary driving use case is timestamp ordering (i.e. key is event
>>> timestamp), you would expect to have nearly a new key per element. I
>>> considered Luke's suggestion above, but unfortunately it doesn't really
>>> solve this issue.
>>>
>>> The primary use case for sorting always seems to be sorting by
>>> timestamp. I want to propose that instead of building the fully-general
>>> sorted multimap, we instead focus on a state type where the sort key is an
>>> integral type (like a timestamp or an integer). There is still a valid use
>>> case for multimap, but we can provide that as an unordered state. At least
>>> for Dataflow, it will be much easier
>>>
>>> While my difficulties here may be specific to the Dataflow runner, any
>>> such support would have to be built into other runners as well, and
>>> limiting to integral sorting likely makes it easier for other runners to
>>> implement this. Also, if you look at this
>>> <https://github.com/apache/flink/blob/0ab1549f52f1f544e8492757c6b0d562bf50a061/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/runtime/join/TemporalRowtimeJoin.scala#L95> Flink
>>> comment pointed out by Aljoscha, for Flink the main use case identified was
>>> also timestamp sorting. This will also simplify the API design for this
>>> feature: Sorted multimap with arbitrary keys would require us to introduce
>>> a way of mapping natural ordering to encoded ordering (i.e. a new
>>> OrderPreservingCoder), but if we limit sort keys to integral types, the API
>>> design is simpler as integral types can be represented directly.
>>>
>>> Reuven
>>>
>>> On Sun, Jun 2, 2019 at 7:04 AM Reuven Lax <re...@google.com> wrote:
>>>
>>>> This sounds to me like a potential runner strategy. However if a runner
>>>> can natively support sorted maps (e.g. we expect the Dataflow runner to be
>>>> able to do so, and I think it would be useful for other runners as well),
>>>> then it's probably preferable to allow the runner to use its native
>>>> capabilities.
>>>>
>>>> On Fri, May 24, 2019 at 11:05 AM Lukasz Cwik <lc...@google.com> wrote:
>>>>
>>>>> For the API that you proposed, the map key is always "void" and the
>>>>> sort key == user key. So in my example of
>>>>> key: dummy value
>>>>> key.000: token, (0001, value4)
>>>>> key.001: token, (0010, value1), (0011, value2)
>>>>> key.01: token
>>>>> key.1: token, (1011, value3)
>>>>> you would have:
>>>>> "void": dummy value
>>>>> "void".000: token, (0001, value4)
>>>>> "void".001: token, (0010, value1), (0011, value2)
>>>>> "void".01: token
>>>>> "void".1: token, (1011, value3)
>>>>>
>>>>> Iterable<KV<K, V>> entriesUntil(K limit) translates into walking the
>>>>> the prefixes until you find a common prefix for K and then filter for
>>>>> values where they have a sort key <= K. Using the example above, to find
>>>>> entriesUntil(0010) you would:
>>>>> look for key."", miss
>>>>> look for key.0, miss
>>>>> look for key.00, miss
>>>>> look for key.000, hit, sort all contained values using secondary key,
>>>>> provide value4 to user
>>>>> look for key.001, hit, notice that 001 is a prefix of 0010 so we sort
>>>>> all contained values using secondary key, filter out value2 and provide
>>>>> value1
>>>>>
>>>>> void removeUntil(K limit) also translates into walking the prefixes
>>>>> but instead we will clear them when we have a "hit" with some special logic
>>>>> for when the sort key is a prefix of the key. Used the example, to
>>>>> removeUntil(0010) you would:
>>>>> look for key."", miss
>>>>> look for key.0, miss
>>>>> look for key.00, miss
>>>>> look for key.000, hit, clear
>>>>> look for key.001, hit, notice that 001 is a prefix of 0010 so we sort
>>>>> all contained values using secondary key, store in memory all values that >
>>>>> 0010, clear and append values stored in memory.
>>>>>
>>>>> On Fri, May 24, 2019 at 10:36 AM Reuven Lax <re...@google.com> wrote:
>>>>>
>>>>>> Can you explain how fetching and deleting ranges of keys would work
>>>>>> with this data structure?
>>>>>>
>>>>>> On Fri, May 24, 2019 at 9:50 AM Lukasz Cwik <lc...@google.com> wrote:
>>>>>>
>>>>>>> Reuven, for the example, I assume that we never want to store more
>>>>>>> then 2 values at a given sort key prefix, and if we do then we will create
>>>>>>> a new longer prefix splitting up the values based upon the sort key.
>>>>>>>
>>>>>>> Tuple representation in examples below is (key, sort key, value) and
>>>>>>> . is a character outside of the alphabet which can be represented by using
>>>>>>> an escaping encoding that wraps the key + sort key encoding.
>>>>>>>
>>>>>>> To insert (key, 0010, value1), we lookup "key" + all the prefixes of
>>>>>>> 0010 finding one that is not empty. In this case its 0, so we append value
>>>>>>> to the map at key.0 ending up with (we also set the key to any dummy value
>>>>>>> to know that it it contains values):
>>>>>>> key: dummy value
>>>>>>> key."": token, (0010, value1)
>>>>>>> Now we insert (key, 0011, value2), we again lookup "key" + all the
>>>>>>> prefixes of 0010, finding "", so we append value2 to key."" ending up with:
>>>>>>> key: dummy value
>>>>>>> key."": token, (0010, value1), (0011, value2)
>>>>>>> Now we insert (key, 1011, value3), we again lookup "key" + all the
>>>>>>> prefixes of 1011 finding "" but notice that it is full, so we partition all
>>>>>>> the values into two prefixes 0 and 1. We also clear the "" prefix ending up
>>>>>>> with:
>>>>>>> key: dummy value
>>>>>>> key.0: token, (0010, value1), (0011, value2)
>>>>>>> key.1: token, (1011, value3)
>>>>>>> Now we insert (key, 0001, value4), we again lookup "key" + all the
>>>>>>> prefixes of the value finding 0 but notice that it is full, so we partition
>>>>>>> all the values into two prefixes 00 and 01 but notice this doesn't help us
>>>>>>> since 00 will be too full so we split 00 again to 000, 001. We also clear
>>>>>>> the 0 prefix ending up with:
>>>>>>> key: dummy value
>>>>>>> key.000: token, (0001, value4)
>>>>>>> key.001: token, (0010, value1), (0011, value2)
>>>>>>> key.01: token
>>>>>>> key.1: token, (1011, value3)
>>>>>>>
>>>>>>> We are effectively building a trie[1] where we only have values at
>>>>>>> the leaves and control how full each leaf can be. There are other trie
>>>>>>> representations like a radix tree that may be better.
>>>>>>>
>>>>>>> Looking up the values in sorted order for "key" would go like this:
>>>>>>> Is key set, yes
>>>>>>> look for key."", miss
>>>>>>> look for key.0, miss
>>>>>>> look for key.00, miss
>>>>>>> look for key.000, hit, sort all contained values using secondary
>>>>>>> key, provide value4 to user
>>>>>>> look for key.001, hit, sort all contained values using secondary
>>>>>>> key, provide value1 followed by value2 to user
>>>>>>> look for key.01, hit, empty, return no values to user
>>>>>>> look for key.1, hit, sort all contained values using secondary key,
>>>>>>> provide value3 to user
>>>>>>> we have walked the entire prefix space, signal end of iterable
>>>>>>>
>>>>>>> Some notes for the above:
>>>>>>> * The dummy value is used to know that the key contains values and
>>>>>>> the token is to know whether there are any values deeper in the trie so
>>>>>>> when we know when to stop searching.
>>>>>>> * If we can recalculate the sort key from the combination of the key
>>>>>>> and value, then we don't need to store it.
>>>>>>> * Keys with lots of values will perform worse then keys with less
>>>>>>> values since we have to look up more keys but they will be empty reads. The
>>>>>>> number of misses can be controlled by how many elements we are willing to
>>>>>>> store at a given node before we subdivide.
>>>>>>>
>>>>>>> In reality you could build a lot of structures (e.g. red black tree,
>>>>>>> binary tree) using the sort key, the issue is the cost of
>>>>>>> rebalancing/re-organizing the structure in map form and whether it has a
>>>>>>> convenient pre-order traversal for lookups.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, May 24, 2019 at 8:14 AM Reuven Lax <re...@google.com> wrote:
>>>>>>>
>>>>>>>> Some great comments!
>>>>>>>>
>>>>>>>> *Aljoscha*: absolutely this would have to be implemented by
>>>>>>>> runners to be efficient. We can of course provide a default (inefficient)
>>>>>>>> implementation, but ideally runners would provide better ones.
>>>>>>>>
>>>>>>>> *Jan* Exactly. I think MapState can be dropped or backed by this.
>>>>>>>> E.g.
>>>>>>>>
>>>>>>>> *Robert* Great point about standard coders not satisfying this.
>>>>>>>> That's why I suggested that we provide a way to tag the coders that do
>>>>>>>> preserve order, and only accept those as key coders Alternatively we could
>>>>>>>> present a more limited API - e.g. only allowing a hard-coded set of types
>>>>>>>> to be used as keys - but that seems counter to the direction Beam usually
>>>>>>>> goes. So users will have two ways .of creating multimap state specs:
>>>>>>>>
>>>>>>>>    private final StateSpec<MultimapState<Long, String>> state =
>>>>>>>> StateSpecs.multimap(VarLongCoder.of(), StringUtf8Coder.of());
>>>>>>>>
>>>>>>>> or
>>>>>>>>    private final StateSpec<MultimapState<Long, String>> state =
>>>>>>>> StateSpecs.orderedMultimap(VarLongCoder.of(), StringUtf8Coder.of());
>>>>>>>>
>>>>>>>> The second one will validate that the key coder preserves order,
>>>>>>>> and fails otherwise (similar to coder determinism checking in GroupByKey).
>>>>>>>> (BTW we would also have versions of these functions that use coder
>>>>>>>> inference to "guess" the coder, but those will do the same checking)
>>>>>>>>
>>>>>>>> Also the API I proposed did support random access! We could
>>>>>>>> separate out OrderedBagState again if we think the use cases are
>>>>>>>> fundamentally different. I merged the proposal into that of MultimapState
>>>>>>>> because there seemed be 99% overlap.
>>>>>>>>
>>>>>>>> Reuven
>>>>>>>>
>>>>>>>> On Fri, May 24, 2019 at 6:19 AM Robert Bradshaw <
>>>>>>>> robertwb@google.com> wrote:
>>>>>>>>
>>>>>>>>> On Fri, May 24, 2019 at 5:32 AM Reuven Lax <re...@google.com>
>>>>>>>>> wrote:
>>>>>>>>> >
>>>>>>>>> > On Thu, May 23, 2019 at 1:53 PM Ahmet Altay <al...@google.com>
>>>>>>>>> wrote:
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> On Thu, May 23, 2019 at 1:38 PM Lukasz Cwik <lc...@google.com>
>>>>>>>>> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>> On Thu, May 23, 2019 at 11:37 AM Rui Wang <ru...@google.com>
>>>>>>>>> wrote:
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> A few obvious problems with this code:
>>>>>>>>> >>>>>   1. Removing the elements already processed from the bag
>>>>>>>>> requires clearing and rewriting the entire bag. This is O(n^2) in the
>>>>>>>>> number of input trades.
>>>>>>>>> >>>>
>>>>>>>>> >>>> why it's not O(2 * n) to clearing and rewriting trade state?
>>>>>>>>> >>>>
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> public interface SortedMultimapState<K, V> extends State {
>>>>>>>>> >>>>>   // Add a value to the map.
>>>>>>>>> >>>>>   void put(K key, V value);
>>>>>>>>> >>>>>   // Get all values for a given key.
>>>>>>>>> >>>>>   ReadableState<Iterable<V>> get(K key);
>>>>>>>>> >>>>>  // Return all entries in the map.
>>>>>>>>> >>>>>   ReadableState<Iterable<KV<K, V>>> allEntries();
>>>>>>>>> >>>>>   // Return all entries in the map with keys <= limit.
>>>>>>>>> returned elements are sorted by the key.
>>>>>>>>> >>>>>   ReadableState<Iterable<KV<K, V>>> entriesUntil(K limit);
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  // Remove all values with the given key;
>>>>>>>>> >>>>>   void remove(K key);
>>>>>>>>> >>>>>  // Remove all entries in the map with keys <= limit.
>>>>>>>>> >>>>>   void removeUntil(K limit);
>>>>>>>>> >>>>
>>>>>>>>> >>>> Will removeUntilExcl(K limit) also useful? It will remove all
>>>>>>>>> entries in the map with keys < limit.
>>>>>>>>> >>>>
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> Runners will sort based on the encoded value of the key. In
>>>>>>>>> order to make this easier for users, I propose that we introduce a new tag
>>>>>>>>> on Coders PreservesOrder. A Coder that contains this tag guarantees that
>>>>>>>>> the encoded value preserves the same ordering as the base Java type.
>>>>>>>>> >>>>
>>>>>>>>> >>>>
>>>>>>>>> >>>> Could you clarify what is  "encoded value preserves the same
>>>>>>>>> ordering as the base Java type"?
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>> Lets say A and B represent two different instances of the same
>>>>>>>>> Java type like a double, then A < B (using the languages comparison
>>>>>>>>> operator) iff encode(A) < encode(B) (note the encoded versions are compared
>>>>>>>>> lexicographically)
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> Since coders are shared across SDKs, do we expect A < B iff
>>>>>>>>> e(A) < e(P) property to hold for all languages we support? What happens A,
>>>>>>>>> B sort differently in different languages?
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > That would have to be the property of the coder (which means
>>>>>>>>> that this property probably needs to be represented in the portability
>>>>>>>>> representation of the coder). I imagine the common use cases will be for
>>>>>>>>> simple coders like int, long, string, etc., which are likely to sort the
>>>>>>>>> same in most languages.
>>>>>>>>>
>>>>>>>>> The standard coders for both double and integral types do not
>>>>>>>>> respect
>>>>>>>>> the natural ordering (consider negative values). KV coders violate
>>>>>>>>> the
>>>>>>>>> "natural" lexicographic ordering on components as well. I think
>>>>>>>>> implicitly sorting on encoded value would yield many surprises.
>>>>>>>>> (The
>>>>>>>>> state, of course, could take a order-preserving, bytes
>>>>>>>>> (string?)-producing callable as a parameter of course). (As for
>>>>>>>>> naming, I'd probably call this OrderedBagState or something like
>>>>>>>>> that...rather than Map which tends to imply random access.)
>>>>>>>>>
>>>>>>>>

Re: DISCUSS: Sorted MapState API

Posted by Kenneth Knowles <ke...@apache.org>.

I have thoughts on the subject of whether to have APIs just for the
lowest-level building blocks versus having APIs for higher-level
constructs. Specifically this applies to providing only unsorted multimap
vs what I will call "time-ordered buffer". TL;DR: I'd vote to focus on
time-ordered buffer; if it turns out to be easy to go all the way to sorted
multimap that's nice-to-have; if it turns out to be easy to implement on
top of unsorted map state that should probably be under the hood

Reasons to build low-level multimap in the runner & fn api and layer
higher-level things in the SDK:

 - It is less implementation for runners if they have to only provide fewer
lower-level building blocks like multimap state.
 - There are many more runners than SDKs (and will be even more and more)
so this saves overall.

Reasons to build higher-level constructs directly in the runner and fn api:

 - Having multiple higher-level state types may actually be less
implementation than one complex state type, especially if they map to
runner primitives.
 - The runner may have better specialized implementations, especially for
something like a time-ordered buffer.
 - The particular access patterns in an SDK-based implementation may not be
ideal for each runner's underlying implementation of the low-level building
block.
 - There may be excessive gRPC overhead even for optimal access patterns.

There are ways to have best of both worlds, like:

1. Define multiple state types according to fundamental access patterns,
like we did this before portability.
2. If it is easy to layer one on top of the other, do that inside the
runner. Provide shared code so for runners providing the lowest-level
primitive they get all the types for free.

I understand that this is an oversimplification. It still creates some more
work. And APIs are a burden so it is good to introduce as few as possible
for maintenance. But it has performance benefits and also unblocks "just
doing the simple and useful thing now" which I always like to do as long as
it is compatible with future changes. If the APIs are fundamental, like
sets, maps, timestamp ordering, then it is safe to guess that they will
change rarely and be useful forever.

Kenn

On Tue, Jun 16, 2020 at 2:54 PM Luke Cwik <lc...@google.com> wrote:

> I would be glad to take a stab at how to provide sorting on top of
> unsorted multimap state.
> Based upon your description, you want integer keys representing timestamps
> and arbitrary user value for the values, is that correct?
> What kinds of operations do you need on the sorted map state in order of
> efficiency requirements?
> (e.g. Next(x), Previous(x), GetAll(Range[x, y)), ClearAll(Range[x, y))
> What kinds of operations do we expect the underlying unsorted map state to
> be able to provide?
> (at a minimum Get(K), Append(K), Clear(K) but what else e.g. enumerate(K)?)
>
> I went through a similar exercise of how to provide a list like side input
> view over a multimap[1] side input which efficiently allowed computation of
> size and provided random access while only having access to get(K) and
> enumerate K's.
>
> 1:
> https://github.com/lukecwik/incubator-beam/blob/ec8769f6163ca8a4daecc2fb29708bc1da430917/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollectionViews.java#L568
>
> On Tue, Jun 16, 2020 at 8:47 AM Reuven Lax <re...@google.com> wrote:
>
>> Bringing this subject up again,
>>
>> I've spent some time looking into implementing this for the Dataflow
>> runner. I'm unable to find a way to implement the arbitrary sorted multimap
>> efficiently for the case where there are large numbers of unique keys.
>> Since the primary driving use case is timestamp ordering (i.e. key is event
>> timestamp), you would expect to have nearly a new key per element. I
>> considered Luke's suggestion above, but unfortunately it doesn't really
>> solve this issue.
>>
>> The primary use case for sorting always seems to be sorting by timestamp.
>> I want to propose that instead of building the fully-general sorted
>> multimap, we instead focus on a state type where the sort key is an
>> integral type (like a timestamp or an integer). There is still a valid use
>> case for multimap, but we can provide that as an unordered state. At least
>> for Dataflow, it will be much easier
>>
>> While my difficulties here may be specific to the Dataflow runner, any
>> such support would have to be built into other runners as well, and
>> limiting to integral sorting likely makes it easier for other runners to
>> implement this. Also, if you look at this
>> <https://github.com/apache/flink/blob/0ab1549f52f1f544e8492757c6b0d562bf50a061/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/runtime/join/TemporalRowtimeJoin.scala#L95> Flink
>> comment pointed out by Aljoscha, for Flink the main use case identified was
>> also timestamp sorting. This will also simplify the API design for this
>> feature: Sorted multimap with arbitrary keys would require us to introduce
>> a way of mapping natural ordering to encoded ordering (i.e. a new
>> OrderPreservingCoder), but if we limit sort keys to integral types, the API
>> design is simpler as integral types can be represented directly.
>>
>> Reuven
>>
>> On Sun, Jun 2, 2019 at 7:04 AM Reuven Lax <re...@google.com> wrote:
>>
>>> This sounds to me like a potential runner strategy. However if a runner
>>> can natively support sorted maps (e.g. we expect the Dataflow runner to be
>>> able to do so, and I think it would be useful for other runners as well),
>>> then it's probably preferable to allow the runner to use its native
>>> capabilities.
>>>
>>> On Fri, May 24, 2019 at 11:05 AM Lukasz Cwik <lc...@google.com> wrote:
>>>
>>>> For the API that you proposed, the map key is always "void" and the
>>>> sort key == user key. So in my example of
>>>> key: dummy value
>>>> key.000: token, (0001, value4)
>>>> key.001: token, (0010, value1), (0011, value2)
>>>> key.01: token
>>>> key.1: token, (1011, value3)
>>>> you would have:
>>>> "void": dummy value
>>>> "void".000: token, (0001, value4)
>>>> "void".001: token, (0010, value1), (0011, value2)
>>>> "void".01: token
>>>> "void".1: token, (1011, value3)
>>>>
>>>> Iterable<KV<K, V>> entriesUntil(K limit) translates into walking the
>>>> the prefixes until you find a common prefix for K and then filter for
>>>> values where they have a sort key <= K. Using the example above, to find
>>>> entriesUntil(0010) you would:
>>>> look for key."", miss
>>>> look for key.0, miss
>>>> look for key.00, miss
>>>> look for key.000, hit, sort all contained values using secondary key,
>>>> provide value4 to user
>>>> look for key.001, hit, notice that 001 is a prefix of 0010 so we sort
>>>> all contained values using secondary key, filter out value2 and provide
>>>> value1
>>>>
>>>> void removeUntil(K limit) also translates into walking the prefixes but
>>>> instead we will clear them when we have a "hit" with some special logic for
>>>> when the sort key is a prefix of the key. Used the example, to
>>>> removeUntil(0010) you would:
>>>> look for key."", miss
>>>> look for key.0, miss
>>>> look for key.00, miss
>>>> look for key.000, hit, clear
>>>> look for key.001, hit, notice that 001 is a prefix of 0010 so we sort
>>>> all contained values using secondary key, store in memory all values that >
>>>> 0010, clear and append values stored in memory.
>>>>
>>>> On Fri, May 24, 2019 at 10:36 AM Reuven Lax <re...@google.com> wrote:
>>>>
>>>>> Can you explain how fetching and deleting ranges of keys would work
>>>>> with this data structure?
>>>>>
>>>>> On Fri, May 24, 2019 at 9:50 AM Lukasz Cwik <lc...@google.com> wrote:
>>>>>
>>>>>> Reuven, for the example, I assume that we never want to store more
>>>>>> then 2 values at a given sort key prefix, and if we do then we will create
>>>>>> a new longer prefix splitting up the values based upon the sort key.
>>>>>>
>>>>>> Tuple representation in examples below is (key, sort key, value) and
>>>>>> . is a character outside of the alphabet which can be represented by using
>>>>>> an escaping encoding that wraps the key + sort key encoding.
>>>>>>
>>>>>> To insert (key, 0010, value1), we lookup "key" + all the prefixes of
>>>>>> 0010 finding one that is not empty. In this case its 0, so we append value
>>>>>> to the map at key.0 ending up with (we also set the key to any dummy value
>>>>>> to know that it it contains values):
>>>>>> key: dummy value
>>>>>> key."": token, (0010, value1)
>>>>>> Now we insert (key, 0011, value2), we again lookup "key" + all the
>>>>>> prefixes of 0010, finding "", so we append value2 to key."" ending up with:
>>>>>> key: dummy value
>>>>>> key."": token, (0010, value1), (0011, value2)
>>>>>> Now we insert (key, 1011, value3), we again lookup "key" + all the
>>>>>> prefixes of 1011 finding "" but notice that it is full, so we partition all
>>>>>> the values into two prefixes 0 and 1. We also clear the "" prefix ending up
>>>>>> with:
>>>>>> key: dummy value
>>>>>> key.0: token, (0010, value1), (0011, value2)
>>>>>> key.1: token, (1011, value3)
>>>>>> Now we insert (key, 0001, value4), we again lookup "key" + all the
>>>>>> prefixes of the value finding 0 but notice that it is full, so we partition
>>>>>> all the values into two prefixes 00 and 01 but notice this doesn't help us
>>>>>> since 00 will be too full so we split 00 again to 000, 001. We also clear
>>>>>> the 0 prefix ending up with:
>>>>>> key: dummy value
>>>>>> key.000: token, (0001, value4)
>>>>>> key.001: token, (0010, value1), (0011, value2)
>>>>>> key.01: token
>>>>>> key.1: token, (1011, value3)
>>>>>>
>>>>>> We are effectively building a trie[1] where we only have values at
>>>>>> the leaves and control how full each leaf can be. There are other trie
>>>>>> representations like a radix tree that may be better.
>>>>>>
>>>>>> Looking up the values in sorted order for "key" would go like this:
>>>>>> Is key set, yes
>>>>>> look for key."", miss
>>>>>> look for key.0, miss
>>>>>> look for key.00, miss
>>>>>> look for key.000, hit, sort all contained values using secondary key,
>>>>>> provide value4 to user
>>>>>> look for key.001, hit, sort all contained values using secondary key,
>>>>>> provide value1 followed by value2 to user
>>>>>> look for key.01, hit, empty, return no values to user
>>>>>> look for key.1, hit, sort all contained values using secondary key,
>>>>>> provide value3 to user
>>>>>> we have walked the entire prefix space, signal end of iterable
>>>>>>
>>>>>> Some notes for the above:
>>>>>> * The dummy value is used to know that the key contains values and
>>>>>> the token is to know whether there are any values deeper in the trie so
>>>>>> when we know when to stop searching.
>>>>>> * If we can recalculate the sort key from the combination of the key
>>>>>> and value, then we don't need to store it.
>>>>>> * Keys with lots of values will perform worse then keys with less
>>>>>> values since we have to look up more keys but they will be empty reads. The
>>>>>> number of misses can be controlled by how many elements we are willing to
>>>>>> store at a given node before we subdivide.
>>>>>>
>>>>>> In reality you could build a lot of structures (e.g. red black tree,
>>>>>> binary tree) using the sort key, the issue is the cost of
>>>>>> rebalancing/re-organizing the structure in map form and whether it has a
>>>>>> convenient pre-order traversal for lookups.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, May 24, 2019 at 8:14 AM Reuven Lax <re...@google.com> wrote:
>>>>>>
>>>>>>> Some great comments!
>>>>>>>
>>>>>>> *Aljoscha*: absolutely this would have to be implemented by runners
>>>>>>> to be efficient. We can of course provide a default (inefficient)
>>>>>>> implementation, but ideally runners would provide better ones.
>>>>>>>
>>>>>>> *Jan* Exactly. I think MapState can be dropped or backed by this.
>>>>>>> E.g.
>>>>>>>
>>>>>>> *Robert* Great point about standard coders not satisfying this.
>>>>>>> That's why I suggested that we provide a way to tag the coders that do
>>>>>>> preserve order, and only accept those as key coders Alternatively we could
>>>>>>> present a more limited API - e.g. only allowing a hard-coded set of types
>>>>>>> to be used as keys - but that seems counter to the direction Beam usually
>>>>>>> goes. So users will have two ways .of creating multimap state specs:
>>>>>>>
>>>>>>>    private final StateSpec<MultimapState<Long, String>> state =
>>>>>>> StateSpecs.multimap(VarLongCoder.of(), StringUtf8Coder.of());
>>>>>>>
>>>>>>> or
>>>>>>>    private final StateSpec<MultimapState<Long, String>> state =
>>>>>>> StateSpecs.orderedMultimap(VarLongCoder.of(), StringUtf8Coder.of());
>>>>>>>
>>>>>>> The second one will validate that the key coder preserves order, and
>>>>>>> fails otherwise (similar to coder determinism checking in GroupByKey). (BTW
>>>>>>> we would also have versions of these functions that use coder inference to
>>>>>>> "guess" the coder, but those will do the same checking)
>>>>>>>
>>>>>>> Also the API I proposed did support random access! We could separate
>>>>>>> out OrderedBagState again if we think the use cases are fundamentally
>>>>>>> different. I merged the proposal into that of MultimapState because there
>>>>>>> seemed be 99% overlap.
>>>>>>>
>>>>>>> Reuven
>>>>>>>
>>>>>>> On Fri, May 24, 2019 at 6:19 AM Robert Bradshaw <ro...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On Fri, May 24, 2019 at 5:32 AM Reuven Lax <re...@google.com>
>>>>>>>> wrote:
>>>>>>>> >
>>>>>>>> > On Thu, May 23, 2019 at 1:53 PM Ahmet Altay <al...@google.com>
>>>>>>>> wrote:
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> On Thu, May 23, 2019 at 1:38 PM Lukasz Cwik <lc...@google.com>
>>>>>>>> wrote:
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>> On Thu, May 23, 2019 at 11:37 AM Rui Wang <ru...@google.com>
>>>>>>>> wrote:
>>>>>>>> >>>>>
>>>>>>>> >>>>> A few obvious problems with this code:
>>>>>>>> >>>>>   1. Removing the elements already processed from the bag
>>>>>>>> requires clearing and rewriting the entire bag. This is O(n^2) in the
>>>>>>>> number of input trades.
>>>>>>>> >>>>
>>>>>>>> >>>> why it's not O(2 * n) to clearing and rewriting trade state?
>>>>>>>> >>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>> public interface SortedMultimapState<K, V> extends State {
>>>>>>>> >>>>>   // Add a value to the map.
>>>>>>>> >>>>>   void put(K key, V value);
>>>>>>>> >>>>>   // Get all values for a given key.
>>>>>>>> >>>>>   ReadableState<Iterable<V>> get(K key);
>>>>>>>> >>>>>  // Return all entries in the map.
>>>>>>>> >>>>>   ReadableState<Iterable<KV<K, V>>> allEntries();
>>>>>>>> >>>>>   // Return all entries in the map with keys <= limit.
>>>>>>>> returned elements are sorted by the key.
>>>>>>>> >>>>>   ReadableState<Iterable<KV<K, V>>> entriesUntil(K limit);
>>>>>>>> >>>>>
>>>>>>>> >>>>>  // Remove all values with the given key;
>>>>>>>> >>>>>   void remove(K key);
>>>>>>>> >>>>>  // Remove all entries in the map with keys <= limit.
>>>>>>>> >>>>>   void removeUntil(K limit);
>>>>>>>> >>>>
>>>>>>>> >>>> Will removeUntilExcl(K limit) also useful? It will remove all
>>>>>>>> entries in the map with keys < limit.
>>>>>>>> >>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>> Runners will sort based on the encoded value of the key. In
>>>>>>>> order to make this easier for users, I propose that we introduce a new tag
>>>>>>>> on Coders PreservesOrder. A Coder that contains this tag guarantees that
>>>>>>>> the encoded value preserves the same ordering as the base Java type.
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>> >>>> Could you clarify what is  "encoded value preserves the same
>>>>>>>> ordering as the base Java type"?
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>> Lets say A and B represent two different instances of the same
>>>>>>>> Java type like a double, then A < B (using the languages comparison
>>>>>>>> operator) iff encode(A) < encode(B) (note the encoded versions are compared
>>>>>>>> lexicographically)
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> Since coders are shared across SDKs, do we expect A < B iff e(A)
>>>>>>>> < e(P) property to hold for all languages we support? What happens A, B
>>>>>>>> sort differently in different languages?
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > That would have to be the property of the coder (which means that
>>>>>>>> this property probably needs to be represented in the portability
>>>>>>>> representation of the coder). I imagine the common use cases will be for
>>>>>>>> simple coders like int, long, string, etc., which are likely to sort the
>>>>>>>> same in most languages.
>>>>>>>>
>>>>>>>> The standard coders for both double and integral types do not
>>>>>>>> respect
>>>>>>>> the natural ordering (consider negative values). KV coders violate
>>>>>>>> the
>>>>>>>> "natural" lexicographic ordering on components as well. I think
>>>>>>>> implicitly sorting on encoded value would yield many surprises. (The
>>>>>>>> state, of course, could take a order-preserving, bytes
>>>>>>>> (string?)-producing callable as a parameter of course). (As for
>>>>>>>> naming, I'd probably call this OrderedBagState or something like
>>>>>>>> that...rather than Map which tends to imply random access.)
>>>>>>>>
>>>>>>>

Re: DISCUSS: Sorted MapState API

Posted by Luke Cwik <lc...@google.com>.

I would be glad to take a stab at how to provide sorting on top of unsorted
multimap state.
Based upon your description, you want integer keys representing timestamps
and arbitrary user value for the values, is that correct?
What kinds of operations do you need on the sorted map state in order of
efficiency requirements?
(e.g. Next(x), Previous(x), GetAll(Range[x, y)), ClearAll(Range[x, y))
What kinds of operations do we expect the underlying unsorted map state to
be able to provide?
(at a minimum Get(K), Append(K), Clear(K) but what else e.g. enumerate(K)?)

I went through a similar exercise of how to provide a list like side input
view over a multimap[1] side input which efficiently allowed computation of
size and provided random access while only having access to get(K) and
enumerate K's.

1:
https://github.com/lukecwik/incubator-beam/blob/ec8769f6163ca8a4daecc2fb29708bc1da430917/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollectionViews.java#L568

On Tue, Jun 16, 2020 at 8:47 AM Reuven Lax <re...@google.com> wrote:

> Bringing this subject up again,
>
> I've spent some time looking into implementing this for the Dataflow
> runner. I'm unable to find a way to implement the arbitrary sorted multimap
> efficiently for the case where there are large numbers of unique keys.
> Since the primary driving use case is timestamp ordering (i.e. key is event
> timestamp), you would expect to have nearly a new key per element. I
> considered Luke's suggestion above, but unfortunately it doesn't really
> solve this issue.
>
> The primary use case for sorting always seems to be sorting by timestamp.
> I want to propose that instead of building the fully-general sorted
> multimap, we instead focus on a state type where the sort key is an
> integral type (like a timestamp or an integer). There is still a valid use
> case for multimap, but we can provide that as an unordered state. At least
> for Dataflow, it will be much easier
>
> While my difficulties here may be specific to the Dataflow runner, any
> such support would have to be built into other runners as well, and
> limiting to integral sorting likely makes it easier for other runners to
> implement this. Also, if you look at this
> <https://github.com/apache/flink/blob/0ab1549f52f1f544e8492757c6b0d562bf50a061/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/runtime/join/TemporalRowtimeJoin.scala#L95> Flink
> comment pointed out by Aljoscha, for Flink the main use case identified was
> also timestamp sorting. This will also simplify the API design for this
> feature: Sorted multimap with arbitrary keys would require us to introduce
> a way of mapping natural ordering to encoded ordering (i.e. a new
> OrderPreservingCoder), but if we limit sort keys to integral types, the API
> design is simpler as integral types can be represented directly.
>
> Reuven
>
> On Sun, Jun 2, 2019 at 7:04 AM Reuven Lax <re...@google.com> wrote:
>
>> This sounds to me like a potential runner strategy. However if a runner
>> can natively support sorted maps (e.g. we expect the Dataflow runner to be
>> able to do so, and I think it would be useful for other runners as well),
>> then it's probably preferable to allow the runner to use its native
>> capabilities.
>>
>> On Fri, May 24, 2019 at 11:05 AM Lukasz Cwik <lc...@google.com> wrote:
>>
>>> For the API that you proposed, the map key is always "void" and the sort
>>> key == user key. So in my example of
>>> key: dummy value
>>> key.000: token, (0001, value4)
>>> key.001: token, (0010, value1), (0011, value2)
>>> key.01: token
>>> key.1: token, (1011, value3)
>>> you would have:
>>> "void": dummy value
>>> "void".000: token, (0001, value4)
>>> "void".001: token, (0010, value1), (0011, value2)
>>> "void".01: token
>>> "void".1: token, (1011, value3)
>>>
>>> Iterable<KV<K, V>> entriesUntil(K limit) translates into walking the the
>>> prefixes until you find a common prefix for K and then filter for values
>>> where they have a sort key <= K. Using the example above, to find
>>> entriesUntil(0010) you would:
>>> look for key."", miss
>>> look for key.0, miss
>>> look for key.00, miss
>>> look for key.000, hit, sort all contained values using secondary key,
>>> provide value4 to user
>>> look for key.001, hit, notice that 001 is a prefix of 0010 so we sort
>>> all contained values using secondary key, filter out value2 and provide
>>> value1
>>>
>>> void removeUntil(K limit) also translates into walking the prefixes but
>>> instead we will clear them when we have a "hit" with some special logic for
>>> when the sort key is a prefix of the key. Used the example, to
>>> removeUntil(0010) you would:
>>> look for key."", miss
>>> look for key.0, miss
>>> look for key.00, miss
>>> look for key.000, hit, clear
>>> look for key.001, hit, notice that 001 is a prefix of 0010 so we sort
>>> all contained values using secondary key, store in memory all values that >
>>> 0010, clear and append values stored in memory.
>>>
>>> On Fri, May 24, 2019 at 10:36 AM Reuven Lax <re...@google.com> wrote:
>>>
>>>> Can you explain how fetching and deleting ranges of keys would work
>>>> with this data structure?
>>>>
>>>> On Fri, May 24, 2019 at 9:50 AM Lukasz Cwik <lc...@google.com> wrote:
>>>>
>>>>> Reuven, for the example, I assume that we never want to store more
>>>>> then 2 values at a given sort key prefix, and if we do then we will create
>>>>> a new longer prefix splitting up the values based upon the sort key.
>>>>>
>>>>> Tuple representation in examples below is (key, sort key, value) and .
>>>>> is a character outside of the alphabet which can be represented by using an
>>>>> escaping encoding that wraps the key + sort key encoding.
>>>>>
>>>>> To insert (key, 0010, value1), we lookup "key" + all the prefixes of
>>>>> 0010 finding one that is not empty. In this case its 0, so we append value
>>>>> to the map at key.0 ending up with (we also set the key to any dummy value
>>>>> to know that it it contains values):
>>>>> key: dummy value
>>>>> key."": token, (0010, value1)
>>>>> Now we insert (key, 0011, value2), we again lookup "key" + all the
>>>>> prefixes of 0010, finding "", so we append value2 to key."" ending up with:
>>>>> key: dummy value
>>>>> key."": token, (0010, value1), (0011, value2)
>>>>> Now we insert (key, 1011, value3), we again lookup "key" + all the
>>>>> prefixes of 1011 finding "" but notice that it is full, so we partition all
>>>>> the values into two prefixes 0 and 1. We also clear the "" prefix ending up
>>>>> with:
>>>>> key: dummy value
>>>>> key.0: token, (0010, value1), (0011, value2)
>>>>> key.1: token, (1011, value3)
>>>>> Now we insert (key, 0001, value4), we again lookup "key" + all the
>>>>> prefixes of the value finding 0 but notice that it is full, so we partition
>>>>> all the values into two prefixes 00 and 01 but notice this doesn't help us
>>>>> since 00 will be too full so we split 00 again to 000, 001. We also clear
>>>>> the 0 prefix ending up with:
>>>>> key: dummy value
>>>>> key.000: token, (0001, value4)
>>>>> key.001: token, (0010, value1), (0011, value2)
>>>>> key.01: token
>>>>> key.1: token, (1011, value3)
>>>>>
>>>>> We are effectively building a trie[1] where we only have values at the
>>>>> leaves and control how full each leaf can be. There are other trie
>>>>> representations like a radix tree that may be better.
>>>>>
>>>>> Looking up the values in sorted order for "key" would go like this:
>>>>> Is key set, yes
>>>>> look for key."", miss
>>>>> look for key.0, miss
>>>>> look for key.00, miss
>>>>> look for key.000, hit, sort all contained values using secondary key,
>>>>> provide value4 to user
>>>>> look for key.001, hit, sort all contained values using secondary key,
>>>>> provide value1 followed by value2 to user
>>>>> look for key.01, hit, empty, return no values to user
>>>>> look for key.1, hit, sort all contained values using secondary key,
>>>>> provide value3 to user
>>>>> we have walked the entire prefix space, signal end of iterable
>>>>>
>>>>> Some notes for the above:
>>>>> * The dummy value is used to know that the key contains values and the
>>>>> token is to know whether there are any values deeper in the trie so when we
>>>>> know when to stop searching.
>>>>> * If we can recalculate the sort key from the combination of the key
>>>>> and value, then we don't need to store it.
>>>>> * Keys with lots of values will perform worse then keys with less
>>>>> values since we have to look up more keys but they will be empty reads. The
>>>>> number of misses can be controlled by how many elements we are willing to
>>>>> store at a given node before we subdivide.
>>>>>
>>>>> In reality you could build a lot of structures (e.g. red black tree,
>>>>> binary tree) using the sort key, the issue is the cost of
>>>>> rebalancing/re-organizing the structure in map form and whether it has a
>>>>> convenient pre-order traversal for lookups.
>>>>>
>>>>>
>>>>>
>>>>> On Fri, May 24, 2019 at 8:14 AM Reuven Lax <re...@google.com> wrote:
>>>>>
>>>>>> Some great comments!
>>>>>>
>>>>>> *Aljoscha*: absolutely this would have to be implemented by runners
>>>>>> to be efficient. We can of course provide a default (inefficient)
>>>>>> implementation, but ideally runners would provide better ones.
>>>>>>
>>>>>> *Jan* Exactly. I think MapState can be dropped or backed by this.
>>>>>> E.g.
>>>>>>
>>>>>> *Robert* Great point about standard coders not satisfying this.
>>>>>> That's why I suggested that we provide a way to tag the coders that do
>>>>>> preserve order, and only accept those as key coders Alternatively we could
>>>>>> present a more limited API - e.g. only allowing a hard-coded set of types
>>>>>> to be used as keys - but that seems counter to the direction Beam usually
>>>>>> goes. So users will have two ways .of creating multimap state specs:
>>>>>>
>>>>>>    private final StateSpec<MultimapState<Long, String>> state =
>>>>>> StateSpecs.multimap(VarLongCoder.of(), StringUtf8Coder.of());
>>>>>>
>>>>>> or
>>>>>>    private final StateSpec<MultimapState<Long, String>> state =
>>>>>> StateSpecs.orderedMultimap(VarLongCoder.of(), StringUtf8Coder.of());
>>>>>>
>>>>>> The second one will validate that the key coder preserves order, and
>>>>>> fails otherwise (similar to coder determinism checking in GroupByKey). (BTW
>>>>>> we would also have versions of these functions that use coder inference to
>>>>>> "guess" the coder, but those will do the same checking)
>>>>>>
>>>>>> Also the API I proposed did support random access! We could separate
>>>>>> out OrderedBagState again if we think the use cases are fundamentally
>>>>>> different. I merged the proposal into that of MultimapState because there
>>>>>> seemed be 99% overlap.
>>>>>>
>>>>>> Reuven
>>>>>>
>>>>>> On Fri, May 24, 2019 at 6:19 AM Robert Bradshaw <ro...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> On Fri, May 24, 2019 at 5:32 AM Reuven Lax <re...@google.com> wrote:
>>>>>>> >
>>>>>>> > On Thu, May 23, 2019 at 1:53 PM Ahmet Altay <al...@google.com>
>>>>>>> wrote:
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On Thu, May 23, 2019 at 1:38 PM Lukasz Cwik <lc...@google.com>
>>>>>>> wrote:
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> On Thu, May 23, 2019 at 11:37 AM Rui Wang <ru...@google.com>
>>>>>>> wrote:
>>>>>>> >>>>>
>>>>>>> >>>>> A few obvious problems with this code:
>>>>>>> >>>>>   1. Removing the elements already processed from the bag
>>>>>>> requires clearing and rewriting the entire bag. This is O(n^2) in the
>>>>>>> number of input trades.
>>>>>>> >>>>
>>>>>>> >>>> why it's not O(2 * n) to clearing and rewriting trade state?
>>>>>>> >>>>
>>>>>>> >>>>>
>>>>>>> >>>>> public interface SortedMultimapState<K, V> extends State {
>>>>>>> >>>>>   // Add a value to the map.
>>>>>>> >>>>>   void put(K key, V value);
>>>>>>> >>>>>   // Get all values for a given key.
>>>>>>> >>>>>   ReadableState<Iterable<V>> get(K key);
>>>>>>> >>>>>  // Return all entries in the map.
>>>>>>> >>>>>   ReadableState<Iterable<KV<K, V>>> allEntries();
>>>>>>> >>>>>   // Return all entries in the map with keys <= limit.
>>>>>>> returned elements are sorted by the key.
>>>>>>> >>>>>   ReadableState<Iterable<KV<K, V>>> entriesUntil(K limit);
>>>>>>> >>>>>
>>>>>>> >>>>>  // Remove all values with the given key;
>>>>>>> >>>>>   void remove(K key);
>>>>>>> >>>>>  // Remove all entries in the map with keys <= limit.
>>>>>>> >>>>>   void removeUntil(K limit);
>>>>>>> >>>>
>>>>>>> >>>> Will removeUntilExcl(K limit) also useful? It will remove all
>>>>>>> entries in the map with keys < limit.
>>>>>>> >>>>
>>>>>>> >>>>>
>>>>>>> >>>>> Runners will sort based on the encoded value of the key. In
>>>>>>> order to make this easier for users, I propose that we introduce a new tag
>>>>>>> on Coders PreservesOrder. A Coder that contains this tag guarantees that
>>>>>>> the encoded value preserves the same ordering as the base Java type.
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>> >>>> Could you clarify what is  "encoded value preserves the same
>>>>>>> ordering as the base Java type"?
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> Lets say A and B represent two different instances of the same
>>>>>>> Java type like a double, then A < B (using the languages comparison
>>>>>>> operator) iff encode(A) < encode(B) (note the encoded versions are compared
>>>>>>> lexicographically)
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> Since coders are shared across SDKs, do we expect A < B iff e(A)
>>>>>>> < e(P) property to hold for all languages we support? What happens A, B
>>>>>>> sort differently in different languages?
>>>>>>> >
>>>>>>> >
>>>>>>> > That would have to be the property of the coder (which means that
>>>>>>> this property probably needs to be represented in the portability
>>>>>>> representation of the coder). I imagine the common use cases will be for
>>>>>>> simple coders like int, long, string, etc., which are likely to sort the
>>>>>>> same in most languages.
>>>>>>>
>>>>>>> The standard coders for both double and integral types do not respect
>>>>>>> the natural ordering (consider negative values). KV coders violate
>>>>>>> the
>>>>>>> "natural" lexicographic ordering on components as well. I think
>>>>>>> implicitly sorting on encoded value would yield many surprises. (The
>>>>>>> state, of course, could take a order-preserving, bytes
>>>>>>> (string?)-producing callable as a parameter of course). (As for
>>>>>>> naming, I'd probably call this OrderedBagState or something like
>>>>>>> that...rather than Map which tends to imply random access.)
>>>>>>>
>>>>>>