Posted to user@beam.apache.org by Godefroy Clair <go...@gmail.com> on 2023/03/13 18:32:10 UTC
Why is FlatMap different from composing Flatten and Map?
Hi,
I am wondering about the way `Flatten()` and `FlatMap()` are implemented in
Apache Beam Python.
In most functional languages, FlatMap() is the same as composing
`Flatten()` and `Map()` as indicated by the name, so Flatten() and
FlatMap() have the same input.
But in Apache Beam, Flatten() takes an _iterable of PCollections_ while
FlatMap() works with a _PCollection of iterables_.
If I am not wrong, the signatures of Flatten, Map and FlatMap are:
```
Flatten:: Iterable[PCollection[A]] -> PCollection[A]
Map:: (PCollection[A], (A-> B)) -> PCollection[B]
FlatMap:: (PCollection[Iterable[A]], (A->B)) -> [A]
```
So my question is: is there another "Flatten-like" function with this
signature:
```
anotherFlatten:: PCollection[Iterable[A]] -> PCollection[A]
```
One reason this would be useful is that, when you just want to
"flatten" a `PCollection` of iterables, you have to use `FlatMap()` with an
identity function.
So instead of writing:
`FlatMap(lambda e: e)`
I would like to use a function
`anotherFlatten()`
Thanks,
Godefroy
Re: Why is FlatMap different from composing Flatten and Map?
Posted by Robert Bradshaw via user <us...@beam.apache.org>.
On Mon, Mar 13, 2023 at 11:33 AM Godefroy Clair <go...@gmail.com>
wrote:
> Hi,
> I am wondering about the way `Flatten()` and `FlatMap()` are implemented
> in Apache Beam Python.
> In most functional languages, FlatMap() is the same as composing
> `Flatten()` and `Map()` as indicated by the name, so Flatten() and
> FlatMap() have the same input.
> But in Apache Beam, Flatten() takes an _iterable of PCollections_ while
> FlatMap() works with a _PCollection of iterables_.
>
> If I am not wrong, the signatures of Flatten, Map and FlatMap are:
> ```
> Flatten:: Iterable[PCollection[A]] -> PCollection[A]
> Map:: (PCollection[A], (A-> B)) -> PCollection[B]
> FlatMap:: (PCollection[Iterable[A]], (A->B)) -> [A]
> ```
FlatMap is actually (PCollection[A], (A->Iterable[B])) -> PCollection[B].
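The corrected signature can be made concrete with a minimal sketch in plain Python. It models a PCollection as an ordinary list, which is an assumption for illustration only (real PCollections are distributed and unordered). The helper names `map_`, `flat_map` and `flatten_iterables` are hypothetical and are not Beam APIs.

```python
from itertools import chain
from typing import Callable, Iterable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def map_(xs: List[A], f: Callable[[A], B]) -> List[B]:
    """Model of Map: exactly one output per input element."""
    return [f(x) for x in xs]


def flatten_iterables(xss: Iterable[Iterable[A]]) -> List[A]:
    """Model of the requested 'anotherFlatten': unnest one level."""
    return list(chain.from_iterable(xss))


def flat_map(xs: List[A], f: Callable[[A], Iterable[B]]) -> List[B]:
    """Model of FlatMap: f returns an iterable per element and the
    results are concatenated, i.e. Map followed by unnesting."""
    return flatten_iterables(map_(xs, f))


# f turns each element into zero or more outputs.
words = flat_map(["a b", "c"], str.split)

# With the identity function, flat_map is just the unnesting step.
unnested = flat_map([[1, 2], [3]], lambda e: e)
```

In this model, FlatMap is the composition of Map with an unnesting step, but that unnesting step is not the same operation as Beam's Flatten.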
>
> So my question is: is there another "Flatten-like" function with this
> signature:
> ```
> anotherFlatten:: PCollection[Iterable[A]] -> PCollection[A]
> ```
>
> One reason this would be useful is that, when you just want to
> "flatten" a `PCollection` of iterables, you have to use `FlatMap()` with an
> identity function.
>
> So instead of writing:
> `FlatMap(lambda e: e)`
> I would like to use a function
> `anotherFlatten()`
>
As Reuven mentions, Beam's Flatten could have been called Union, in which
case we'd free up the name Flatten for the PCollection[Iterable[A]] ->
PCollection[A] operation. It's Flatten for historical reasons, and would be
difficult to change now.
FlumeJava uses static constructors to provide Flatten.Iterables:
PCollection[Iterable[A]] -> PCollection[A] vs. Flatten.PCollections:
Iterable[PCollection[A]] -> PCollection[A].
If you want a FlattenIterables in Python, you could easily implement it as
a composite transform [2] that passes the identity
function to FlatMap.
[1] https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/Flatten.html
[2] https://beam.apache.org/documentation/programming-guide/#composite-transforms
Re: Why is FlatMap different from composing Flatten and Map?
Posted by Alexey Romanenko <ar...@gmail.com>.
+CC people who might give more details on this.
—
Alexey
Re: Why is FlatMap different from composing Flatten and Map?
Posted by Reuven Lax via user <us...@beam.apache.org>.
In Apache Beam, Flatten is a union operation - it takes multiple
PCollections (of the same type) and merges them into a single PCollection.