Posted to user@beam.apache.org by Godefroy Clair <go...@gmail.com> on 2023/03/13 18:32:10 UTC

Why is FlatMap different from composing Flatten and Map?

Hi,
I am wondering about the way `Flatten()` and `FlatMap()` are implemented in
Apache Beam Python.
In most functional languages, FlatMap() is the same as composing
`Flatten()` and `Map()`, as the name indicates, so Flatten() and
FlatMap() take the same kind of input.
But in Apache Beam, Flatten() takes an _iterable of PCollections_ while
FlatMap() works on a _PCollection of Iterables_.

If I am not wrong, the signatures of Flatten, Map and FlatMap are:
```
Flatten:: Iterable[PCollection[A]] -> PCollection[A]
Map:: (PCollection[A], (A-> B)) -> PCollection[B]
FlatMap:: (PCollection[Iterable[A]], (A->B)) -> [A]
```

So my question is: is there another "Flatten-like" function with this
signature:
```
anotherFlatten:: PCollection[Iterable[A]] -> PCollection[A]
```

One reason this would be useful is that when you just want to
"flatten" a `PCollection` of iterables, you have to use `FlatMap()` with an
identity function.

So instead of writing:
`FlatMap(lambda e: e)`
I would like to use a function
`anotherFlatten()`

Thanks,
Godefroy

Re: Why is FlatMap different from composing Flatten and Map?

Posted by Robert Bradshaw via user <us...@beam.apache.org>.
On Mon, Mar 13, 2023 at 11:33 AM Godefroy Clair <go...@gmail.com>
wrote:

> Hi,
> I am wondering about the way `Flatten()` and `FlatMap()` are implemented
> in Apache Beam Python.
> In most functional languages, FlatMap() is the same as composing
> `Flatten()` and `Map()`, as the name indicates, so Flatten() and
> FlatMap() take the same kind of input.
> But in Apache Beam, Flatten() takes an _iterable of PCollections_ while
> FlatMap() works on a _PCollection of Iterables_.
>
> If I am not wrong, the signatures of Flatten, Map and FlatMap are:
> ```
> Flatten:: Iterable[PCollection[A]] -> PCollection[A]
> Map:: (PCollection[A], (A-> B)) -> PCollection[B]
> FlatMap:: (PCollection[Iterable[A]], (A->B)) -> [A]
>

FlatMap is actually (PCollection[A], (A->Iterable[B])) -> PCollection[B].
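
To make the three signatures concrete, here is a rough plain-Python model that treats a PCollection as an ordinary list. It is only an analogy: the names `flatten`, `map_` and `flat_map` are made up for illustration and are not Beam APIs. It may still help show why FlatMap with the identity function unnests a collection of iterables.

```python
from itertools import chain
from typing import Callable, Iterable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def flatten(pcolls: Iterable[List[A]]) -> List[A]:
    """Models Beam's Flatten: the union of several collections."""
    return list(chain.from_iterable(pcolls))


def map_(pcoll: List[A], fn: Callable[[A], B]) -> List[B]:
    """Models Map: exactly one output per input element."""
    return [fn(x) for x in pcoll]


def flat_map(pcoll: List[A], fn: Callable[[A], Iterable[B]]) -> List[B]:
    """Models FlatMap: fn returns an iterable and every item of it is emitted."""
    return [y for x in pcoll for y in fn(x)]


words = flat_map(["a b", "c"], str.split)        # ['a', 'b', 'c']
unnested = flat_map([[1, 2], [3]], lambda e: e)  # [1, 2, 3]
merged = flatten([[1, 2], [3]])                  # [1, 2, 3]
```

In this list model the last two lines coincide, which is probably where the naming confusion comes from. In Beam the outer container that Flatten receives is a Python iterable of PCollections, not a PCollection itself.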


> ```
>
> So my question is: is there another "Flatten-like" function with this
> signature:
> ```
> anotherFlatten:: PCollection[Iterable[A]] -> PCollection[A]
> ```
>
> One reason this would be useful is that when you just want to
> "flatten" a `PCollection` of iterables, you have to use `FlatMap()` with an
> identity function.
>
> So instead of writing:
> `FlatMap(lambda e: e)`
> I would like to use a function
> `anotherFlatten()`
>

As Reuven mentions, Beam's Flatten could have been called Union, in which
case we'd free up the name Flatten for the PCollection[Iterable[A]] ->
PCollection[A] operation. It's Flatten for historical reasons, and would be
difficult to change now.

FlumeJava uses static constructors to provide Flatten.Iterables:
PCollection[Iterable[A]] -> PCollection[A] vs. Flatten.PCollections:
Iterable[PCollection[A]] -> PCollection[A] [1].

If you want a FlattenIterables in Python, you could easily implement it as
a composite transform [2] whose implementation is passing the identity
function to FlatMap.

[1]
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/Flatten.html
[2]
https://beam.apache.org/documentation/programming-guide/#composite-transforms

Re: Why is FlatMap different from composing Flatten and Map?

Posted by Alexey Romanenko <ar...@gmail.com>.
+CC people who might give more details on this.

—
Alexey


Re: Why is FlatMap different from composing Flatten and Map?

Posted by Reuven Lax via user <us...@beam.apache.org>.
In Apache Beam, Flatten is a union operation - it takes multiple
PCollections (of the same type) and merges them into a single PCollection.
