Posted to issues@beam.apache.org by "Jérémie Bigras-Dunberry (Jira)" <ji...@apache.org> on 2021/07/31 20:09:00 UTC

[jira] [Updated] (BEAM-12701) Converting two deferred dataframes to csv in the same pipeline causes PCollection label collision

     [ https://issues.apache.org/jira/browse/BEAM-12701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jérémie Bigras-Dunberry updated BEAM-12701:
-------------------------------------------
    Summary: Converting two deferred dataframes to csv in the same pipeline causes PCollection label collision (was: Converting two dataframe  to_csv in the same pipeline causes PCollection label collision)

> Converting two deferred dataframes to csv in the same pipeline causes PCollection label collision
> --------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-12701
>                 URL: https://issues.apache.org/jira/browse/BEAM-12701
>             Project: Beam
>          Issue Type: Bug
>          Components: io-py-common
>    Affects Versions: 2.31.0
>            Reporter: Jérémie Bigras-Dunberry
>            Priority: P2
>
>  
> If you call to_csv on a DeferredDataFrame twice in a single pipeline, like this:
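> {code:python}
> import apache_beam as beam
> import pandas as pd
> from apache_beam.dataframe.convert import to_dataframe, to_pcollection
>
> df1 = pd.DataFrame.from_records({"a": "b"}, index=[0])
> df2 = pd.DataFrame.from_records({"a": "b"}, index=[0])
> with beam.Pipeline() as p:
>     # Wrap each pandas DataFrame as a deferred dataframe with its own label.
>     df1 = to_dataframe(to_pcollection(df1, pipeline=p), label="df1")
>     df2 = to_dataframe(to_pcollection(df2, pipeline=p), label="df2")
>     df1.to_csv("test.csv")
>     df2.to_csv("test2.csv")  # the error is raised here
> {code}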
> the second to_csv call fails with this error:
>  
> {code:java}
> RuntimeError: A transform with label "ToPCollection(df)" already exists in the pipeline. To apply a transform with a specified label write pvalue | "label" >> transform
> {code}
> I think this happens because to_csv calls to_pcollection without a label, so the same label, "ToPCollection(df)", is inferred for both to_csv calls.
>  
>  
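The suspected mechanism can be illustrated without Beam. The sketch below is a toy model, not Beam's implementation: `ToyPipeline` and `toy_to_csv` are hypothetical names, and the `label` argument on `toy_to_csv` is an assumption made for illustration only (the report does not say that `to_csv` accepts one). It shows how a fixed default label collides on the second call, and how distinct labels would avoid the collision.

```python
class ToyPipeline:
    """Toy stand-in for a pipeline that requires unique transform labels."""

    def __init__(self):
        self.labels = set()

    def apply(self, label):
        # Reject a label that has already been used in this pipeline.
        if label in self.labels:
            raise RuntimeError(
                f'A transform with label "{label}" already exists in the pipeline.'
            )
        self.labels.add(label)
        return label


def toy_to_csv(pipeline, path, label=None):
    # Hypothetical model of the suspected behaviour: when no label is given,
    # every call falls back to the same default label. `path` is unused here.
    return pipeline.apply(label or "ToPCollection(df)")


def demo():
    # Two calls with the default label: the second one collides.
    p = ToyPipeline()
    toy_to_csv(p, "test.csv")
    try:
        toy_to_csv(p, "test2.csv")
        collided = False
    except RuntimeError:
        collided = True

    # Two calls with distinct labels: no collision.
    q = ToyPipeline()
    toy_to_csv(q, "test.csv", label="ToPCollection(df1)")
    toy_to_csv(q, "test2.csv", label="ToPCollection(df2)")
    return collided, sorted(q.labels)
```

Calling `demo()` runs both scenarios. If the diagnosis in the report is right, a fix along these lines would need each `to_csv` call to derive a label that is unique within the pipeline.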



--
This message was sent by Atlassian Jira
(v8.3.4#803005)