Posted to user@spark.apache.org by Tanveer Ahmad - EWI <T....@tudelft.nl> on 2020/06/11 15:05:55 UTC

Re: Arrow RecordBatches/Pandas Dataframes to (Arrow enabled) Spark Dataframe conversion in streaming fashion

Hi Jorge,


Thank you. This union function is a better alternative for my work.


Regards,
Tanveer Ahmad


________________________________
From: Jorge Machado <jo...@me.com>
Sent: Monday, May 25, 2020 3:56:04 PM
To: Tanveer Ahmad - EWI
Cc: Spark Group
Subject: Re: Arrow RecordBatches/Pandas Dataframes to (Arrow enabled) Spark Dataframe conversion in streaming fashion

Hey, from what I know, you can try to union them: df.union(df2)

Not sure if this is what you need
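
A minimal sketch of the union approach, assuming every incoming Arrow RecordBatch has the same schema with columns in the same order. The helper names union_all, batches_to_spark and demo are hypothetical and not from this thread, and the sample data in demo is made up for illustration. Each batch is converted through Pandas with spark.createDataFrame and then folded into one DataFrame with DataFrame.union.

```python
from functools import reduce


def union_all(frames):
    """Fold a non-empty sequence of DataFrames into one using DataFrame.union."""
    frames = list(frames)
    if not frames:
        raise ValueError("need at least one DataFrame")
    # union() matches columns by position and keeps duplicates (like UNION ALL).
    return reduce(lambda left, right: left.union(right), frames)


def batches_to_spark(spark, record_batches):
    """Convert each Arrow RecordBatch to a Spark DataFrame and union the results.

    Assumption: all batches share one schema, with columns in the same order.
    """
    frames = (spark.createDataFrame(rb.to_pandas()) for rb in record_batches)
    return union_all(frames)


def demo():
    # Imported here so the helpers above do not require Spark at import time.
    import pyarrow as pa
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    # Spark 3.x name of the Arrow switch; Spark 2.x used
    # spark.sql.execution.arrow.enabled instead.
    spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

    # Made-up sample batches standing in for the externally produced ones.
    batches = [
        pa.RecordBatch.from_arrays(
            [pa.array([1, 2]), pa.array(["a", "b"])], names=["id", "value"]
        ),
        pa.RecordBatch.from_arrays(
            [pa.array([3]), pa.array(["c"])], names=["id", "value"]
        ),
    ]
    batches_to_spark(spark, batches).show()
    spark.stop()
```

Calling demo() needs pyspark, pyarrow and pandas installed. When batches keep arriving, the same idea applies one batch at a time (df = df.union(new_df)), but every union extends the query plan, so a long-running job may need to checkpoint or write out the accumulated DataFrame from time to time.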

On 25. May 2020, at 13:53, Tanveer Ahmad - EWI <T....@tudelft.nl> wrote:

Hi all,

I need some help regarding Arrow RecordBatches/Pandas Dataframes to (Arrow enabled) Spark Dataframe conversions.
The example in [1] explains very well how to convert a single Pandas Dataframe to a Spark Dataframe.

But in my case, some external applications are generating Arrow RecordBatches in my PySpark application in a streaming fashion. Each time I receive an Arrow RecordBatch, I want to transfer/append it to a Spark Dataframe. So is it possible to create a Spark Dataframe initially from one Arrow RecordBatch and then keep appending the other incoming Arrow RecordBatches to that Spark Dataframe as they arrive? Thanks!

I saw another example [2] in which all the Arrow RecordBatches are converted to a Spark Dataframe, but my case is a little different from this.

[1] https://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html
[2] https://gist.github.com/linar-jether/7dd61ed6fa89098ab9c58a1ab428b2b5

---
Regards,
Tanveer Ahmad