Posted to user@spark.apache.org by "Zong-han, Xie" <ic...@gmail.com> on 2020/01/29 13:24:05 UTC

union two pyspark dataframes from different SparkSessions

Dear all

I already have a Python function that queries data from HBase and
HDFS with given parameters. This function returns a PySpark DataFrame along
with the SparkContext it used.

With the client's increasing demands, I need to merge data from multiple
queries. I tested using the "union" function to merge the PySpark DataFrames
returned by different function calls directly, and it worked. It surprised me
that PySpark can actually union DataFrames from different SparkSessions.
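
For reference, here is a minimal sketch of what I mean; query_hbase is a
hypothetical stand-in for my actual query function:

    from pyspark.sql import SparkSession

    def query_hbase(table_name):
        # Hypothetical stand-in for the real HBase/HDFS query function.
        # getOrCreate() returns the existing SparkSession if one is active.
        spark = SparkSession.builder.appName("hbase-query").getOrCreate()
        # Placeholder data; the real function builds the DataFrame from an
        # HBase/HDFS read using the given parameters.
        df = spark.createDataFrame([(table_name, 1)], ["source", "value"])
        return df, spark.sparkContext

    df1, sc1 = query_hbase("table_a")
    df2, sc2 = query_hbase("table_b")

    # union() matches columns by position and requires identical schemas.
    merged = df1.union(df2)
    merged.show()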

I am using PySpark 2.3.1 and Python 3.5.

I wonder whether this is good practice, or whether I should use the same
SparkSession for all the queries?

Best regards





Re: Re: union two pyspark dataframes from different SparkSessions

Posted by "Zong-han, Xie" <ic...@gmail.com>.
Dear Yeikel

I checked my code, and it uses getOrCreate to create the SparkSession.
Therefore, I should be retrieving the same SparkSession instance every time
I call that method.
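
A quick sanity check of this (a minimal sketch; builder options omitted):

    from pyspark.sql import SparkSession

    s1 = SparkSession.builder.getOrCreate()
    s2 = SparkSession.builder.getOrCreate()

    # getOrCreate() returns the already-active session if one exists,
    # so both names refer to the same object.
    print(s1 is s2)  # True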

Thanks for the reminder.

Best regards





Re: union two pyspark dataframes from different SparkSessions

Posted by yeikel valdes <em...@yeikel.com>.
From what I understand, the session is effectively a singleton: even if you think you are creating new instances, you are just reusing the existing one.
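
For example (a rough sketch of my understanding of Spark 2.x behavior): even
newSession(), which really does create a distinct SparkSession, still shares
the single SparkContext in the JVM, which seems to be why your union worked.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # newSession() yields a separate SparkSession (its own SQL conf and
    # temp views), but it runs on the same underlying SparkContext.
    other = spark.newSession()

    print(spark is other)                            # False
    print(spark.sparkContext is other.sparkContext)  # True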



