Posted to user@spark.apache.org by Alex Kosberg <Al...@rbbn.com> on 2022/03/30 11:31:37 UTC

spark ETL and spark thrift server running together

Hi,
Some details:
* Spark SQL (version 3.2.1)
* Driver: Hive JDBC (version 2.3.9)
* ThriftCLIService: Starting ThriftBinaryCLIService on port 10000 with 5...500 worker threads
* BI tool is connected via an ODBC driver
After starting the Spark Thrift Server, I'm unable to run a PySpark script with spark-submit, because both processes use the same metastore_db.
error:
Caused by: ERROR XJ040: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@3acaa384, see the next exception for details.
        at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
        at org.apache.derby.impl.jdbc.SQLExceptionFactory.wrapArgsForTransportAcrossDRDA(Unknown Source)
        ... 140 more
Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /tmp/metastore_db.

I need to be able to run PySpark (Spark ETL) jobs while the Spark Thrift Server is up for BI tool queries. Is there a workaround?
Thanks!


Notice: This e-mail together with any attachments may contain information of Ribbon Communications Inc. and its Affiliates that is confidential and/or proprietary for the sole use of the intended recipient. Any review, disclosure, reliance or distribution by others or forwarding without express permission is strictly prohibited. If you are not the intended recipient, please notify the sender immediately and then delete all copies, including any attachments.

RE: [EXTERNAL] Re: spark ETL and spark thrift server running together

Posted by Alex Kosberg <Al...@rbbn.com>.

Hi Christophe,
Thank you for the explanation!

Regards,
Alex

From: Christophe Préaud <ch...@kelkoogroup.com>
Sent: Wednesday, March 30, 2022 3:43 PM
To: Alex Kosberg <Al...@rbbn.com>; user@spark.apache.org
Subject: [EXTERNAL] Re: spark ETL and spark thrift server running together

Hi Alex,

As stated in the Hive documentation (https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+Administration):

An embedded metastore database is mainly used for unit tests. Only one process can connect to the metastore database at a time, so it is not really a practical solution but works well for unit tests.

You need to set up a remote metastore database (e.g. MariaDB / MySQL) for production use.

Regards,
Christophe.


Re: spark ETL and spark thrift server running together

Posted by Christophe Préaud <ch...@kelkoogroup.com>.

Hi Alex,

As stated in the Hive documentation (https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+Administration):

*An embedded metastore database is mainly used for unit tests. Only one process can connect to the metastore database at a time, so it is not really a practical solution but works well for unit tests.*


You need to set up a remote metastore database (e.g. MariaDB / MySQL) for production use.

Regards,
Christophe.
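
For readers following the advice above, here is a minimal sketch of what pointing the metastore at a shared database could look like. It only generates a hive-site.xml using the standard javax.jdo.option.* connection properties. The host name, database name, user, password, and the build_hive_site helper are all hypothetical placeholders, and the MySQL driver class shown is an assumption that depends on which JDBC connector you install.

```python
import xml.etree.ElementTree as ET


def build_hive_site(jdbc_url: str, driver: str, user: str, password: str) -> str:
    """Return hive-site.xml text that points the metastore at a shared JDBC database.

    Hypothetical helper for illustration; the property names are the
    standard metastore connection settings, the values are placeholders.
    """
    properties = {
        "javax.jdo.option.ConnectionURL": jdbc_url,
        "javax.jdo.option.ConnectionDriverName": driver,
        "javax.jdo.option.ConnectionUserName": user,
        "javax.jdo.option.ConnectionPassword": password,
    }
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-print where available (Python 3.9+)
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder connection details; replace with your own database.
    print(
        build_hive_site(
            jdbc_url="jdbc:mysql://db.example.internal:3306/metastore",
            driver="com.mysql.cj.jdbc.Driver",  # assumed: MySQL Connector/J 8
            user="hive",
            password="change-me",
        )
    )
```

If the generated file is placed in Spark's conf/ directory, and the matching JDBC driver jar is on the classpath, both the Thrift Server and spark-submit jobs should read the same settings and stop competing for the embedded Derby metastore_db. The metastore schema has to exist in that database first; Hive ships schematool for initializing it.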
