Posted to user@spark.apache.org by Mich Talebzadeh <mi...@gmail.com> on 2021/08/06 22:08:38 UTC

Is subtracting two dataframes a valid operation?

I am using a Google Kubernetes Engine cluster with a docker image that I
built on-prem with PySpark 3.1.1 and pushed to a Google repository.


The py module generates some 100 rows of random data and then writes them to
a BigQuery table.


Both the write to and the subsequent read from the BigQuery table show the
correct number of rows:


 Populated BigQuery table test.randomData

 rows written is  100

 Reading from BigQuery table test.randomData

 rows read in is  100

However, the following operation fails

        if df2.subtract(read_df).count() == 0:
            print("Data has been loaded OK to Oracle table")
        else:
            print("Data could not be loaded to Oracle table, quitting")
            sys.exit(1)
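For context, `DataFrame.subtract` behaves like SQL EXCEPT DISTINCT: it
returns the distinct rows of the left side that do not appear in the right
side, so an empty result only shows that every (distinct) row of df2 was
read back. A minimal pure-Python sketch of that semantics (illustration
only, not the Spark implementation):

```python
# Sketch of DataFrame.subtract semantics: distinct rows of `left`
# that are absent from `right` (duplicates collapse, like SQL EXCEPT).
def subtract(left_rows, right_rows):
    right = set(right_rows)
    return {row for row in left_rows if row not in right}

written = [("a", 1), ("b", 2), ("b", 2)]   # rows written (with a duplicate)
read_back = [("a", 1), ("b", 2)]           # rows read back

# Empty difference: every distinct row written was also read back.
assert len(subtract(written, read_back)) == 0
```

Note that because duplicates collapse, this check alone cannot detect a
missing duplicate row; comparing counts as well (as done above) covers that.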


21/08/06 21:58:45 WARN org.apache.spark.scheduler.TaskSetManager: Lost task
0.0 in stage 8.0 (TID 11) (10.64.2.15 executor 1):
java.lang.UnsupportedOperationException: sun.misc.Unsafe or
java.nio.DirectByteBuffer.<init>(long, int) not available
        at
com.google.cloud.spark.bigquery.repackaged.io.netty.util.internal.PlatformDependent.directBuffer(PlatformDependent.java:490)
        at
com.google.cloud.spark.bigquery.repackaged.io.netty.buffer.NettyArrowBuf.getDirectBuffer(NettyArrowBuf.java:257)
        at
com.google.cloud.spark.bigquery.repackaged.io.netty.buffer.NettyArrowBuf.nioBuffer(NettyArrowBuf.java:247)

Further down it shows:


py4j.protocol.Py4JJavaError: An error occurred while calling o116.count.

: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute,
tree:

OK, this may be specific to BigQuery because, as I recall, this operation
could be done against an Oracle table.
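For what it's worth, the "sun.misc.Unsafe or java.nio.DirectByteBuffer
not available" error from Netty is typically seen on JDK 9+, where the
Spark docs say Arrow-based transfers need
io.netty.tryReflectionSetAccessible=true. Whether the repackaged Netty
inside the BigQuery connector honors the same flag is an assumption, but
a hedged sketch of the submit options would be:

```
# Assumes the executors run on JDK 9+ and that the connector's
# repackaged Netty reads the same system property as stock Netty.
spark-submit \
  --conf "spark.driver.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true" \
  --conf "spark.executor.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true" \
  ...
```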

Thanks

   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.