Posted to issues@arrow.apache.org by "vaibhav-bhadade (via GitHub)" <gi...@apache.org> on 2023/04/04 09:24:55 UTC

[GitHub] [arrow] vaibhav-bhadade opened a new issue, #34876: Enabling for Conversion to/from R DataFrame, dapply and gapply failed

vaibhav-bhadade opened a new issue, #34876:
URL: https://github.com/apache/arrow/issues/34876

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   We are testing SparkR on the ppc64le platform and have observed that "Enabling for Conversion to/from R DataFrame, dapply and gapply" fails with Apache Arrow 11.0.0.
   
   We followed the steps from the SparkR documentation: https://spark.apache.org/docs/latest/sparkr.html
   
   The same issue is observed on x86 as well.
   
   Steps followed:

       yum install wget
       wget https://repo.anaconda.com/archive/Anaconda3-2023.03-Linux-ppc64le.sh
       bash Anaconda3-2023.03-Linux-ppc64le.sh
       export PATH=$PATH:/opt/anaconda3/bin/
       conda install -c conda-forge r-base
       R
       cd /opt/
       wget https://dlcdn.apache.org/spark/spark-3.3.2/spark-3.3.2-bin-hadoop3.tgz
       tar xvf spark-3.3.2-bin-hadoop3.tgz
       ln -s /opt/spark-3.3.2-bin-hadoop3 /opt/spark
       useradd spark
       chown -R spark:spark /opt/spark*
       export SPARK_HOME=/opt/spark
       export PATH=$PATH:$SPARK_HOME/bin
       Rscript -e 'install.packages("arrow", repos="https://cloud.r-project.org/")'
       sparkR
       > sparkR.session(master = "local[*]",
                        sparkConfig = list(spark.sql.execution.arrow.sparkr.enabled = "true"))
   
       > spark_df <- createDataFrame(mtcars)
   
       > collect(spark_df)
   
        > collect(dapply(spark_df, function(rdf) { data.frame(rdf$gear + 1) }, structType("gear double")))
   
   
   [root@APIC1 ~]# /opt/spark/bin/sparkR
   
   R version 4.2.2 (2022-10-31) -- "Innocent and Trusting"
   Copyright (C) 2022 The R Foundation for Statistical Computing
   Platform: powerpc64le-redhat-linux-gnu (64-bit)
   
   R is free software and comes with ABSOLUTELY NO WARRANTY.
   You are welcome to redistribute it under certain conditions.
   Type 'license()' or 'licence()' for distribution details.
   
     Natural language support but running in an English locale
   
   R is a collaborative project with many contributors.
   Type 'contributors()' for more information and
   'citation()' on how to cite R or R packages in publications.
   
   Type 'demo()' for some demos, 'help()' for on-line help, or
   'help.start()' for an HTML browser interface to help.
   Type 'q()' to quit R.
   
   [Previously saved workspace restored]
   
   Launching java with spark-submit command /opt/spark//bin/spark-submit   "sparkr-shell" /tmp/RtmppUtJ0H/backend_port5e39b37886d1c
   
   Setting default log level to "WARN".
   To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
   23/04/04 02:19:43 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   
   Welcome to
         ____              __
        / __/__  ___ _____/ /__
       _\ \/ _ \/ _ `/ __/  '_/
      /___/ .__/\_,_/_/ /_/\_\   version 3.3.2
         /_/
   
   
   SparkSession Web UI available at http://APIC1.fyre.ibm.com:4040
   SparkSession available as 'spark'(master = local[*], app id = local-1680599985735).
   >
   > sparkR.session(master = "local[*]",
   +                sparkConfig = list(spark.sql.execution.arrow.sparkr.enabled = "true"))
   Java ref type org.apache.spark.sql.SparkSession id 1
   >
   > spark_df <- createDataFrame(mtcars)
   > collect(spark_df)
       mpg cyl  disp  hp drat    wt  qsec vs am gear carb
   1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
   2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
   3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
   4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
   5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
   6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
   7  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
   8  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
   9  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
   10 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
   11 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
   12 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
   13 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
   14 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
   15 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
   16 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
   17 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
   18 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
   19 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
   20 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
   21 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
   22 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
   23 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
   24 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
   25 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
   26 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
   27 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
   28 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
   29 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
   30 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
   31 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
   32 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
   > collect(dapply(spark_df, function(rdf) { data.frame(rdf$gear + 1) }, structType("gear double")))
   23/04/04 02:20:22 WARN package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
   23/04/04 02:20:27 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)/ 1]
   org.apache.spark.SparkException: R unexpectedly exited.
   R worker produced errors: Error : 'write_arrow' is not an exported object from 'namespace:arrow'
   
           at org.apache.spark.api.r.BaseRRunner$ReaderIterator$$anonfun$1.applyOrElse(BaseRRunner.scala:144)
           at org.apache.spark.api.r.BaseRRunner$ReaderIterator$$anonfun$1.applyOrElse(BaseRRunner.scala:137)
           at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
           at org.apache.spark.sql.execution.r.ArrowRRunner$$anon$2.read(ArrowRRunner.scala:194)
           at org.apache.spark.sql.execution.r.ArrowRRunner$$anon$2.read(ArrowRRunner.scala:123)
           at org.apache.spark.api.r.BaseRRunner$ReaderIterator.hasNext(BaseRRunner.scala:113)
           at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
           at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
           at org.apache.spark.sql.execution.arrow.ArrowConverters$$anon$1.hasNext(ArrowConverters.scala:99)
           at scala.collection.Iterator.foreach(Iterator.scala:943)
           at scala.collection.Iterator.foreach$(Iterator.scala:943)
           at org.apache.spark.sql.execution.arrow.ArrowConverters$$anon$1.foreach(ArrowConverters.scala:97)
           at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
           at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
           at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
           at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
           at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
           at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
           at org.apache.spark.sql.execution.arrow.ArrowConverters$$anon$1.to(ArrowConverters.scala:97)
           at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
           at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
           at org.apache.spark.sql.execution.arrow.ArrowConverters$$anon$1.toBuffer(ArrowConverters.scala:97)
           at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
           at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
           at org.apache.spark.sql.execution.arrow.ArrowConverters$$anon$1.toArray(ArrowConverters.scala:97)
           at org.apache.spark.sql.Dataset.$anonfun$collectAsArrowToR$3(Dataset.scala:3763)
           at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
           at org.apache.spark.scheduler.Task.run(Task.scala:136)
           at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
           at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
           at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:750)
   Caused by: java.io.EOFException
           at java.io.DataInputStream.readInt(DataInputStream.java:392)
           at org.apache.spark.sql.execution.r.ArrowRRunner$$anon$2.read(ArrowRRunner.scala:154)
           ... 30 more
   23/04/04 02:20:27 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1) (APIC1.fyre.ibm.com executor driver): org.apache.spark.SparkException: R unexpectedly exited.
   R worker produced errors: Error : 'write_arrow' is not an exported object from 'namespace:arrow'
   
            [same stack trace and EOFException cause as in the Executor exception above]
   
   23/04/04 02:20:27 ERROR TaskSetManager: Task 0 in stage 1.0 failed 1 times; aborting job
   Error in readBin(con, raw(), as.integer(dataLen), endian = "big") :
     invalid 'n' argument
   >
   
   
   
   
   ### Component(s)
   
   C++, R




[GitHub] [arrow] nealrichardson closed issue #34876: Enabling for Conversion to/from R DataFrame, dapply and gapply failed

Posted by "nealrichardson (via GitHub)" <gi...@apache.org>.
nealrichardson closed issue #34876: Enabling for Conversion to/from R DataFrame, dapply and gapply failed 
URL: https://github.com/apache/arrow/issues/34876




[GitHub] [arrow] thisisnic commented on issue #34876: Enabling for Conversion to/from R DataFrame, dapply and gapply failed

Posted by "thisisnic (via GitHub)" <gi...@apache.org>.
thisisnic commented on issue #34876:
URL: https://github.com/apache/arrow/issues/34876#issuecomment-1495694968

   Hi @vaibhav-bhadade, thanks for reporting this. I don't know exactly what's going on here, as I'm not at all familiar with the internals of SparkR; you may get more useful answers by posting this issue to the Spark project instead.
   
   I do see that you get the error `Error : 'write_arrow' is not an exported object from 'namespace:arrow'`. You might be working with a version of SparkR that calls `arrow::write_arrow()`, a function which was deprecated quite a while ago.
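
   A quick way to confirm this from an R session (a minimal diagnostic sketch, not from the original thread; `packageVersion()` and `getNamespaceExports()` are base R):

       # Which arrow release is installed?
       packageVersion("arrow")
       # TRUE if write_arrow() is still exported; FALSE on recent arrow
       # releases, where the long-deprecated function was removed
       "write_arrow" %in% getNamespaceExports("arrow")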




[GitHub] [arrow] vaibhav-bhadade commented on issue #34876: Enabling for Conversion to/from R DataFrame, dapply and gapply failed

Posted by "vaibhav-bhadade (via GitHub)" <gi...@apache.org>.
vaibhav-bhadade commented on issue #34876:
URL: https://github.com/apache/arrow/issues/34876#issuecomment-1498459165

   > @thisisnic is right, see also #33758, which sounds the same.
   
   Yes, it is.




[GitHub] [arrow] nealrichardson commented on issue #34876: Enabling for Conversion to/from R DataFrame, dapply and gapply failed

Posted by "nealrichardson (via GitHub)" <gi...@apache.org>.
nealrichardson commented on issue #34876:
URL: https://github.com/apache/arrow/issues/34876#issuecomment-1497577810

   @thisisnic is right, see also #33758, which sounds the same.
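
   If so, one interim workaround is to pin the R arrow package to an older release that still exports `write_arrow()` until a Spark version that no longer calls it is used. A sketch (the pinned version 8.0.0 is an assumption about the last release with the export; adjust as needed):

       # Install a pinned arrow release that predates the removal of write_arrow()
       install.packages("remotes")
       remotes::install_version("arrow", version = "8.0.0",
                                repos = "https://cloud.r-project.org/")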

