user@spark.apache.org, 2023-01

- Spark migration from 2.3 to 3.0.1 - posted by Shrikant Prasad <sh...@gmail.com> on 2023/01/02 12:38:14 UTC, 12 replies.
- Incorrect csv parsing when delimiter used within the data - posted by Saurabh Gulati <sa...@fedex.com.INVALID> on 2023/01/03 16:59:01 UTC, 4 replies.
- [SparkR] Compare datetime with Sys.time() throws error in R (>= 4.2.0) - posted by Vivek Atal <at...@yahoo.co.in.INVALID> on 2023/01/03 23:15:40 UTC, 0 replies.
- How to set a config for a single query? - posted by Felipe Pessoto <fe...@hotmail.com> on 2023/01/04 00:14:41 UTC, 3 replies.
- Got Error Creating permanent view in Postgresql through Pyspark code - posted by Vajiha Begum S A <va...@maestrowiz.com> on 2023/01/04 07:26:07 UTC, 3 replies.
- Re: [EXTERNAL] Re: Incorrect csv parsing when delimiter used within the data - posted by Saurabh Gulati <sa...@fedex.com.INVALID> on 2023/01/04 10:30:06 UTC, 5 replies.
- [BUG?] How to handle with special characters or scape them on spark version 3.3.0? - posted by "Vieira, Thiago" <Th...@adidas.com.INVALID> on 2023/01/04 10:41:35 UTC, 0 replies.
- Re: [EXTERNAL] Re: Re: Incorrect csv parsing when delimiter used within the data - posted by Shay Elbaz <sh...@gm.com> on 2023/01/04 13:54:30 UTC, 1 reply.
- Spark Yarn UI shows high usedOnHeapStorageMemory than totalOnHeapStorageMemory - posted by vindhya g <vn...@gmail.com> on 2023/01/05 05:33:38 UTC, 0 replies.
- GPU Support - posted by K B M Kaala Subhikshan <kb...@gmail.com> on 2023/01/05 07:16:39 UTC, 1 reply.
- Spark reading from HBase using hbase-connectors - any benefit from localization? - posted by Aaron Grubb <aa...@kaden.ai> on 2023/01/05 09:34:09 UTC, 4 replies.
- [PySpark] Error using SciPy: ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject - posted by Oliver Ruebenacker <ol...@broadinstitute.org> on 2023/01/06 14:59:29 UTC, 5 replies.
- [pyspark/sparksql]: How to overcome redundant/repetitive code? Is a for loop over an sql statement with a variable a bad idea? - posted by Joris Billen <jo...@bigindustries.be> on 2023/01/06 21:19:01 UTC, 1 reply.
- Re: Hive 3 has big performance improvement from my test - posted by Mich Talebzadeh <mi...@gmail.com> on 2023/01/07 20:12:01 UTC, 1 reply.
- [pyspark/pandas] Pandas UDF accepting more than 2 pandas dataframe when cogroup + applyInPandas? - posted by "pzm6391@hotmail.com" <pz...@hotmail.com> on 2023/01/10 19:49:07 UTC, 0 replies.
- [UNSUBSCRIBE] - posted by Sebastian Schere <ss...@gmail.com> on 2023/01/11 13:33:45 UTC, 0 replies.
- unsubscribe - posted by Sebastian Schere <ss...@gmail.com> on 2023/01/11 13:34:06 UTC, 3 replies.
- pyspark.sql.dataframe.DataFrame versus pyspark.pandas.frame.DataFrame - posted by "second_comet@yahoo.com.INVALID" <se...@yahoo.com.INVALID> on 2023/01/13 03:54:40 UTC, 1 reply.
- [Spark SQL] Data duplicate or data lost with non-deterministic function - posted by 李建伟 <le...@126.com> on 2023/01/15 04:07:01 UTC, 0 replies.
- Is there any Job/Career channel - posted by Chetan Khatri <ch...@gmail.com> on 2023/01/16 02:20:11 UTC, 0 replies.
- Running Google Dataproc on Google Kubernetes Engine (GKE) with Spark - posted by Mich Talebzadeh <mi...@gmail.com> on 2023/01/16 15:05:22 UTC, 1 reply.
- [PySPark] How to check if value of one column is in array of another column - posted by Oliver Ruebenacker <ol...@broadinstitute.org> on 2023/01/17 22:18:11 UTC, 2 replies.
- How to check the liveness of a SparkSession - posted by Yeachan Park <ye...@gmail.com> on 2023/01/19 13:59:25 UTC, 0 replies.
- [Spark Standalone Mode] How to read from kerberised HDFS in spark standalone mode - posted by "Bansal, Jaimita" <Ja...@gs.com> on 2023/01/19 23:35:29 UTC, 0 replies.
- Writing protobuf RDD to parquet - posted by David Diebold <da...@gmail.com> on 2023/01/20 09:27:10 UTC, 0 replies.
- Table created with saveAsTable behaves differently than a table created with spark.sql("CREATE TABLE....) - posted by krexos <kr...@protonmail.com.INVALID> on 2023/01/21 12:02:26 UTC, 2 replies.
- Any advantages of using sql.adaptive.autoBroadcastJoinThreshold over sql.autoBroadcastJoinThreshold? - posted by Soumyadeep Mukhopadhyay <so...@gmail.com> on 2023/01/22 13:05:41 UTC, 1 reply.
- Duplicates in Collaborative Filtering Output - posted by Kartik Ohri <ka...@gmail.com> on 2023/01/23 07:39:49 UTC, 1 reply.
- Unsubscribe - posted by Calum <cj...@gmail.com> on 2023/01/23 08:53:58 UTC, 1 reply.
- Re: Dynamic Scaling without Kubernetes - posted by Mich Talebzadeh <mi...@gmail.com> on 2023/01/23 17:47:36 UTC, 0 replies.
- Question regarding Spark 3.X performance - posted by Athanasios Kordelas <at...@gmail.com> on 2023/01/26 09:10:48 UTC, 9 replies.
- Spark SQL question - posted by Kohki Nishio <ta...@gmail.com> on 2023/01/27 23:34:50 UTC, 2 replies.
- spark+kafka+dynamic resource allocation - posted by Lingzhe Sun <li...@hirain.com> on 2023/01/28 06:17:40 UTC, 5 replies.
- Fwd: Spark-submit doesn't load all app classes in the classpath - posted by Soheil Pourbafrani <so...@gmail.com> on 2023/01/28 22:25:48 UTC, 0 replies.
- Help needed regarding error with 5 node Spark cluster (shuffle error)- Comcast - posted by "Jain, Sanchi" <Sa...@comcast.com.INVALID> on 2023/01/30 15:12:53 UTC, 2 replies.
- [Spark/deeplyR] how come spark is caching tables read through jdbc connection from oracle, even when memory=false is chosen - posted by Joris Billen <jo...@bigindustries.be> on 2023/01/31 15:35:35 UTC, 0 replies.