user@spark.apache.org, 2022-01

- Re: Issue Communicating with Driver, RpcTimeoutException - posted by Gourav Sengupta <go...@gmail.com> on 2022/01/01 06:06:48 UTC, 4 replies.
- How to make batch filter - posted by Bitfox <bi...@bitfox.top> on 2022/01/01 20:57:59 UTC, 10 replies.
- Re: Pyspark debugging best practices - posted by David Diebold <da...@gmail.com> on 2022/01/03 08:39:22 UTC, 0 replies.
- Joining many tables Re: Pyspark debugging best practices - posted by Andrew Davidson <ae...@ucsc.edu.INVALID> on 2022/01/04 00:58:42 UTC, 1 replies.
- understanding iterator of series to iterator of series pandasUDF - posted by Nitin Siwach <ni...@gmail.com> on 2022/01/04 06:54:11 UTC, 2 replies.
- Query regarding kafka version - posted by Renu Yadav <yr...@gmail.com> on 2022/01/04 14:08:19 UTC, 0 replies.
- pyspark - posted by 流年以东” <25...@qq.com.INVALID> on 2022/01/05 09:01:53 UTC, 3 replies.
- Re: Spark 3.2 - ReusedExchange not present in join execution plan - posted by Abdeali Kothari <ab...@gmail.com> on 2022/01/05 16:10:43 UTC, 2 replies.
- Newbie pyspark memory mgmt question - posted by Andrew Davidson <ae...@ucsc.edu.INVALID> on 2022/01/05 23:26:59 UTC, 2 replies.
- JDBCConnectionProvider in Spark - posted by Artemis User <ar...@dtechspace.com> on 2022/01/06 03:25:09 UTC, 4 replies.
- spark metadata metastore bug ? - posted by Nicolas Paris <ni...@riseup.net> on 2022/01/06 15:51:30 UTC, 0 replies.
- Fwd: metastore bug when hive update spark table ? - posted by Mich Talebzadeh <mi...@gmail.com> on 2022/01/06 19:03:10 UTC, 0 replies.
- How to add a row number column with out reordering my data frame - posted by Andrew Davidson <ae...@ucsc.edu.INVALID> on 2022/01/07 01:13:15 UTC, 4 replies.
- Proposed additional function to create fold_column for better integration of Spark data frames with H2O - posted by Chester Gan <c....@gmail.com> on 2022/01/07 05:04:20 UTC, 4 replies.
- hive table with large column data size - posted by weoccc <we...@gmail.com> on 2022/01/09 06:45:32 UTC, 2 replies.
- Difference in behavior for Spark 3.0 vs Spark 3.1 "create database " - posted by Pralabh Kumar <pr...@gmail.com> on 2022/01/10 13:41:16 UTC, 1 replies.
- pyspark loop optimization - posted by Ramesh Natarajan <ra...@gmail.com> on 2022/01/10 22:48:19 UTC, 2 replies.
- [Spark ML Pipeline]: Error Loading Pipeline Model with Custom Transformer - posted by Alana Young <ay...@dtechspace.com> on 2022/01/11 17:27:37 UTC, 2 replies.
- Does Spark 3.1.2/3.2 support log4j 2.17.1+, and how? your target release day for Spark3.3? - posted by Juan Liu <li...@cn.ibm.com> on 2022/01/12 14:50:20 UTC, 1 replies.
- Re: Does Spark 3.1.2/3.2 support log4j 2.17.1+, and how? your target release day for Spark3.3? - posted by Artemis User <ar...@dtechspace.com> on 2022/01/12 15:07:11 UTC, 8 replies.
- Log4j2 upgrade - posted by Amit Sharma <re...@gmail.com> on 2022/01/12 19:52:19 UTC, 1 replies.
- Seem like a bug in ExecutorAllocationManager, because numberMaxNeededExecutors value is negative in JMX exporter, which is unreasonable - posted by 徐涛 <xu...@163.com> on 2022/01/13 01:41:23 UTC, 1 replies.
- Spark Unary Transformer Example - posted by Alana Young <ay...@dtechspace.com> on 2022/01/13 14:35:42 UTC, 0 replies.
- Spark on Oracle available as an Apache licensed open source repo - posted by Harish Butani <rh...@gmail.com> on 2022/01/14 00:49:34 UTC, 2 replies.
- about memory size for loading file - posted by frakass <ca...@free.fr> on 2022/01/14 06:17:13 UTC, 3 replies.
- groupMapReduce - posted by frakass <ca...@free.fr> on 2022/01/14 10:55:41 UTC, 2 replies.
- Spark with parallel processing and event driven architecture - posted by "ashok34668@yahoo.com.INVALID" <as...@yahoo.com.INVALID> on 2022/01/14 19:31:41 UTC, 1 replies.
- unsubscribe - posted by ALOK KUMAR SINGH <sa...@gmail.com> on 2022/01/14 23:04:22 UTC, 10 replies.
- question of shorten syntax for rdd - posted by ca...@free.fr on 2022/01/17 10:54:37 UTC, 0 replies.
- Spark on k8s : spark 3.0.1 spark.kubernetes.executor.deleteontermination issue - posted by Pralabh Kumar <pr...@gmail.com> on 2022/01/18 05:50:34 UTC, 1 replies.
- [ML Intermediate]: Slow fitting of Linear regression vs Sklearn - posted by Hu You <hy...@outlook.com> on 2022/01/18 06:37:17 UTC, 0 replies.
- Regarding spark-3.2.0 decommission features. - posted by "Patidar, Mohanlal (Nokia - IN/Bangalore)" <mo...@nokia.com> on 2022/01/18 08:31:31 UTC, 2 replies.
- [Pyspark] How to download Zip file from SFTP location and put in into Azure Data Lake and unzip it - posted by Heta Desai <he...@1rivet.com> on 2022/01/18 14:16:25 UTC, 1 replies.
- newbie question for reduce - posted by ca...@free.fr on 2022/01/19 02:41:11 UTC, 2 replies.
- Self contained Spark application with local master without spark-submit - posted by Colin Williams <co...@gmail.com> on 2022/01/19 08:59:53 UTC, 1 replies.
- Issue: Spring-Boot vs. Apache Spark Dependencies - posted by "Heyde, Andreas" <an...@dzbank.de> on 2022/01/19 12:14:03 UTC, 4 replies.
- Profiling spark application - posted by Prasad Bhalerao <pr...@gmail.com> on 2022/01/20 05:18:04 UTC, 4 replies.
- Code fails when AQE enabled in Spark 3.1 - posted by Gaspar Muñoz <gm...@datiobd.com> on 2022/01/20 07:55:04 UTC, 0 replies.
- Spark 3.2.0 upgrade - posted by Amit Sharma <re...@gmail.com> on 2022/01/20 22:17:50 UTC, 3 replies.
- java.lang.StackOverflow Error How to sum across rows in a data frame with a large number of columns - posted by Andrew Davidson <ae...@ucsc.edu.INVALID> on 2022/01/20 22:20:32 UTC, 0 replies.
- How to configure log4j in pyspark to get log level, file name, and line number - posted by Andrew Davidson <ae...@ucsc.edu.INVALID> on 2022/01/20 22:32:39 UTC, 1 replies.
- questions on these functions - posted by Sherd Fox <sh...@gmail.com> on 2022/01/21 09:25:30 UTC, 3 replies.
- What happens when a partition that holds data under a task fails - posted by Siddhesh Kalgaonkar <ka...@gmail.com> on 2022/01/21 17:53:49 UTC, 9 replies.
- Is user@spark indexed by google? - posted by Andrew Davidson <ae...@ucsc.edu.INVALID> on 2022/01/21 18:02:31 UTC, 2 replies.
- Unsubscribe - posted by Aniket Khandelwal <ak...@gmail.com> on 2022/01/21 18:08:28 UTC, 1 replies.
- Migration to Spark 3.2 - posted by Aurélien Mazoyer <au...@aepsilon.com> on 2022/01/21 23:49:14 UTC, 6 replies.
- What are the most common operators for shuffle in Spark - posted by "ashok34668@yahoo.com.INVALID" <as...@yahoo.com.INVALID> on 2022/01/23 17:40:01 UTC, 1 replies.
- What are your experiences using google cloud platform - posted by Andrew Davidson <ae...@ucsc.edu.INVALID> on 2022/01/23 21:18:14 UTC, 4 replies.
- Question about ports in spark - posted by Bitfox <bi...@bitfox.top> on 2022/01/24 06:16:03 UTC, 0 replies.
- may I need a join here? - posted by Bitfox <bi...@bitfox.top> on 2022/01/24 06:38:14 UTC, 1 replies.
- Spark execution on Hadoop cluster (many nodes) - posted by sam smith <qu...@gmail.com> on 2022/01/24 13:12:16 UTC, 8 replies.
- triggering spark python app using native REST api - posted by "Michael Williams (SSI)" <Mi...@ssigroup.com> on 2022/01/24 15:52:25 UTC, 0 replies.
- Fwd: Cassandra driver upgrade - posted by Amit Sharma <re...@gmail.com> on 2022/01/24 22:05:00 UTC, 0 replies.
- Small optimization questions - posted by Aki Riisiö <ak...@gmail.com> on 2022/01/25 11:57:16 UTC, 6 replies.
- Bottlenecks in spark application (Spark version 3.0) - posted by Prasad Bhalerao <pr...@gmail.com> on 2022/01/25 12:09:42 UTC, 3 replies.
- Spark and scalability in k8s etc - posted by Mich Talebzadeh <mi...@gmail.com> on 2022/01/25 19:44:06 UTC, 2 replies.
- [Spark UDF]: Where does UDF stores temporary Arrays/Sets - posted by Abhimanyu Kumar Singh <ab...@gmail.com> on 2022/01/26 15:32:24 UTC, 4 replies.
- question for definition of column types - posted by ca...@free.fr on 2022/01/27 02:48:46 UTC, 2 replies.
- DataStreamReader cleanSource option - posted by Gabriela Dvořáková <ga...@monthio.com.INVALID> on 2022/01/27 07:21:13 UTC, 1 replies.
- How to delete the record - posted by Sid Kal <fl...@gmail.com> on 2022/01/27 15:46:51 UTC, 13 replies.
- how can I remove the warning message - posted by ca...@free.fr on 2022/01/28 11:13:17 UTC, 3 replies.
- Kafka to spark streaming - posted by Amit Sharma <re...@gmail.com> on 2022/01/28 22:13:36 UTC, 3 replies.
- [ANNOUNCE] Apache Spark 3.2.1 released - posted by huaxin gao <hu...@gmail.com> on 2022/01/29 01:07:13 UTC, 9 replies.
- Re: [ANNOUNCE] Apache Spark 3.2.1 released - posted by Ruifeng Zheng <ru...@foxmail.com> on 2022/01/29 01:36:37 UTC, 0 replies.
- [mongo-spark-connector] How can I improve the performance of Mongo spark write? - posted by sj p <pa...@gmail.com> on 2022/01/29 04:32:13 UTC, 0 replies.
- Log4j upgrade in spark binary from 1.2.17 to 2.17.1 - posted by "KS, Rajabhupati" <Ra...@comcast.com.INVALID> on 2022/01/30 03:32:48 UTC, 3 replies.
- A Persisted Spark DataFrame is computed twice - posted by Benjamin Du <le...@outlook.com> on 2022/01/30 08:35:51 UTC, 8 replies.
- why the pyspark RDD API is so slow? - posted by Bitfox <bi...@bitfox.top> on 2022/01/30 10:10:20 UTC, 5 replies.
- [ANNOUNCE] Apache Kyuubi (Incubating) released 1.4.1-incubating - posted by Vino Yang <vi...@apache.org> on 2022/01/31 06:45:31 UTC, 2 replies.
- Regarding Spark Cassandra Metrics - posted by Yogesh Kumar Garg <yo...@gmail.com> on 2022/01/31 10:14:15 UTC, 0 replies.
- - posted by pd...@chuliege.be on 2022/01/31 14:11:06 UTC, 3 replies.
- RE: [EXTERNAL] Fwd: Log4j upgrade in spark binary from 1.2.17 to 2.17.1 - posted by "KS, Rajabhupati" <Ra...@comcast.com.INVALID> on 2022/01/31 17:14:33 UTC, 3 replies.
- bucketBy in pyspark not retaining partition information - posted by Nitin Siwach <ni...@gmail.com> on 2022/01/31 17:51:41 UTC, 0 replies.