user@spark.apache.org, 2020-06

- Re: Spark Security - posted by "Wilbert S." <wi...@gmail.com> on 2020/06/01 12:19:55 UTC, 2 replies.
- [PySpark 2.3+] Reading parquet entire path vs a set of file paths - posted by Rishi Shah <ri...@gmail.com> on 2020/06/01 13:33:20 UTC, 1 reply.
- Re: Using Spark Accumulators with Structured Streaming - posted by ZHANG Wei <we...@outlook.com> on 2020/06/02 02:28:36 UTC, 7 replies.
- Join on Condition provide at run time - posted by Chetan Khatri <ch...@gmail.com> on 2020/06/02 15:53:49 UTC, 0 replies.
- Spark stage stuck - posted by Manjunath Shetty H <ma...@live.com> on 2020/06/02 16:45:03 UTC, 0 replies.
- NoClassDefFoundError: scala/Product$class - posted by charles_cai <16...@qq.com> on 2020/06/03 05:44:27 UTC, 5 replies.
- WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped - posted by YuqingWan <24...@qq.com> on 2020/06/04 03:12:24 UTC, 0 replies.
- [Spark RDD] Persisting Spark RDDs across spark contexts/applications - options - posted by Boris Litvak <bo...@skf.com> on 2020/06/04 07:11:11 UTC, 1 reply.
- Re: [PySpark] Tagging descriptions - posted by Rishi Shah <ri...@gmail.com> on 2020/06/04 20:14:16 UTC, 0 replies.
- How to set Description in UI SQL tab - posted by gpatcham <gp...@gmail.com> on 2020/06/04 20:20:49 UTC, 1 reply.
- Re: Add python library with native code - posted by Dark Crusader <re...@gmail.com> on 2020/06/05 02:32:15 UTC, 4 replies.
- Unsubscribe - posted by Sunil Prabhakara <su...@gmail.com> on 2020/06/06 13:49:45 UTC, 3 replies.
- [pyspark 2.3+] Add scala library to pyspark app and use to derive columns - posted by Rishi Shah <ri...@gmail.com> on 2020/06/06 17:05:52 UTC, 0 replies.
- Add python library - posted by Anwar AliKhan <an...@gmail.com> on 2020/06/06 20:16:07 UTC, 1 reply.
- Spark :- Update record in partition. - posted by Sunil Kalra <su...@gmail.com> on 2020/06/07 16:35:28 UTC, 1 reply.
- Structured Streaming using File Source - How to handle live files - posted by ArtemisDev <ar...@dtechspace.com> on 2020/06/07 17:41:51 UTC, 2 replies.
- unsubscribe - posted by Arkadiy Ver <ar...@gmail.com> on 2020/06/08 05:12:27 UTC, 14 replies.
- we control spark file names before we write them - should we opensource it? - posted by ilaimalka <il...@nielsen.com> on 2020/06/08 13:16:36 UTC, 4 replies.
- Out of memory causing due to high number of spark submissions in FIFO mode - posted by Sunil Pasumarthi <it...@gmail.com> on 2020/06/09 07:40:48 UTC, 0 replies.
- [PySpark CrossValidator] Dropping column randCol before fitting model - posted by Ablaye FAYE <fa...@gmail.com> on 2020/06/09 08:59:38 UTC, 0 replies.
- Re: [SPARK-30957][SQL] Null-safe variant of Dataset.join(Dataset[_], Seq[String]) - posted by Alexandros Biratsis <al...@gmail.com> on 2020/06/09 10:59:47 UTC, 0 replies.
- [spark-structured-streaming] [kafka] consume topics from multiple Kafka clusters - posted by Srinivas V <sr...@gmail.com> on 2020/06/09 16:10:15 UTC, 4 replies.
- how can i write spark addListener metric to kafka - posted by a s <al...@gmail.com> on 2020/06/09 20:40:39 UTC, 1 reply.
- Issue with pyspark query - posted by Tzahi File <tz...@ironsrc.com> on 2020/06/10 11:24:04 UTC, 0 replies.
- Accessing Teradata DW data from Spark - posted by Mich Talebzadeh <mi...@gmail.com> on 2020/06/10 16:49:36 UTC, 1 reply.
- Broadcast join data reuse - posted by tc...@gmail.com on 2020/06/11 00:09:10 UTC, 2 replies.
- Does Spark SQL support GRANT/REVOKE operations on Tables? - posted by Nasrulla Khan Haris <Na...@microsoft.com.INVALID> on 2020/06/11 00:55:22 UTC, 2 replies.
- [ANNOUNCE] Apache Spark 2.4.6 released - posted by Holden Karau <ho...@apache.org> on 2020/06/11 01:37:45 UTC, 3 replies.
- Re: Arrow RecordBatches/Pandas Dataframes to (Arrow enabled) Spark Dataframe conversion in streaming fashion - posted by Tanveer Ahmad - EWI <T....@tudelft.nl> on 2020/06/11 15:05:55 UTC, 0 replies.
- [External] Unsubscribe - posted by "Mishra, Dhiraj A." <dh...@accenture.com.INVALID> on 2020/06/11 17:07:54 UTC, 0 replies.
- Spark ml how to extract split points from trained decision tree mode - posted by AaronLee <yl...@wish.com.INVALID> on 2020/06/11 23:57:44 UTC, 5 replies.
- Re: Unsubscribe martha focker - posted by ha...@hushmail.com.INVALID on 2020/06/12 02:55:02 UTC, 0 replies.
- GPU Acceleration for spark-3.0.0 - posted by charles_cai <16...@qq.com> on 2020/06/13 03:50:09 UTC, 3 replies.
- [spark-structured-streaming] [stateful] - posted by Srinivas V <sr...@gmail.com> on 2020/06/14 06:47:10 UTC, 0 replies.
- [2.4.5 Standalone Master]: Idle cores not being allocated - posted by krchia <ka...@gmail.com> on 2020/06/15 12:01:41 UTC, 0 replies.
- GroupBy issue while running K-Means - Dataframe - posted by Deepak Sharma <de...@gmail.com> on 2020/06/16 07:29:41 UTC, 0 replies.
- Spark dataframe creation through already distributed in-memory data sets - posted by Tanveer Ahmad - EWI <T....@tudelft.nl> on 2020/06/16 14:01:13 UTC, 0 replies.
- Check point storage and its redundancy - posted by shensonj <sh...@gmail.com> on 2020/06/16 17:06:17 UTC, 0 replies.
- how to know what happen between tasks launch - posted by lk_spark <lk...@163.com> on 2020/06/17 11:29:08 UTC, 0 replies.
- Is Spark Structured Streaming TOTALLY BROKEN (Spark Metadata Issues) - posted by Rachana Srivastava <ra...@yahoo.com.INVALID> on 2020/06/17 13:44:16 UTC, 19 replies.
- How to manage offsets in Spark Structured Streaming? - posted by Rachana Srivastava <ra...@yahoo.com.INVALID> on 2020/06/17 16:00:29 UTC, 0 replies.
- java.lang.ClassNotFoundException: com.hortonworks.spark.cloud.commit.PathOutputCommitProtoco - posted by murat migdisoglu <mu...@gmail.com> on 2020/06/17 22:35:46 UTC, 0 replies.
- Reading TB of JSON file - posted by Chetan Khatri <ch...@gmail.com> on 2020/06/18 13:11:45 UTC, 11 replies.
- [ANNOUNCE] Apache Spark 3.0.0 - posted by Reynold Xin <rx...@databricks.com> on 2020/06/18 17:21:24 UTC, 7 replies.
- Custom Metrics - posted by Bryan Jeffrey <br...@gmail.com> on 2020/06/18 19:22:50 UTC, 0 replies.
- Re: java.lang.ClassNotFoundException for s3a comitter - posted by murat migdisoglu <mu...@gmail.com> on 2020/06/19 00:24:02 UTC, 2 replies.
- [pyspark 2.3+] read/write huge data with smaller block size (128MB per block) - posted by Rishi Shah <ri...@gmail.com> on 2020/06/19 05:16:19 UTC, 2 replies.
- Hey good looking toPandas () - posted by Anwar AliKhan <an...@gmail.com> on 2020/06/19 06:56:25 UTC, 2 replies.
- Kafka Zeppelin integration - posted by si...@dtechspace.com on 2020/06/20 02:41:45 UTC, 1 reply.
- Re: Hey good looking toPandas () error stack - posted by Anwar AliKhan <an...@gmail.com> on 2020/06/20 10:17:08 UTC, 3 replies.
- Spark Thrift Server in Kubernetes deployment - posted by Subash K <su...@ericsson.com.INVALID> on 2020/06/22 03:30:22 UTC, 1 reply.
- Using hadoop-cloud_2.12 jars - posted by Rahij Ramsharan <ra...@gmail.com> on 2020/06/22 09:00:10 UTC, 2 replies.
- Reg - Why Apache Hadoop need to be Installed separately for Running Apache Spark…? - posted by Praveen Kumar Ramachandran <n....@gmail.com> on 2020/06/22 11:45:36 UTC, 0 replies.
- Documentation on SupportsReportStatistics Outdated? - posted by Micah Kornfield <em...@gmail.com> on 2020/06/22 17:04:51 UTC, 0 replies.
- How to disable pushdown predicate in spark 2.x query - posted by Mohit Durgapal <du...@gmail.com> on 2020/06/22 18:36:23 UTC, 1 reply.
- CVE-2020-9480: Apache Spark RCE vulnerability in auth-enabled standalone master - posted by Sean Owen <sr...@apache.org> on 2020/06/22 21:49:30 UTC, 0 replies.
- apache-spark mongodb dataframe issue - posted by Harmanat Singh <wi...@gmail.com> on 2020/06/23 07:04:51 UTC, 2 replies.
- elasticsearch-hadoop is not compatible with spark 3.0( scala 2.12) - posted by murat migdisoglu <mu...@gmail.com> on 2020/06/23 11:48:38 UTC, 0 replies.
- Where are all the jars gone ? - posted by Anwar AliKhan <an...@gmail.com> on 2020/06/23 19:21:01 UTC, 5 replies.
- Spark Small file issue - posted by Hichki <ha...@gmail.com> on 2020/06/23 21:35:22 UTC, 6 replies.
- [Spark Streaming] predicate pushdown in custom connector source. - posted by Rahul Kumar <rk...@gmail.com> on 2020/06/24 00:02:50 UTC, 0 replies.
- Found jars in /assembly/target/scala-2.12/jars - posted by Anwar AliKhan <an...@gmail.com> on 2020/06/24 00:25:32 UTC, 0 replies.
- LynxKite is now open-source - posted by Daniel Darabos <da...@lynxanalytics.com.INVALID> on 2020/06/24 09:43:15 UTC, 1 reply.
- Error: Vignette re-building failed. Execution halted - posted by Anwar AliKhan <an...@gmail.com> on 2020/06/24 09:49:09 UTC, 2 replies.
- High Availability for spark streaming application running in kubernetes - posted by Shenson Joseph <sh...@gmail.com> on 2020/06/24 13:35:45 UTC, 0 replies.
- [Structured spak streaming] How does cassandra connector readstream deals with deleted record - posted by Rahul Kumar <rk...@gmail.com> on 2020/06/25 02:13:28 UTC, 2 replies.
- Arrow RecordBatches to Spark Dataframe - posted by Tanveer Ahmad - EWI <T....@tudelft.nl> on 2020/06/25 03:35:01 UTC, 0 replies.
- Suggested Amendment to ./dev/make-distribution.sh - posted by Anwar AliKhan <an...@gmail.com> on 2020/06/25 09:21:43 UTC, 0 replies.
- Getting PySpark Partitions Locations - posted by Tzahi File <tz...@ironsrc.com> on 2020/06/25 12:51:39 UTC, 4 replies.
- Blog : Apache Spark Window Functions - posted by neeraj bhadani <bh...@gmail.com> on 2020/06/25 17:57:18 UTC, 0 replies.
- Metrics Problem - posted by Bryan Jeffrey <br...@gmail.com> on 2020/06/25 21:32:34 UTC, 6 replies.
- Spark 3 pod template for the driver - posted by Michel Sumbul <mi...@yahoo.fr.INVALID> on 2020/06/26 12:53:00 UTC, 4 replies.
- Data Explosion and repartition before group bys - posted by lsn24 <le...@gmail.com> on 2020/06/26 16:53:06 UTC, 0 replies.
- Spark Structured Streaming: “earliest” as “startingOffsets” is not working - posted by Something Something <ma...@gmail.com> on 2020/06/26 21:12:04 UTC, 2 replies.
- When is a Bigint a long and when is a long a long - posted by Anwar AliKhan <an...@gmail.com> on 2020/06/27 13:28:59 UTC, 6 replies.
- Distributed Anomaly Detection using MIDAS - posted by Shivin Srivastava <sh...@comp.nus.edu.sg> on 2020/06/27 14:02:17 UTC, 0 replies.
- Spark 3.0.0 spark.read.json never completes - posted by Sanjeev Mishra <sa...@gmail.com> on 2020/06/27 22:51:36 UTC, 0 replies.
- Spark 3.0 almost 1000 times slower to read json than Spark 2.4 - posted by Sanjeev Mishra <sa...@gmail.com> on 2020/06/27 23:58:12 UTC, 17 replies.
- Announcing ApacheCon @Home 2020 - posted by Rich Bowen <rb...@apache.org> on 2020/06/29 12:54:01 UTC, 0 replies.
- File Not Found: /tmp/spark-events in Spark 3.0 - posted by ArtemisDev <ar...@dtechspace.com> on 2020/06/29 14:19:46 UTC, 1 reply.
- [Debug] [Spark Core 2.4.4] org.apache.spark.storage.BlockException: Negative block size -9223372036854775808 - posted by Adam Tobey <ad...@datadoghq.com.INVALID> on 2020/06/29 21:05:17 UTC, 0 replies.
- Spark 3.0 ArrayIndexOutOfBoundsException at RDDOperationScope.toJson - posted by Taegeon Um <ta...@gmail.com> on 2020/06/30 01:34:08 UTC, 1 reply.
- Apache Spark Meetup - Wednesday 1st July - posted by Joe Davies <jo...@orbisconsultants.com> on 2020/06/30 09:21:33 UTC, 0 replies.
- XmlReader not Parsing the Nested elements in XML properly - posted by mars76 <sk...@yahoo.com.INVALID> on 2020/06/30 16:28:05 UTC, 1 reply.
- Question about 'maxOffsetsPerTrigger' - posted by Eric Beabes <ma...@gmail.com> on 2020/06/30 17:54:23 UTC, 0 replies.
- spark on kubernetes client mode - posted by Pradeepta Choudhury <pr...@gmail.com> on 2020/06/30 18:02:30 UTC, 0 replies.