user@spark.apache.org, 2019-09

- [ANNOUNCE] Announcing Apache Spark 2.4.4 - posted by Dongjoon Hyun <do...@gmail.com> on 2019/09/01 21:54:50 UTC, 3 replies.
- Re: Control Sqoop job from Spark job - posted by Chetan Khatri <ch...@gmail.com> on 2019/09/02 11:11:17 UTC, 6 replies.
- Unit testing PySpark Code and doing assertion - posted by Rahul Nandi <ra...@gmail.com> on 2019/09/03 15:04:29 UTC, 0 replies.
- Re: EMR Spark 2.4.3 executor hang - posted by Vadim Semenov <va...@datadoghq.com.INVALID> on 2019/09/03 20:45:24 UTC, 0 replies.
- Structured Streaming: How to add a listener for when a batch is complete - posted by Natalie Ruiz <Na...@microsoft.com.INVALID> on 2019/09/03 22:25:47 UTC, 2 replies.
- Re: Even after VO fields are mapped using @Table and @Column annotations get error NoSuchElementException - posted by Shyam P <sh...@gmail.com> on 2019/09/04 18:41:01 UTC, 0 replies.
- Issue with structured streaming custom data source V2 - posted by "stevech.hu" <st...@outlook.com> on 2019/09/05 02:43:39 UTC, 0 replies.
- How to query StructField's metadata in spark sql? - posted by kyunam <ky...@hotmail.com> on 2019/09/05 07:09:28 UTC, 0 replies.
- [Spark Streaming Kafka 0-10] - What was the reason for adding "spark-executor-" prefix to group id in executor configurations - posted by Sethupathi T <se...@googlemail.com.INVALID> on 2019/09/05 08:05:16 UTC, 5 replies.
- Test mail - posted by Himali Patel <hi...@thalesgroup.com> on 2019/09/05 10:20:39 UTC, 0 replies.
- Tune hive query launched thru spark-yarn job. - posted by Himali Patel <hi...@thalesgroup.com> on 2019/09/05 12:10:03 UTC, 0 replies.
- Re: Tune hive query launched thru spark-yarn job. - posted by Sathi Chowdhury <sa...@yahoo.com.INVALID> on 2019/09/05 14:40:13 UTC, 1 replies.
- Re: read image or binary files / spark 2.3 - posted by Peter Liu <pe...@gmail.com> on 2019/09/05 18:13:00 UTC, 0 replies.
- Collecting large dataset - posted by Rishikesh Gawade <ri...@gmail.com> on 2019/09/05 18:22:45 UTC, 1 replies.
- Start point to read source codes - posted by da zhou <zh...@gmail.com> on 2019/09/05 20:11:25 UTC, 2 replies.
- org.apache.spark.sql.AnalysisException: Detected implicit cartesian product - posted by kyunam <ky...@hotmail.com> on 2019/09/05 23:58:24 UTC, 0 replies.
- how to refresh the loaded non-streaming dataframe for each steaming batch ? - posted by Shyam P <sh...@gmail.com> on 2019/09/06 04:29:37 UTC, 4 replies.
- DataSourceV2: pushFilters() is not invoked for each read call - spark 2.3.2 - posted by Shubham Chaurasia <sh...@gmail.com> on 2019/09/06 06:24:40 UTC, 1 replies.
- Anonymous functions cannot be found - posted by Yuta Morisawa <yu...@kddi-research.jp> on 2019/09/06 07:06:29 UTC, 0 replies.
- Question on streaming job wait and re-run - posted by David Zhou <zh...@gmail.com> on 2019/09/06 21:07:19 UTC, 0 replies.
- OOM Error - posted by Ankit Khettry <ju...@gmail.com> on 2019/09/06 23:33:54 UTC, 9 replies.
- Re: read binary files (for stream reader) / spark 2.3 - posted by Peter Liu <pe...@gmail.com> on 2019/09/09 14:07:47 UTC, 0 replies.
- Problem upgrading from 2.3.1 to 2.4.3 with gradle - posted by Nathan Kronenfeld <nk...@uncharted.software> on 2019/09/09 20:48:07 UTC, 0 replies.
- [ANNOUNCE] Announcing Apache Spark 2.3.4 - posted by Kazuaki Ishizaki <IS...@jp.ibm.com> on 2019/09/10 04:37:40 UTC, 0 replies.
- Custom encoders and udf's - posted by jelmer <jk...@gmail.com> on 2019/09/10 10:32:00 UTC, 0 replies.
- Deadlock using Barrier Execution - posted by csmith <ch...@zocdoc.com> on 2019/09/10 15:12:30 UTC, 0 replies.
- script running in jupyter 6-7x faster than spark submit - posted by Dhrubajyoti Hati <dh...@gmail.com> on 2019/09/10 18:32:32 UTC, 14 replies.
- Re: question about pyarrow.Table to pyspark.DataFrame conversion - posted by Bryan Cutler <cu...@gmail.com> on 2019/09/10 19:17:25 UTC, 0 replies.
- Access all of the custom streaming query listeners that were registered to spark session - posted by Natalie Ruiz <Na...@microsoft.com.INVALID> on 2019/09/10 20:18:13 UTC, 1 replies.
- Spark Kafka Streaming making progress but there is no data to be consumed - posted by Charles vinodh <mi...@gmail.com> on 2019/09/11 21:38:59 UTC, 7 replies.
- Exception when reading multiline JSON file - posted by Kumaresh AK <ku...@nielsen.com> on 2019/09/12 17:03:38 UTC, 1 replies.
- Inconsistent dataset behavior between file and in-memory versions - posted by Dean Arnold <re...@gmail.com> on 2019/09/12 18:41:59 UTC, 0 replies.
- Monitor Spark Applications - posted by raman gugnani <ra...@gmail.com> on 2019/09/13 04:57:47 UTC, 3 replies.
- Cluster sizing - posted by Riccardo Ferrari <fe...@gmail.com> on 2019/09/13 07:39:01 UTC, 0 replies.
- [Spark SQL]: Does Union operation followed by drop duplicate follows "keep first" - posted by Abhinesh Hada <ab...@gmail.com> on 2019/09/13 15:43:28 UTC, 4 replies.
- Partitioning query - posted by ☼ R Nair <ra...@gmail.com> on 2019/09/13 19:47:34 UTC, 0 replies.
- Conflicting PySpark Storage Level Defaults? - posted by grp <gp...@villanova.edu> on 2019/09/16 00:07:19 UTC, 1 replies.
- RE: Classloading issues when using connectors with Uber jars with improper Shading in single Spark job - posted by "Sharma, Praneet" <pr...@informatica.com.INVALID> on 2019/09/16 02:30:51 UTC, 0 replies.
- Unable to verify in-transit encryption - posted by G R <gr...@gmail.com> on 2019/09/16 18:25:31 UTC, 0 replies.
- Can anyone suggest what is wrong with my spark job here? - posted by Shyam P <sh...@gmail.com> on 2019/09/16 18:39:41 UTC, 0 replies.
- Re: [EXTERNAL] Re: Conflicting PySpark Storage Level Defaults? - posted by grp <gp...@villanova.edu> on 2019/09/16 22:21:39 UTC, 0 replies.
- how can I dynamic parse json in kafka when using Structured Streaming - posted by lk_spark <lk...@163.com> on 2019/09/17 02:39:28 UTC, 2 replies.
- How to integrates MLeap to Spark Structured Streaming - posted by Praful Rana <ra...@gmail.com> on 2019/09/17 13:32:22 UTC, 0 replies.
- How to Integrate Spark mllib Streaming Training Models To Spark Structured Streaming - posted by Praful Rana <ra...@gmail.com> on 2019/09/17 13:38:46 UTC, 0 replies.
- Re: intermittent Kryo serialization failures in Spark - posted by Jerry Vinokurov <gr...@gmail.com> on 2019/09/17 14:37:44 UTC, 5 replies.
- Can I set the Alluxio WriteType in Spark applications? - posted by Mark Zhao <gu...@gmail.com> on 2019/09/17 14:52:59 UTC, 1 replies.
- custom rdd - do I need a hadoop input format? - posted by Marcelo Valle <ma...@ktech.com> on 2019/09/17 15:28:46 UTC, 3 replies.
- spark 2.x design docs - posted by Ka...@ril.com on 2019/09/19 06:04:17 UTC, 1 replies.
- RE: [External]Re: spark 2.x design docs - posted by Ka...@ril.com on 2019/09/19 07:16:10 UTC, 2 replies.
- unsubscribe - posted by Mario Amatucci <Ma...@epam.com.INVALID> on 2019/09/19 07:18:04 UTC, 0 replies.
- - posted by Georg Heiler <ge...@gmail.com> on 2019/09/19 09:14:31 UTC, 0 replies.
- Incorrect results in left_outer join in DSv2 implementation with filter pushdown - spark 2.3.2 - posted by Shubham Chaurasia <sh...@gmail.com> on 2019/09/19 13:24:48 UTC, 0 replies.
- Parquet read performance for different schemas - posted by Tomas Bartalos <to...@gmail.com> on 2019/09/19 16:10:06 UTC, 2 replies.
- Re: Low cache hit ratio when running Spark on Alluxio - posted by Bin Fan <fa...@gmail.com> on 2019/09/19 18:02:59 UTC, 0 replies.
- why spark oom off-heap? - posted by "jibiyr@qq.com" <ji...@qq.com> on 2019/09/20 05:25:11 UTC, 0 replies.
- Collections passed from driver to executors - posted by Dhrubajyoti Hati <dh...@gmail.com> on 2019/09/20 06:22:49 UTC, 4 replies.
- [Ask for help] How to manually submit offsetRanges - posted by Fangyuan Liu <Fa...@microsoft.com.INVALID> on 2019/09/20 12:23:36 UTC, 0 replies.
- graphx vs graphframes - posted by Nicolas Paris <ni...@riseup.net> on 2019/09/22 20:17:15 UTC, 0 replies.
- Kafka offset committer tool for structured streaming query - posted by Jungtaek Lim <ka...@gmail.com> on 2019/09/23 15:59:32 UTC, 0 replies.
- Google Cloud and Spark in the docker consideration for rreal time streaming data - posted by Mich Talebzadeh <mi...@gmail.com> on 2019/09/23 19:04:53 UTC, 0 replies.
- Efficient cosine similarity computation - posted by "Stevens, Clay" <Cl...@wolterskluwer.com> on 2019/09/23 20:20:03 UTC, 1 replies.
- PySpark with custom transformer project organization - posted by Femi Anthony <fe...@gmail.com> on 2019/09/23 21:13:40 UTC, 0 replies.
- Intermittently getting "Can not create the managed table error" while creating table from spark 2.4 - posted by abhijeet bedagkar <qa...@gmail.com> on 2019/09/25 06:57:03 UTC, 0 replies.
- Standalone Spark, How to find (driver's ) final status for an application - posted by Nilkanth Patel <ni...@gmail.com> on 2019/09/26 05:42:34 UTC, 0 replies.
- Urgent : Changes required in the archive - posted by Vishal Verma <vi...@exadatum.com> on 2019/09/26 06:53:52 UTC, 1 replies.
- [Spark SS] Spark-23541 Backward Compatibility on 2.3.2 - posted by "Ahn, Daniel" <da...@optum.com.INVALID> on 2019/09/26 19:39:25 UTC, 0 replies.
- Shuffle Spill to Disk - posted by Jack Kolokasis <ko...@ics.forth.gr> on 2019/09/28 19:45:58 UTC, 0 replies.
- Read text file row by row and apply conditions - posted by swetha kadiyala <sw...@gmail.com> on 2019/09/30 03:26:30 UTC, 2 replies.
- How to handle this use-case in spark-sql-streaming - posted by Shyam P <sh...@gmail.com> on 2019/09/30 10:51:10 UTC, 0 replies.
- Announcing .NET for Apache Spark 0.5.0 - posted by Terry Kim <yu...@gmail.com> on 2019/09/30 16:37:41 UTC, 1 replies.