user@spark.apache.org, 2020-04

- MySQL query continually add IS NOT NULL onto a query even though I don't request it - posted by Hamish Whittal <ha...@cloud-fundis.co.za> on 2020/04/01 05:47:26 UTC, 1 replies.
- Spark Streaming on Compact Kafka topic - consumers 1 message per partition per batch - posted by Hrishikesh Mishra <sd...@gmail.com> on 2020/04/01 06:30:39 UTC, 2 replies.
- [ML] How to find the default hyperparameters for models in SparkML? - posted by Jing Lu <aj...@gmail.com> on 2020/04/01 23:43:28 UTC, 0 replies.
- [PySpark][K8s] Question about a file in `/etc/apache2` - posted by psschwei <ps...@gmail.com> on 2020/04/02 01:39:50 UTC, 0 replies.
- Spark 1.6 and ORC bucketed queries - posted by Manjunath Shetty H <ma...@live.com> on 2020/04/02 03:18:05 UTC, 0 replies.
- unsubscribe - posted by Alfredo Marquez <al...@gmail.com> on 2020/04/02 13:16:56 UTC, 10 replies.
- Unsubscribe - posted by Alfredo Marquez <al...@gmail.com> on 2020/04/02 13:59:25 UTC, 4 replies.
- Re: Unable to get to_timestamp with Timezone Information - posted by Enrico Minack <ma...@Enrico.Minack.dev> on 2020/04/02 15:49:10 UTC, 1 replies.
- INTERVAL function not working - posted by Aakash Basu <aa...@gmail.com> on 2020/04/02 16:06:08 UTC, 0 replies.
- Re: HDFS file hdfs://127.0.0.1:9000/hdfs/spark/examples/README.txt - posted by jane thorpe <ja...@aol.com.INVALID> on 2020/04/03 01:44:58 UTC, 3 replies.
- unix_timestamp() equivalent in plain Spark SQL Query - posted by Aakash Basu <aa...@gmail.com> on 2020/04/03 03:47:41 UTC, 0 replies.
- Spark-3.0.0 GA - posted by Marshall Markham <mm...@precisionlender.com> on 2020/04/03 12:17:52 UTC, 3 replies.
- spark-submit exit status on k8s - posted by Marshall Markham <mm...@precisionlender.com> on 2020/04/03 12:23:44 UTC, 7 replies.
- Serialization or internal functions? - posted by em...@yeikel.com on 2020/04/04 18:07:11 UTC, 4 replies.
- (float(9)/5)*x + 32) when x = 12.8 - posted by jane thorpe <ja...@aol.com.INVALID> on 2020/04/05 03:33:42 UTC, 0 replies.
- pandas_udf is very slow - posted by Lian Jiang <ji...@gmail.com> on 2020/04/05 07:28:22 UTC, 3 replies.
- Spark, read from Kafka stream failing AnalysisException - posted by Sumit Agrawal <su...@continuity1.com> on 2020/04/05 15:09:03 UTC, 1 replies.
- How does spark sql evaluate case statements? - posted by kant kodali <ka...@gmail.com> on 2020/04/06 07:55:23 UTC, 3 replies.
- Security vulnerabilities due to Jackson Databind - posted by simonhampe <si...@iteratec.com> on 2020/04/06 08:07:55 UTC, 0 replies.
- Scala version compatibility - posted by Andrew Melo <an...@gmail.com> on 2020/04/06 19:49:45 UTC, 6 replies.
- Lifecycle of a map function - posted by Vadim Vararu <va...@adswizz.com> on 2020/04/07 08:44:13 UTC, 0 replies.
- IDE suitable for Spark - posted by Zahid Rahman <za...@gmail.com> on 2020/04/07 08:45:33 UTC, 6 replies.
- Spark Union Breaks Caching Behaviour - posted by Yi Huang <hu...@gmail.com> on 2020/04/07 16:44:35 UTC, 0 replies.
- Re: IDE suitable for Spark : Monitoring & Debugging Spark Jobs - posted by Som Lima <so...@gmail.com> on 2020/04/07 21:01:28 UTC, 0 replies.
- How to handle Null values in Array of struct elements in pyspark - posted by anbutech <an...@outlook.com> on 2020/04/08 06:10:18 UTC, 0 replies.
- [Pyspark] - Spark uses all available memory; unrelated to size of dataframe - posted by Daniel Stojanov <ma...@danielstojanov.com> on 2020/04/08 12:58:16 UTC, 2 replies.
- Can you view thread dumps on spark UI if job finished - posted by Ruijing Li <li...@gmail.com> on 2020/04/08 22:47:32 UTC, 3 replies.
- [Spark MLlib]: Multiple input dataframes and non-linear ML pipeline - posted by Qingsheng Ren <re...@gmail.com> on 2020/04/09 08:36:35 UTC, 1 replies.
- Re: Read Hive ACID Managed table in Spark - posted by amogh margoor <am...@gmail.com> on 2020/04/09 17:10:54 UTC, 1 replies.
- Driver pods stuck in running state indefinitely - posted by "Prudhvi Chennuru (CONT)" <pr...@capitalone.com.INVALID> on 2020/04/09 18:44:00 UTC, 1 replies.
- Fwd: How to import PySpark into Jupyter - posted by Yasir Elgohary <yg...@gmail.com> on 2020/04/10 11:04:49 UTC, 1 replies.
- Spark hangs while reading from jdbc - does nothing - posted by Ruijing Li <li...@gmail.com> on 2020/04/10 16:37:05 UTC, 2 replies.
- Re: [External Sender] Re: Driver pods stuck in running state indefinitely - posted by "Prudhvi Chennuru (CONT)" <pr...@capitalone.com.INVALID> on 2020/04/10 17:03:46 UTC, 1 replies.
- Spark Streaming not working - posted by Debabrata Ghosh <ma...@gmail.com> on 2020/04/10 17:34:37 UTC, 8 replies.
- COVID 19 data - posted by jane thorpe <ja...@aol.com.INVALID> on 2020/04/12 19:22:06 UTC, 0 replies.
- covid 19 Data [DISCUSSION] - posted by jane thorpe <ja...@aol.com.INVALID> on 2020/04/12 19:30:03 UTC, 3 replies.
- Spark interrupts S3 request backoff - posted by Lian Jiang <ji...@gmail.com> on 2020/04/13 02:43:07 UTC, 2 replies.
- Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting - posted by jane thorpe <ja...@aol.com.INVALID> on 2020/04/13 07:31:36 UTC, 1 replies.
- What is the best way to take the top N entries from a hive table/data source? - posted by yeikel valdes <em...@yeikel.com> on 2020/04/14 06:35:30 UTC, 4 replies.
- [Spark Core]: Does an executor only cache the partitions it requires for its computations or always the full RDD? - posted by zwithouta <ra...@web.de> on 2020/04/14 10:28:32 UTC, 1 replies.
- Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting - posted by Gabor Somogyi <ga...@gmail.com> on 2020/04/14 11:48:49 UTC, 11 replies.
- Question on writing batch synchronized incremental graph algorithms - posted by Kaan Sancak <ka...@gmail.com> on 2020/04/14 16:57:32 UTC, 0 replies.
- Is there any way to set the location of the history for the spark-shell per session? - posted by Yeikel <em...@yeikel.com> on 2020/04/14 18:20:52 UTC, 3 replies.
- Going it alone. - posted by jane thorpe <ja...@aol.com.INVALID> on 2020/04/14 19:36:50 UTC, 18 replies.
- Cross Region Apache Spark Setup - posted by Stone Zhong <st...@gmail.com> on 2020/04/14 20:31:26 UTC, 2 replies.
- Spark structured streaming - Fallback to earliest offset - posted by Ruijing Li <li...@gmail.com> on 2020/04/14 23:32:42 UTC, 5 replies.
- Save Spark dataframe as dynamic partitioned table in Hive - posted by Mich Talebzadeh <mi...@gmail.com> on 2020/04/15 23:48:18 UTC, 3 replies.
- [Structured Streaming] Checkpoint file compact file grows big - posted by "Ahn, Daniel" <da...@optum.com.INVALID> on 2020/04/16 00:19:24 UTC, 2 replies.
- Question about how parquet files are read and processed - posted by Yeikel <em...@yeikel.com> on 2020/04/16 03:00:32 UTC, 1 replies.
- Spark ORC store written timestamp as column - posted by Manjunath Shetty H <ma...@live.com> on 2020/04/16 04:47:31 UTC, 0 replies.
- wot no toggle ? - posted by jane thorpe <ja...@aol.com.INVALID> on 2020/04/16 06:10:48 UTC, 6 replies.
- How to pass a constant value to a partitioned hive table in spark - posted by Mich Talebzadeh <mi...@gmail.com> on 2020/04/16 07:49:40 UTC, 4 replies.
- Can I run Spark executors in a Hadoop cluster from a Kubernetes container - posted by ma...@gmail.com on 2020/04/16 12:26:49 UTC, 1 replies.
- [Spark SQL] AnalysisException: cannot resolve '`column_name`' given input columns - posted by Joshua Conlin <co...@gmail.com> on 2020/04/16 14:26:29 UTC, 0 replies.
- Spark structured streaming - performance tuning - posted by Srinivas V <sr...@gmail.com> on 2020/04/16 17:19:15 UTC, 3 replies.
- Get Size of a column in Bytes Pyspark Dataframe - posted by anbutech <an...@outlook.com> on 2020/04/16 18:47:22 UTC, 0 replies.
- Understanding spark structured streaming checkpointing system - posted by Ruijing Li <li...@gmail.com> on 2020/04/16 21:08:32 UTC, 2 replies.
- Re: Get Size of a column in Bytes for a Pyspark Dataframe - posted by Yeikel <em...@yeikel.com> on 2020/04/16 21:30:51 UTC, 0 replies.
- Using startingOffsets latest - no data from structured streaming kafka query - posted by Ruijing Li <li...@gmail.com> on 2020/04/17 00:13:43 UTC, 3 replies.
- Memory allocation - posted by Pat Ferrel <pa...@occamsmachete.com> on 2020/04/17 20:07:33 UTC, 1 replies.
- Spark stuck at removing broadcast variable - posted by Alchemist <al...@gmail.com> on 2020/04/18 19:22:11 UTC, 2 replies.
- [Spark SQL] [Beginner] Dataset[Row] collect to driver throw java.io.EOFException: Premature EOF: no length prefix available - posted by "maqy1995@outlook.com" <45...@qq.com> on 2020/04/20 02:32:12 UTC, 0 replies.
- [Spark SQL] issue about difference in memory size between DataFrame and RDD - posted by Lyx <11...@qq.com> on 2020/04/20 03:02:50 UTC, 0 replies.
- Re: Typed dataset from Avro generated classes? - posted by Elkhan Dadashov <el...@gmail.com> on 2020/04/20 08:47:48 UTC, 0 replies.
- Using P4J Plugins with Spark - posted by Shashanka Balakuntala <sh...@gmail.com> on 2020/04/21 06:03:04 UTC, 1 replies.
- Spark Structure Streaming | FileStreamSourceLog not deleting list of input files | Spark -2.4.0 - posted by Pappu Yadav <py...@gmail.com> on 2020/04/21 11:23:14 UTC, 1 replies.
- Spark Mongodb connector hangs indefinitely, not working on Amazon EMR - posted by Daniel Stojanov <ma...@danielstojanov.com> on 2020/04/22 02:10:08 UTC, 0 replies.
- is RocksDB backend available in 3.0 preview? - posted by kant kodali <ka...@gmail.com> on 2020/04/22 02:31:35 UTC, 3 replies.
- Re: Deadlock using Barrier Execution - posted by wuyi <yi...@databricks.com> on 2020/04/22 06:25:37 UTC, 0 replies.
- Can I collect Dataset[Row] to driver without converting it to Array [Row]? - posted by maqy <45...@qq.com> on 2020/04/22 08:05:51 UTC, 3 replies.
- Re: Can I collect Dataset[Row] to driver without converting it to Array [Row]? - posted by maqy <45...@qq.com> on 2020/04/22 08:23:36 UTC, 0 replies.
- [Structured Streaming] Connecting to Kafka via a Custom Consumer / Producer - posted by Patrick McGloin <mc...@gmail.com> on 2020/04/22 09:19:34 UTC, 0 replies.
- Re: [Spark SQL] [Beginner] Dataset[Row] collect to driver throw java.io.EOFException: Premature EOF: no length prefix available - posted by maqy <45...@qq.com> on 2020/04/22 11:53:58 UTC, 0 replies.
- Error while reading hive tables with tmp/hidden files inside partitions - posted by Dhrubajyoti Hati <dh...@gmail.com> on 2020/04/22 13:45:27 UTC, 4 replies.
- Re: [Spark SQL] [Beginner] Dataset[Row] collect to driver throw java.io.EOFException: Premature EOF: no length prefix available - posted by Tang Jinxin <xi...@gmail.com> on 2020/04/22 15:16:35 UTC, 1 replies.
- Re: Can I collect Dataset[Row] to driver without converting it to Array [Row]? - posted by maqy <45...@qq.com> on 2020/04/22 15:24:28 UTC, 0 replies.
- Re: Re: [Spark SQL] [Beginner] Dataset[Row] collect to driver throw java.io.EOFException: Premature EOF: no length prefix available - posted by maqy <45...@qq.com> on 2020/04/22 15:40:42 UTC, 0 replies.
- Spark Adaptive configuration - posted by Tzahi File <tz...@ironsrc.com> on 2020/04/22 16:30:42 UTC, 0 replies.
- pyspark working with a different Python version than the cluster - posted by Odon Copon <od...@gmail.com> on 2020/04/22 17:02:37 UTC, 1 replies.
- Re: Can I collect Dataset[Row] to driver without converting it to Array [Row]? - posted by Tang Jinxin <xi...@gmail.com> on 2020/04/22 23:31:17 UTC, 0 replies.
- Datasource V2- Heavy Metadata Query - posted by ch...@cmartinit.co.uk on 2020/04/23 06:13:19 UTC, 0 replies.
- Re: Re: Can I collect Dataset[Row] to driver without converting it to Array [Row]? - posted by maqy <45...@qq.com> on 2020/04/23 12:17:14 UTC, 0 replies.
- 30000 partitions vs 1000 partitions with Coalescing - posted by dev nan <dn...@gmail.com> on 2020/04/23 18:41:01 UTC, 1 replies.
- Why when writing Parquet files, columns are converted to nullable? - posted by Julien Benoit <ju...@guesttoguest.com> on 2020/04/24 08:03:07 UTC, 0 replies.
- Re: Spark ORC store written timestamp as column - posted by ZHANG Wei <we...@outlook.com> on 2020/04/24 11:12:51 UTC, 0 replies.
- [Structured Streaming] Event-Time ordering of two Kafka topics with different message volumes - posted by eiise <ei...@maibornwolff.de.INVALID> on 2020/04/24 11:25:39 UTC, 0 replies.
- [pyspark] Load a master data file to spark ecosystem - posted by Arjun Chundiran <ar...@gmail.com> on 2020/04/24 15:17:15 UTC, 8 replies.
- Re: [Meta] Moderation request diversion? - posted by Jeff Evans <je...@gmail.com> on 2020/04/24 17:38:35 UTC, 2 replies.
- Watch "Airbus makes more of the sky with Spark - Jesse Anderson & Hassene Ben Salem" on YouTube - posted by Zahid Rahman <za...@gmail.com> on 2020/04/25 03:07:56 UTC, 3 replies.
- Copyright Infringement - posted by Som Lima <so...@gmail.com> on 2020/04/25 13:42:29 UTC, 6 replies.
- Static and dynamic partition loads in Hive table through Spark - posted by Mich Talebzadeh <mi...@gmail.com> on 2020/04/26 12:14:37 UTC, 0 replies.
- Reading Hadoop Archive from Spark - posted by To Quoc Cuong <to...@yahoo.com.INVALID> on 2020/04/27 11:15:30 UTC, 0 replies.
- SparkLauncher reliability and scalability - posted by mhd wrk <mh...@gmail.com> on 2020/04/27 15:38:05 UTC, 1 replies.
- [Announcement] Analytics Zoo 0.8 release - posted by Jason Dai <ja...@gmail.com> on 2020/04/28 02:20:06 UTC, 1 replies.
- [Structured Streaming] NullPointerException in long running query - posted by lec ssmi <sh...@gmail.com> on 2020/04/28 05:52:32 UTC, 5 replies.
- Spark stable release for Hadoop 3 - posted by Piper Spark <pi...@gmail.com> on 2020/04/28 08:14:49 UTC, 0 replies.
- Structured Streaming using Kafka Avro Record in 2.3.0 - posted by HARSH TAKKAR <ta...@gmail.com> on 2020/04/28 11:46:39 UTC, 0 replies.
- Unsubscribe... - posted by Yasir Elgohary <yg...@gmail.com> on 2020/04/28 12:17:56 UTC, 0 replies.
- Converting a date to milliseconds with time zone in Scala - posted by Mich Talebzadeh <mi...@gmail.com> on 2020/04/28 16:15:20 UTC, 7 replies.
- Spark 2.3 and Kafka client library version - posted by "Ahn, Daniel" <da...@optum.com.INVALID> on 2020/04/28 20:53:52 UTC, 1 replies.
- Re: Converting a date to milliseconds with time zone in Scala with fixed date str - posted by Som Lima <so...@gmail.com> on 2020/04/28 22:09:24 UTC, 0 replies.
- - posted by Zeming Yu <ze...@gmail.com> on 2020/04/28 23:01:35 UTC, 0 replies.
- How can I add extra mounted disk to HDFS - posted by Chetan Khatri <ch...@gmail.com> on 2020/04/28 23:18:20 UTC, 2 replies.
- India Most Dangerous : USA Religious Freedom Report - posted by Zahid Amin <ka...@mail.com> on 2020/04/29 06:16:24 UTC, 0 replies.
- Re: India Most Dangerous : USA Religious Freedom Report - posted by Deepak Sharma <de...@gmail.com> on 2020/04/29 06:17:37 UTC, 0 replies.
- Re: India Most Dangerous : USA Religious Freedom Report out Today - posted by Zahid Amin <ka...@mail.com> on 2020/04/29 06:22:35 UTC, 0 replies.
- OFFICIAL USA REPORT TODAY India Most Dangerous : USA Religious Freedom Report out TODAY - posted by Zahid Amin <ka...@mail.com> on 2020/04/29 06:26:00 UTC, 4 replies.
- Filtering on multiple columns in spark - posted by Mich Talebzadeh <mi...@gmail.com> on 2020/04/29 07:45:26 UTC, 6 replies.
- Re: Converting a date to milliseconds with time zone in Scala Eclipse IDE - posted by Som Lima <so...@gmail.com> on 2020/04/29 08:47:29 UTC, 0 replies.
- Spark 3 Release - posted by Michael Edwards <mi...@coreplex.co.uk> on 2020/04/29 10:18:23 UTC, 0 replies.
- Lightbend Scala professional training & certification - posted by Mich Talebzadeh <mi...@gmail.com> on 2020/04/29 10:28:39 UTC, 4 replies.
- On spam messages - posted by Sean Owen <sr...@apache.org> on 2020/04/29 13:18:04 UTC, 1 replies.
- https://www.lausanne.org/content/lga/2019-05/the-rise-of-hindu-fundamentalism?gclid=Cj0KCQjwy6T1BRDXARIsAIqCTXpmVG-8QJwiOSTVH8fkhRXj3QXUufApRXbPJUTpLlZ4f4wWgFNlPVkaAndGEALw_wcB - posted by James Mitchel <ja...@aol.com.INVALID> on 2020/04/29 20:11:56 UTC, 0 replies.
- What is a VPN ? freedom from natzi owen censorship - posted by James Mitchel <ja...@aol.com.INVALID> on 2020/04/29 20:16:41 UTC, 0 replies.
- Trump and modi butcher of Gujarat as Allies. Modi was banned to enter by US courts - posted by James Mitchel <ja...@aol.com.INVALID> on 2020/04/29 20:22:37 UTC, 0 replies.
- Spark job stuck at s3a-file-system metrics system started - posted by Aniruddha P Tekade <at...@binghamton.edu> on 2020/04/29 23:53:49 UTC, 0 replies.
- [Structured Streaming] multiple queries in one application - posted by lec ssmi <sh...@gmail.com> on 2020/04/30 01:56:07 UTC, 0 replies.
- Left Join at SQL query gets planned as inner join - posted by Roland Johann <ro...@phenetic.io.INVALID> on 2020/04/30 15:06:18 UTC, 4 replies.
- Lockdown since 5th August 2019 10,000,000 Kashmiri by 900,000 Indian Soldiers - posted by Winston Churchill <yh...@appraiser.net> on 2020/04/30 18:55:31 UTC, 0 replies.
- Re: Lockdown since 5th August 2019 10,000,000 Kashmiri by 900,000 Indian Soldiers - posted by Sean Owen <sr...@apache.org> on 2020/04/30 18:58:29 UTC, 0 replies.