- Re: Spark SQL API taking longer time than DF API. - posted by neeraj bhadani <bh...@gmail.com> on 2019/04/01 08:44:09 UTC, 3 replies.
- Re: [spark sql performance] Only 1 executor to write output? - posted by Mike Chan <mi...@gmail.com> on 2019/04/01 09:12:19 UTC, 0 replies.
- Re: Spark Kafka Batch Write guarantees - posted by "Shixiong(Ryan) Zhu" <sh...@databricks.com> on 2019/04/01 16:13:10 UTC, 1 replies.
- Re: Understanding State Store storage behavior for the Stream Deduplication function - posted by "Shixiong(Ryan) Zhu" <sh...@databricks.com> on 2019/04/01 16:20:32 UTC, 2 replies.
- Re: Spark streaming error - Query terminated with exception: assertion failed: Invalid batch: a#660,b#661L,c#662,d#663,,… 26 more fields != b#1291L - posted by "Shixiong(Ryan) Zhu" <sh...@databricks.com> on 2019/04/01 16:26:40 UTC, 0 replies.
- [Spark ML] [Pyspark] [Scenario Beginner] [Level Beginner] - posted by Steve Pruitt <bp...@opentext.com> on 2019/04/01 16:39:19 UTC, 1 replies.
- MLLIB , Does Spark support Canopy Clustering ? - posted by Alok Bhandari <al...@gmail.com> on 2019/04/02 12:57:35 UTC, 0 replies.
- Load Time from HDFS - posted by Jack Kolokasis <ko...@ics.forth.gr> on 2019/04/02 14:06:30 UTC, 2 replies.
- Issues with Spark Streaming checkpointing of Kafka topic content - posted by Dmitry Goldenberg <dg...@kmwllc.com> on 2019/04/02 15:39:15 UTC, 1 replies.
- Re: How to extract data in parallel from RDBMS tables - posted by "Surendra , Manchikanti" <su...@gmail.com> on 2019/04/02 18:07:58 UTC, 1 replies.
- Logging DataFrame API pipelines - posted by Magnus Nilsson <ma...@gmail.com> on 2019/04/02 22:43:17 UTC, 0 replies.
- Re: Upcoming talks on BigDL and Analytics Zoo this week - posted by Jason Dai <ja...@gmail.com> on 2019/04/03 13:21:45 UTC, 0 replies.
- CfP VHPC19: HPC Virtualization-Containers: Paper due May 1, 2019 (extended) - posted by VHPC 19 <vh...@gmail.com> on 2019/04/03 16:38:00 UTC, 0 replies.
- Question about relationship between number of files and initial tasks(partitions) - posted by Arthur Li <ar...@flipp.com> on 2019/04/04 01:37:08 UTC, 4 replies.
- dropDuplicate on timestamp based column unexpected output - posted by Chetan Khatri <ch...@gmail.com> on 2019/04/04 04:51:53 UTC, 9 replies.
- Why "spark-streaming-kafka-0-10" is still experimental? - posted by Doaa Medhat <do...@gmail.com> on 2019/04/04 07:52:01 UTC, 0 replies.
- pickling a udf - posted by Adaryl Wakefield <ad...@hotmail.com> on 2019/04/04 10:11:59 UTC, 2 replies.
- Why does this spark-shell invocation get suspended due to tty output? - posted by Jeff Evans <je...@gmail.com> on 2019/04/04 16:21:11 UTC, 0 replies.
- reporting use case - posted by Prasad Bhalerao <pr...@gmail.com> on 2019/04/04 18:48:11 UTC, 4 replies.
- Re: Re: reporting use case - posted by "Hall, Steven" <St...@nike.com> on 2019/04/04 20:38:29 UTC, 0 replies.
- Qn about decision tree apache spark java - posted by Serena S Yuan <su...@gmail.com> on 2019/04/04 21:36:20 UTC, 1 replies.
- Re: How shall I configure the Spark executor memory size and the Alluxio worker memory size on a machine? - posted by Bin Fan <fa...@gmail.com> on 2019/04/05 04:29:28 UTC, 1 replies.
- [ANNOUNCE] Announcing Apache Spark 2.4.1 - posted by DB Tsai <db...@dbtsai.com.INVALID> on 2019/04/05 05:59:03 UTC, 0 replies.
- combineByKey - posted by Madabhattula Rajesh Kumar <mr...@gmail.com> on 2019/04/05 07:11:18 UTC, 3 replies.
- Is there any spark API function to handle a group of companies at once in this scenario? - posted by Shyam P <sh...@gmail.com> on 2019/04/05 09:50:59 UTC, 4 replies.
- Checking if cascading graph computation is possible in Spark - posted by Basavaraj <ra...@gmail.com> on 2019/04/05 11:35:46 UTC, 1 replies.
- Re: Checking if cascading graph computation is possible in Spark - posted by Jason Nerothin <ja...@gmail.com> on 2019/04/05 16:43:35 UTC, 2 replies.
- How to retrieve multiple columns values (in one row) to variables in Spark Scala method - posted by Mich Talebzadeh <mi...@gmail.com> on 2019/04/05 19:28:52 UTC, 2 replies.
- writing into oracle database is very slow - posted by Lian Jiang <ji...@gmail.com> on 2019/04/06 14:59:36 UTC, 6 replies.
- Observing DAGScheduler Log Messages - posted by M Bilal <mb...@gmail.com> on 2019/04/07 16:04:41 UTC, 2 replies.
- Spark driver crashed with internal error - posted by Manu Zhang <ow...@gmail.com> on 2019/04/08 03:00:05 UTC, 0 replies.
- Parallelize Join Problem - posted by Pa...@telekom.de on 2019/04/08 15:41:09 UTC, 2 replies.
- Re: spark-sklearn - posted by Sudhir Babu Pothineni <sb...@gmail.com> on 2019/04/08 18:43:06 UTC, 3 replies.
- - posted by Siddharth Reddy <si...@gmail.com> on 2019/04/08 18:53:51 UTC, 0 replies.
- Spark2: Deciphering saving text file name - posted by Subash Prabakar <su...@gmail.com> on 2019/04/09 00:54:43 UTC, 1 replies.
- Structured streaming flatMapGroupWithState results out of order messages when reading from Kafka - posted by Akila Wajirasena <ak...@gmail.com> on 2019/04/09 09:37:02 UTC, 2 replies.
- Refresh parquet metadata on Spark Thrift Server - posted by Tomasz Krol <pa...@gmail.com> on 2019/04/09 16:05:13 UTC, 0 replies.
- Unable to broadcast a very large variable - posted by V0lleyBallJunki3 <ve...@gmail.com> on 2019/04/10 09:06:31 UTC, 6 replies.
- How to print DataFrame.show(100) to text file at HDFS - posted by Chetan Khatri <ch...@gmail.com> on 2019/04/13 13:10:13 UTC, 4 replies.
- Offline state manipulation tool for structured streaming query - posted by Jungtaek Lim <ka...@gmail.com> on 2019/04/13 14:13:35 UTC, 0 replies.
- ApacheCon NA 2019 Call For Proposal and help promoting Spark project - posted by Felix Cheung <fe...@hotmail.com> on 2019/04/13 16:50:27 UTC, 1 replies.
- Best Practice for Writing data into a Hive table - posted by Debabrata Ghosh <ma...@gmail.com> on 2019/04/13 16:59:48 UTC, 1 replies.
- --jars vs --spark.executor.extraClassPath vs --spark.driver.extraClassPath - posted by rajat kumar <ku...@gmail.com> on 2019/04/15 04:01:58 UTC, 3 replies.
- How to speedup your Spark ML training - posted by ch...@inaccel.com on 2019/04/15 12:21:57 UTC, 0 replies.
- [GraphX] Preserving Partitions when reading from HDFS - posted by M Bilal <mb...@gmail.com> on 2019/04/15 15:28:22 UTC, 2 replies.
- JvmPauseMonitor - posted by Eugene Koifman <eu...@workday.com> on 2019/04/15 16:52:22 UTC, 1 replies.
- K8s-Spark client mode : Executor image not able to download application jar from driver - posted by Nikhil Chinnapa <ni...@renovite.com> on 2019/04/16 08:20:19 UTC, 4 replies.
- Fwd: Issue with spark while reading from avro file - posted by Prateek Rajput <pr...@flipkart.com.INVALID> on 2019/04/16 11:38:22 UTC, 1 replies.
- Reading RDD by (key, data) from s3 - posted by Gorka Bravo Martinez <go...@cern.ch> on 2019/04/16 12:47:53 UTC, 1 replies.
- How to use same SparkSession in another app? - posted by Rishikesh Gawade <ri...@gmail.com> on 2019/04/16 17:57:21 UTC, 1 replies.
- Dynamic executor scaling spark/Kubernetes - posted by purna pradeep <pu...@gmail.com> on 2019/04/16 21:20:24 UTC, 0 replies.
- Re: [External Sender] How to use same SparkSession in another app? - posted by Femi Anthony <ol...@capitalone.com> on 2019/04/17 02:56:53 UTC, 0 replies.
- An alternative logic to collaborative filtering works fine but we are facing run time issues in executing the job - posted by Balakumar iyer S <ba...@gmail.com> on 2019/04/17 04:12:47 UTC, 1 replies.
- Boto3 library send to pyspark - posted by Gorka Bravo Martinez <go...@cern.ch> on 2019/04/17 07:11:10 UTC, 4 replies.
- Spark job running for long time - posted by rajat kumar <ku...@gmail.com> on 2019/04/17 13:22:16 UTC, 4 replies.
- Re: cache table vs. parquet table performance - posted by Bin Fan <fa...@gmail.com> on 2019/04/18 05:34:40 UTC, 0 replies.
- autoBroadcastJoinThreshold not working as expected - posted by Mike Chan <mi...@gmail.com> on 2019/04/18 09:44:45 UTC, 2 replies.
- [Spark SQL]: Slow insertInto overwrite if target table has many partitions - posted by Juho Autio <ju...@rovio.com> on 2019/04/18 13:45:38 UTC, 8 replies.
- Difference between Checkpointing and Persist - posted by Subash Prabakar <su...@gmail.com> on 2019/04/18 17:49:12 UTC, 3 replies.
- Spark-submit and no java log file generated - posted by Mann Du <ma...@gmail.com> on 2019/04/18 23:20:10 UTC, 0 replies.
- BigDL and Analytics Zoo talks at upcoming Spark+AI Summit and Strata London - posted by Jason Dai <ja...@gmail.com> on 2019/04/18 23:35:36 UTC, 1 replies.
- Re: Error: NoSuchFieldError: HIVE_STATS_JDBC_TIMEOUT while running a Spark-Hive Job - posted by rajiv shah <ra...@gigaspaces.com> on 2019/04/19 18:05:24 UTC, 0 replies.
- Not able to convert Image binary to an image - posted by swastik mittal <sm...@ncsu.edu> on 2019/04/20 04:51:35 UTC, 0 replies.
- Feature engineering ETL for machine learning - posted by Subash Prabakar <su...@gmail.com> on 2019/04/20 14:13:02 UTC, 0 replies.
- toDebugString - RDD Logical Plan - posted by kanchan tewary <ka...@gmail.com> on 2019/04/20 17:40:07 UTC, 2 replies.
- repartition in df vs partitionBy in df - posted by "kumar.rajat20del" <ku...@gmail.com> on 2019/04/20 18:48:46 UTC, 5 replies.
- How to execute non-timestamp-based aggregations in spark structured streaming? - posted by Stephen Boesch <ja...@gmail.com> on 2019/04/20 21:17:37 UTC, 1 replies.
- Re: Difference between 'cores' config params: spark submit on k8s - posted by Li Gao <li...@gmail.com> on 2019/04/20 21:43:33 UTC, 0 replies.
- Writing to Aerospike from Spark with bulk write with user authentication fails - posted by Mich Talebzadeh <mi...@gmail.com> on 2019/04/21 10:30:00 UTC, 1 replies.
- Usage of Explicit Future in Spark program - posted by Chetan Khatri <ch...@gmail.com> on 2019/04/21 18:58:00 UTC, 0 replies.
- Use derived column for other derived column in the same statement - posted by Rishi Shah <ri...@gmail.com> on 2019/04/22 03:15:52 UTC, 2 replies.
- Structured Streaming initialized with cached data or others - posted by "shicheng31604@gmail.com" <sh...@gmail.com> on 2019/04/22 10:00:59 UTC, 1 replies.
- Connecting to Spark cluster remotely - posted by Rishikesh Gawade <ri...@gmail.com> on 2019/04/22 14:52:43 UTC, 2 replies.
- Update / Delete records in Parquet - posted by Chetan Khatri <ch...@gmail.com> on 2019/04/22 19:01:37 UTC, 3 replies.
- Spark LogisticRegression got stuck on dataset with millions of columns - posted by Qian He <hq...@gmail.com> on 2019/04/23 00:02:45 UTC, 3 replies.
- can't download 2.4.1 sourcecode - posted by yutaochina <hd...@163.com> on 2019/04/23 03:54:31 UTC, 2 replies.
- spark 2.4.1 -> 3.0.0-SNAPSHOT mllib - posted by Koert Kuipers <ko...@tresata.com> on 2019/04/23 22:38:03 UTC, 0 replies.
- spark stddev() giving '?' as output how to handle it ? i.e replace null/0 - posted by Shyam P <sh...@gmail.com> on 2019/04/24 06:28:03 UTC, 1 replies.
- Handle empty partitions in pyspark - posted by kanchan tewary <ka...@gmail.com> on 2019/04/24 06:31:30 UTC, 0 replies.
- DataFrameWriter does not adjust spark.sql.session.timeZone offset while writing orc files - posted by Shubham Chaurasia <sh...@gmail.com> on 2019/04/24 10:29:42 UTC, 3 replies.
- Handle Null Columns in Spark Structured Streaming Kafka - posted by SNEHASISH DUTTA <in...@gmail.com> on 2019/04/24 14:24:23 UTC, 4 replies.
- 'No plan for EventTimeWatermark' error while using structured streaming with column pruning (spark 2.3.1) - posted by kineret M <ki...@gmail.com> on 2019/04/24 14:30:53 UTC, 0 replies.
- Is it possible to obtain the full command to be invoked by SparkLauncher? - posted by Jeff Evans <je...@gmail.com> on 2019/04/24 20:54:26 UTC, 4 replies.
- RDD vs Dataframe & when to persist - posted by Rishi Shah <ri...@gmail.com> on 2019/04/25 02:50:05 UTC, 0 replies.
- [pyspark] Use output of one aggregated function for another aggregated function within the same groupby - posted by Rishi Shah <ri...@gmail.com> on 2019/04/25 03:07:22 UTC, 1 replies.
- Different query result between spark thrift server and spark-shell - posted by Jun Zhu <ju...@vungle.com.INVALID> on 2019/04/25 09:00:03 UTC, 1 replies.
- Re: unsubscribe - posted by Song Yang <so...@gmail.com> on 2019/04/27 11:23:09 UTC, 3 replies.
- This MapR-DB Spark Connector with Secondary Indexes - posted by Mich Talebzadeh <mi...@gmail.com> on 2019/04/27 16:33:16 UTC, 0 replies.
- Spark SQL met "Block broadcast_xxx not found" - posted by Xilang Yan <xi...@gmail.com> on 2019/04/28 02:55:29 UTC, 0 replies.
- spark hive concurrency - posted by CPC <ac...@gmail.com> on 2019/04/29 08:45:34 UTC, 1 replies.
- Getting EOFFileException while reading from sequence file in spark - posted by Prateek Rajput <pr...@flipkart.com.INVALID> on 2019/04/29 08:56:57 UTC, 2 replies.
- Spark 2.4.1 on Kubernetes - DNS resolution of driver fails - posted by Olivier Girardot <o....@lateral-thoughts.com> on 2019/04/29 12:42:45 UTC, 0 replies.
- handling skewness issues - posted by rajat kumar <ku...@gmail.com> on 2019/04/29 16:33:11 UTC, 0 replies.
- Issue with offset management using Spark on Dataproc - posted by Austin Weaver <au...@flyrlabs.com> on 2019/04/29 17:04:25 UTC, 4 replies.
- Re: [EXT] handling skewness issues - posted by Michael Mansour <Mi...@symantec.com> on 2019/04/29 21:13:22 UTC, 1 replies.
- Anaconda installation with Pyspark on cloudera managed server - posted by Rishi Shah <ri...@gmail.com> on 2019/04/30 04:21:23 UTC, 0 replies.
- Re: Anaconda installation with Pyspark/Pyarrow (2.3.0+) on cloudera managed server - posted by Rishi Shah <ri...@gmail.com> on 2019/04/30 04:31:46 UTC, 1 replies.
- spark.sql.hive.exec.dynamic.partition description - posted by Mike Chan <mi...@gmail.com> on 2019/04/30 05:03:02 UTC, 0 replies.
- Koalas show data in IDE or pyspark - posted by Achilleus 003 <ac...@gmail.com> on 2019/04/30 06:45:58 UTC, 1 replies.
- Fwd: How to specify number of Partition using newAPIHadoopFile() - posted by Vatsal Patel <va...@flipkart.com.INVALID> on 2019/04/30 13:20:25 UTC, 1 replies.
- Spark Structured Streaming | Highly reliable de-duplication strategy - posted by Akshay Bhardwaj <ak...@gmail.com> on 2019/04/30 14:00:28 UTC, 0 replies.
- Turning off Jetty Http Options Method - posted by Ankit Jain <an...@gmail.com> on 2019/04/30 20:31:16 UTC, 2 replies.
- Best notebook for developing for apache spark using scala on Amazon EMR Cluster - posted by V0lleyBallJunki3 <ve...@gmail.com> on 2019/04/30 21:26:50 UTC, 0 replies.