user@spark.apache.org, 2020-02

- [Structured Streaming] Domain data refresh with flatMapGroupsWithState - posted by Ashutosh Joshi <jo...@gmail.com> on 2020/02/01 03:19:04 UTC, 0 replies.
- Re: Extract value from streaming Dataframe to a variable - posted by Jungtaek Lim <ka...@gmail.com> on 2020/02/03 02:21:28 UTC, 0 replies.
- Re: How to prevent and track data loss/dropped due to watermark during structure streaming aggregation - posted by Jungtaek Lim <ka...@gmail.com> on 2020/02/03 03:06:40 UTC, 1 reply.
- Best way to read batch from Kafka and Offsets - posted by Ruijing Li <li...@gmail.com> on 2020/02/03 08:38:40 UTC, 12 replies.
- shuffle mathematic formulat - posted by asma zgolli <zg...@gmail.com> on 2020/02/04 11:57:35 UTC, 1 reply.
- Committer to use if "spark.sql.sources.partitionOverwriteMode": 'dynamic' - posted by edge7 <e....@live.com> on 2020/02/04 16:39:35 UTC, 0 replies.
- Data locality - posted by Karthik Srinivas <ka...@gmail.com> on 2020/02/05 00:33:36 UTC, 0 replies.
- [ spark-streaming ] - Data Locality issue - posted by Karthik Srinivas <ka...@gmail.com> on 2020/02/05 04:59:06 UTC, 0 replies.
- SparkAppHandle can not stop application in yarn client mode - posted by Zhang Victor <zh...@outlook.com> on 2020/02/05 09:02:37 UTC, 0 replies.
- subscribe - posted by Cool Joe <zh...@gmail.com> on 2020/02/06 01:35:46 UTC, 0 replies.
- dataframe null safe joins given a list of columns - posted by Marcelo Valle <ma...@ktech.com> on 2020/02/06 12:45:11 UTC, 0 replies.
- Spark Application Dynamic IP and Path assign. - posted by Vijayant Kumar <Vi...@mavenir.com.INVALID> on 2020/02/07 11:06:51 UTC, 0 replies.
- [ANNOUNCE] Announcing Apache Spark 2.4.5 - posted by Dongjoon Hyun <do...@gmail.com> on 2020/02/09 01:22:45 UTC, 5 replies.
- Ceph / Lustre VS hdfs comparison - posted by Nicolas PARIS <ni...@riseup.net> on 2020/02/12 13:31:04 UTC, 0 replies.
- Questions about count() performance with dataframes and parquet files - posted by Ashley Hoff <as...@gmail.com> on 2020/02/13 04:08:22 UTC, 8 replies.
- Re: Start a standalone server as root and use it with user accounts - posted by WranglingData <as...@gmail.com> on 2020/02/13 04:51:39 UTC, 0 replies.
- Environment variable for deleting .sparkStaging - posted by Debabrata Ghosh <ma...@gmail.com> on 2020/02/13 13:06:11 UTC, 1 reply.
- Spark 2.4.4 with Hive 2.3.6 - posted by Vinod Kancharana <vi...@gmail.com> on 2020/02/14 17:09:35 UTC, 0 replies.
- Spark 2.4.4 has bigger memory impact than 2.3? - posted by Ruijing Li <li...@gmail.com> on 2020/02/15 15:37:33 UTC, 0 replies.
- Connected components using GraphFrames is significantly slower than GraphX? - posted by kant kodali <ka...@gmail.com> on 2020/02/16 11:41:06 UTC, 0 replies.
- Apache Arrow support for Apache Spark - posted by Subash Prabakar <su...@gmail.com> on 2020/02/17 07:41:05 UTC, 1 reply.
- [ML] [How-to]: How to unload the loaded W2V model in Pyspark? - posted by Zhefu PENG <pe...@gmail.com> on 2020/02/17 13:10:49 UTC, 0 replies.
- Spark reading from Hbase throws java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods - posted by Mich Talebzadeh <mi...@gmail.com> on 2020/02/17 14:43:30 UTC, 9 replies.
- Better way to debug serializable issues - posted by Ruijing Li <li...@gmail.com> on 2020/02/18 10:01:59 UTC, 2 replies.
- unsubscribe - posted by ju...@free.fr on 2020/02/19 13:40:26 UTC, 2 replies.
- Spark Streaming job having issue with Java Flight Recorder (JFR) - posted by Pramod Biligiri <pr...@gmail.com> on 2020/02/20 12:12:28 UTC, 0 replies.
- CBO not working? - posted by Aelur Sadgod <ae...@gmail.com> on 2020/02/20 13:36:51 UTC, 1 reply.
- Integration testing Framework Spark SQL Scala - posted by Ruijing Li <li...@gmail.com> on 2020/02/21 02:09:10 UTC, 1 reply.
- Spark RDD ouput path for data lineage - posted by ard3nte <g....@gmail.com> on 2020/02/21 09:40:40 UTC, 0 replies.
- Serialization error when using scala kernel with Jupyter - posted by Nikhil Goyal <no...@gmail.com> on 2020/02/21 19:28:32 UTC, 1 reply.
- PowerIterationClustering - posted by Monish R <mo...@gmail.com> on 2020/02/22 00:48:46 UTC, 0 replies.
- Does dataframe spark API write/create a single file instead of directory as a result of write operation. - posted by Kshitij <ks...@gmail.com> on 2020/02/22 05:20:26 UTC, 5 replies.
- [Spark SQL] NegativeArraySizeException When Parse InternalRow to DTO Field with Type Array[String] - posted by "Proust (Feng Guizhou) [Travel Search & Discovery]" <pf...@coupang.com> on 2020/02/23 08:08:41 UTC, 3 replies.
- setting initial state for mapGroupsWithState - posted by dpristin <dp...@gmail.com> on 2020/02/24 15:41:22 UTC, 0 replies.
- [SPARK Dependencies] Security Vulnerability with Xerces version < 2.12 - posted by Anthony PONCET <an...@gmail.com> on 2020/02/24 17:05:07 UTC, 0 replies.
- [Spark SQL] Memory problems with packing too many joins into the same WholeStageCodegen - posted by Jianneng Li <ji...@workday.com> on 2020/02/25 02:15:36 UTC, 5 replies.
- What options do I have to handle third party classes that are not serializable? - posted by yeikel valdes <em...@yeikel.com> on 2020/02/25 16:23:25 UTC, 0 replies.
- Re: What options do I have to handle third party classes that are not serializable? - posted by Jeff Evans <je...@gmail.com> on 2020/02/25 16:25:26 UTC, 0 replies.
- Standard practices for building dashboards for spark processed data - posted by Aniruddha P Tekade <at...@binghamton.edu> on 2020/02/26 01:23:33 UTC, 2 replies.
- [SPARK-30957][SQL] Null-safe variant of Dataset.join(Dataset[_], Seq[String]) - posted by Enrico Minack <ma...@Enrico.Minack.dev> on 2020/02/26 09:07:31 UTC, 0 replies.
- Re: [External Email] Re: Standard practices for building dashboards for spark processed data - posted by Aniruddha P Tekade <at...@binghamton.edu> on 2020/02/26 19:16:35 UTC, 0 replies.
- Spark join: grouping of records having same value for a particular column in same partition - posted by ARAVIND ARUMUGHAM SETHURATHNAM <as...@vrbo.com.INVALID> on 2020/02/26 21:54:25 UTC, 0 replies.
- dropDuplicates and watermark in structured streaming - posted by lec ssmi <sh...@gmail.com> on 2020/02/27 10:30:10 UTC, 4 replies.
- Unsubscribe - posted by Phillip Pienaar <ph...@gmail.com> on 2020/02/27 10:31:05 UTC, 0 replies.
- Convert each partition of RDD to Dataframe - posted by Manjunath Shetty H <ma...@live.com> on 2020/02/27 13:29:07 UTC, 7 replies.
- Spark Streaming: Aggregating values across batches - posted by Something Something <ma...@gmail.com> on 2020/02/27 23:17:18 UTC, 1 reply.
- Compute the Hash of each row in new column - posted by Chetan Khatri <ch...@gmail.com> on 2020/02/28 12:56:10 UTC, 2 replies.
- Structured Streaming: mapGroupsWithState UDT serialization does not work - posted by Bryan Jeffrey <br...@gmail.com> on 2020/02/28 14:39:09 UTC, 9 replies.
- Pyspark Convert Struct Type to Map Type - posted by anbutech <an...@outlook.com> on 2020/02/28 16:29:36 UTC, 0 replies.
- Aggregating values by a key field in Spark Streaming - posted by Something Something <ma...@gmail.com> on 2020/02/28 21:59:11 UTC, 0 replies.
- In Spark Streaming, Direct Kafak Consumers are not evenly distrubuted across executors - posted by Hrishikesh Mishra <sd...@gmail.com> on 2020/02/29 04:05:19 UTC, 0 replies.
- configuration error - posted by Zahid Rahman <za...@gmail.com> on 2020/02/29 17:23:38 UTC, 1 reply.
- setup pom.xml - posted by Zahid Rahman <za...@gmail.com> on 2020/02/29 23:04:52 UTC, 0 replies.