user@spark.apache.org, 2018-11

- Re: [Spark Shell on AWS K8s Cluster]: Is there more documentation regarding how to run spark-shell on k8s cluster? - posted by "Zhang, Yuqi" <Yu...@Teradata.com> on 2018/11/01 00:55:11 UTC, 2 replies.
- Rack Awareness in Spark - posted by RuiyangChen <rc...@illinois.edu> on 2018/11/01 02:30:50 UTC, 0 replies.
- Spark Structured Streaming handles compressed files - posted by Lian Jiang <ji...@gmail.com> on 2018/11/01 03:29:43 UTC, 1 replies.
- How to use Dataset forEachPartion and groupByKey together - posted by Kuttaiah Robin <ku...@gmail.com> on 2018/11/01 06:15:44 UTC, 0 replies.
- Re: Apache Spark orc read performance when reading large number of small files - posted by Jörn Franke <jo...@gmail.com> on 2018/11/01 07:19:54 UTC, 1 replies.
- Re: SIGBUS (0xa) when using DataFrameWriter.insertInto - posted by alexzautke <al...@googlemail.com> on 2018/11/01 08:09:48 UTC, 0 replies.
- Fwd: use spark cluster in java web service - posted by onmstester onmstester <on...@zoho.com.INVALID> on 2018/11/01 08:12:51 UTC, 1 replies.
- StackOverflowError for simple map - posted by Chris Olivier <cj...@apache.org> on 2018/11/01 20:12:55 UTC, 0 replies.
- StackOverflowError for simple map (not to incubator mailing list) - posted by Chris Olivier <cj...@apache.org> on 2018/11/01 20:17:57 UTC, 0 replies.
- Would Spark can read file from S3 which are Client-Side Encrypted KMS–Managed Customer Master Key (CMK) ? - posted by mytramesh <tu...@gmail.com> on 2018/11/01 22:27:35 UTC, 0 replies.
- [PySpark Profiler]: Does empty profile mean no execution in Python Interpreter? - posted by Alex <al...@unexpectedeof.net> on 2018/11/02 03:00:34 UTC, 0 replies.
- Re: how to use cluster sparkSession like localSession - posted by Daniel de Oliveira Mantovani <da...@gmail.com> on 2018/11/02 03:57:54 UTC, 4 replies.
- Spark Listeners for getting dataset partition information in streaming application - posted by Kuttaiah Robin <ku...@gmail.com> on 2018/11/02 08:59:06 UTC, 0 replies.
- Pyspark create RDD of dictionary - posted by Soheil Pourbafrani <so...@gmail.com> on 2018/11/02 14:42:52 UTC, 2 replies.
- Multiply Matrix to it's transpose get undesired output - posted by Soheil Pourbafrani <so...@gmail.com> on 2018/11/02 19:47:55 UTC, 0 replies.
- Is it possible to customize Spark TF-IDF implementation - posted by Soheil Pourbafrani <so...@gmail.com> on 2018/11/02 21:14:03 UTC, 0 replies.
- Read Avro Data using Spark Streaming - posted by Divya Narayan <na...@gmail.com> on 2018/11/03 03:33:26 UTC, 2 replies.
- How to avoid long-running jobs blocking short-running jobs - posted by conner <mi...@gmail.com> on 2018/11/03 09:04:01 UTC, 4 replies.
- Is there any Spark source in Java - posted by Soheil Pourbafrani <so...@gmail.com> on 2018/11/03 17:41:56 UTC, 5 replies.
- Spark 2.4.0 artifact in Maven repository - posted by Bartosz Konieczny <ba...@gmail.com> on 2018/11/04 15:14:08 UTC, 2 replies.
- [Spark SQL] INSERT OVERWRITE to a hive partitioned table (pointing to s3) from spark is too slow. - posted by ehbhaskar <eh...@gmail.com> on 2018/11/05 06:58:13 UTC, 4 replies.
- Shuffle write explosion - posted by Yichen Zhou <zh...@gmail.com> on 2018/11/05 07:41:52 UTC, 3 replies.
- How to use the Graphframe PageRank method with dangling edges? - posted by Alexander Czech <al...@googlemail.com.INVALID> on 2018/11/05 10:20:16 UTC, 0 replies.
- Re: mLIb solving linear regression with sparse inputs - posted by Robineast <Ro...@xense.co.uk> on 2018/11/05 12:08:24 UTC, 0 replies.
- Modifying pyspark sources - posted by Soheil Pourbafrani <so...@gmail.com> on 2018/11/05 13:38:01 UTC, 0 replies.
- Re: Drawing Big Data tech diagrams using Pen Tablets - posted by Mich Talebzadeh <mi...@gmail.com> on 2018/11/05 17:44:35 UTC, 0 replies.
- Equivalent of emptyDataFrame in StructuredStreaming - posted by Arun Manivannan <ar...@arunma.com> on 2018/11/05 23:29:14 UTC, 2 replies.
- [Spark SQL] Couldn't save dataframe with null columns to S3. - posted by ehbhaskar <eh...@gmail.com> on 2018/11/06 01:02:10 UTC, 0 replies.
- SPARK-25959 - Difference in featureImportances results on computed vs saved models - posted by Suraj Nayak <sn...@gmail.com> on 2018/11/07 03:04:00 UTC, 0 replies.
- How to increase the parallelism of Spark Streaming application？ - posted by JF Chen <da...@gmail.com> on 2018/11/07 07:27:48 UTC, 6 replies.
- [Spark-Core] Long scheduling delays (1+ hour) - posted by bsikander <be...@gmail.com> on 2018/11/07 10:08:09 UTC, 4 replies.
- DB2 Sequence - Error while invoking - posted by ☼ R Nair <ra...@gmail.com> on 2018/11/07 13:37:50 UTC, 0 replies.
- How does shuffle operation work in Spark? - posted by Joe <jo...@net2020.org> on 2018/11/07 16:25:16 UTC, 0 replies.
- subscribe - posted by Vein Kong <ga...@yahoo.com.INVALID> on 2018/11/07 22:41:07 UTC, 0 replies.
- Happy Diwali everyone!!! - posted by Xiao Li <ga...@gmail.com> on 2018/11/07 23:09:38 UTC, 0 replies.
- spark 2.2.x - Broadcasthashjoin is not happening even after checkpointing - posted by Nirav Patel <np...@xactlycorp.com> on 2018/11/08 00:12:42 UTC, 0 replies.
- (no subject) - posted by JF Chen <da...@gmail.com> on 2018/11/08 08:19:10 UTC, 0 replies.
- StorageLevel: OffHeap - posted by Jack Kolokasis <ko...@ics.forth.gr> on 2018/11/08 12:35:32 UTC, 0 replies.
- Is dataframe write blocking? what can be done for fair scheduler? - posted by "ramannanda9@gmail.com" <ra...@gmail.com> on 2018/11/08 15:18:41 UTC, 0 replies.
- Is Dataframe write blocking? - posted by Ramandeep Singh Nanda <ra...@gmail.com> on 2018/11/08 15:38:11 UTC, 0 replies.
- Re: [ANNOUNCE] Announcing Apache Spark 2.4.0 - posted by Wenchen Fan <cl...@gmail.com> on 2018/11/08 18:26:44 UTC, 9 replies.
- The mailing list went down due to the spam server issues - posted by Xiao Li <li...@databricks.com> on 2018/11/08 18:59:40 UTC, 0 replies.
- Spark event logging with s3a - posted by David Hesson <da...@arcadia.io> on 2018/11/08 21:36:12 UTC, 0 replies.
- [Spark on K8s] Scaling experiences sharing - posted by Li Gao <li...@gmail.com> on 2018/11/09 16:26:17 UTC, 0 replies.
- What is BDV in Spark Source - posted by Soheil Pourbafrani <so...@gmail.com> on 2018/11/09 19:06:44 UTC, 1 replies.
- [Spark-SQL] - Creating Hive Metastore Parquet table from Avro schema - posted by pradeepbaji <pr...@gmail.com> on 2018/11/09 19:44:53 UTC, 0 replies.
- Questions on Python support with Spark - posted by Arijit Tarafdar <Ar...@live.com> on 2018/11/09 22:04:08 UTC, 1 replies.
- Scala: The Util is not accessible in def main - posted by Soheil Pourbafrani <so...@gmail.com> on 2018/11/11 10:12:32 UTC, 1 replies.
- writing to local files on a worker - posted by Steve Lewis <lo...@gmail.com> on 2018/11/11 22:13:39 UTC, 4 replies.
- Re: about LIVY-424 - posted by lk_spark <lk...@163.com> on 2018/11/12 06:56:01 UTC, 0 replies.
- FW: Spark2 and Hive metastore - posted by Ирина Шершукова <ir...@gmail.com> on 2018/11/12 08:12:14 UTC, 1 replies.
- question about barrier execution mode in Spark 2.4.0 - posted by Joe <jo...@net2020.org> on 2018/11/12 15:33:35 UTC, 0 replies.
- Bucketing - posted by Sai <sa...@gmail.com> on 2018/11/12 17:43:54 UTC, 0 replies.
- Failed to convert java.sql.Date to String - posted by lu...@china-inv.cn on 2018/11/13 08:48:35 UTC, 0 replies.
- [Spark SQL] Does Spark group small files - posted by Yann Moisan <ya...@gmail.com> on 2018/11/13 20:28:20 UTC, 2 replies.
- inferred schemas for spark streaming from a Kafka source - posted by Colin Williams <co...@gmail.com> on 2018/11/13 20:32:21 UTC, 0 replies.
- [ANNOUNCE] Apache Bahir 2.1.3 Released - posted by Luciano Resende <lr...@apache.org> on 2018/11/13 21:07:01 UTC, 0 replies.
- [ANNOUNCE] Apache Bahir 2.2.2 Released - posted by Luciano Resende <lr...@apache.org> on 2018/11/13 21:08:30 UTC, 0 replies.
- [ANNOUNCE] Apache Toree 0.3.0-incubating Released - posted by Luciano Resende <lr...@apache.org> on 2018/11/13 21:43:42 UTC, 0 replies.
- [SPARK-SQL] Writing partitioned parquet requires huge amounts of memory - posted by "Lienhart, Pierre (DI IZ) - AF (ext)" <pi...@airfrance.fr> on 2018/11/14 15:56:32 UTC, 0 replies.
- [Spark SQL] [Spark 2.4.0] Performance regression when reading parquet files from S3 - posted by Yann Moisan <ya...@gmail.com> on 2018/11/14 20:07:29 UTC, 0 replies.
- Measure Serialization / De-serialization Time - posted by Jack Kolokasis <ko...@ics.forth.gr> on 2018/11/15 13:54:41 UTC, 0 replies.
- [Spark SQL] [Spark 2.4.0] v1 -> struct(v1.e) fails - posted by François Sarradin <fs...@gmail.com> on 2018/11/15 14:46:59 UTC, 1 replies.
- How to address seemingly low core utilization on a spark workload? - posted by Vitaliy Pisarev <vi...@biocatch.com> on 2018/11/15 14:51:14 UTC, 12 replies.
- Testing Apache Spark applications - posted by Om...@sony.com on 2018/11/15 17:44:49 UTC, 3 replies.
- Using columnSimilarity with threshold result in greater than one - posted by Soheil Pourbafrani <so...@gmail.com> on 2018/11/15 17:56:14 UTC, 0 replies.
- Using cosinSimilarity method for getting pairwise documents similarity - posted by Soheil Pourbafrani <so...@gmail.com> on 2018/11/15 18:03:55 UTC, 0 replies.
- spark in jupyter cannot find a class in a jar - posted by Lian Jiang <ji...@gmail.com> on 2018/11/15 23:46:43 UTC, 0 replies.
- Re: java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to Case class - posted by "Rico B." <in...@ricobergmann.de> on 2018/11/16 09:04:12 UTC, 0 replies.
- Delta Logic in Spark - posted by Mahender Sarangam <ma...@outlook.com> on 2018/11/17 11:23:49 UTC, 0 replies.
- [Spark Structued Streaming]: Read kafka offset from a timestamp - posted by puneetloya <pu...@gmail.com> on 2018/11/18 00:52:55 UTC, 1 replies.
- CVE-2018-17190: Unsecured Apache Spark standalone executes user code - posted by Sean Owen <sr...@apache.org> on 2018/11/18 15:36:40 UTC, 0 replies.
- streaming pdf - posted by Nicolas Paris <ni...@riseup.net> on 2018/11/18 22:29:00 UTC, 4 replies.
- Spark DataSets and multiple write(.) calls - posted by "Dipl.-Inf. Rico Bergmann" <in...@ricobergmann.de> on 2018/11/19 08:03:32 UTC, 6 replies.
- Pre build for apache 2.4 broken - posted by b-moisson <b-...@legallais.com> on 2018/11/19 09:54:50 UTC, 0 replies.
- Regression of external shuffle service spark 2.3 vs spark 2.2 - posted by "igor.berman" <ig...@gmail.com> on 2018/11/19 12:15:13 UTC, 0 replies.
- exhaustive list of configuration options - posted by Shiyuan <gs...@gmail.com> on 2018/11/20 00:14:56 UTC, 0 replies.
- PySpark Streaming and Secured Kafka. - posted by "Ramaswamy, Muthuraman" <Mu...@viasat.com> on 2018/11/20 01:32:51 UTC, 0 replies.
- Is there any window operation for RDDs in Pyspark? like for DStreams - posted by zakhavan <za...@unm.edu> on 2018/11/20 18:48:32 UTC, 0 replies.
- Monthly Apache Spark Newsletter - posted by Ankur Gupta <an...@outlook.com> on 2018/11/21 03:30:39 UTC, 0 replies.
- spark-sql force parallel union - posted by onmstester onmstester <on...@zoho.com.INVALID> on 2018/11/21 04:34:09 UTC, 3 replies.
- Structured Streaming to file sink results in illegal state exception - posted by Magnus Nilsson <ma...@kth.se> on 2018/11/21 09:02:48 UTC, 0 replies.
- Structured Streaming restart results in illegal state exception - posted by Magnus Nilsson <ma...@gmail.com> on 2018/11/21 12:21:58 UTC, 0 replies.
- Spark 2.3.0 with HDP Got completely successfully but status FAILED with error - posted by Chetan Khatri <ch...@gmail.com> on 2018/11/21 18:38:03 UTC, 0 replies.
- Casting nested columns and updated nested struct fields. - posted by Colin Williams <co...@gmail.com> on 2018/11/22 02:25:49 UTC, 2 replies.
- How to Keep Null values in Parquet - posted by Chetan Khatri <ch...@gmail.com> on 2018/11/22 02:29:24 UTC, 2 replies.
- spark2.3.2 "load data inpath /hrds/tablename-* " can't use * for A class of files - posted by yutaochina <hd...@163.com> on 2018/11/22 02:40:50 UTC, 1 replies.
- Show function name in Logs for PythonUDFRunner - posted by Abdeali Kothari <ab...@gmail.com> on 2018/11/22 09:04:10 UTC, 2 replies.
- [Spark ORC | SQL | Hive] Buffer size too small when using filterPushdown predicate=True (ref.: SPARK-25145) - posted by Bjørnar Jensen <bj...@norceresearch.no> on 2018/11/23 10:36:38 UTC, 0 replies.
- Zookeeper and Spark deployment for standby master - posted by Akila Wajirasena <ak...@gmail.com> on 2018/11/26 06:25:09 UTC, 1 replies.
- Encoding not working when using a map / mapPartitions call - posted by ccaspanello <cc...@gmail.com> on 2018/11/26 17:15:10 UTC, 0 replies.
- Re: [Spark SQL]: Does Spark SQL 2.3+ suppor UDT? - posted by Suny Tyagi <su...@gmail.com> on 2018/11/26 17:41:37 UTC, 1 replies.
- Spark column combinations and combining multiple dataframes (pyspark) - posted by Christopher Petrino <ch...@gmail.com> on 2018/11/26 17:55:29 UTC, 0 replies.
- Spark Streaming - posted by Siva Samraj <sa...@gmail.com> on 2018/11/27 05:44:35 UTC, 2 replies.
- Spark Streaming join taking long to process - posted by Abhijeet Kumar <ab...@sentienz.com> on 2018/11/27 08:15:31 UTC, 3 replies.
- Re: Job hangs in blocked task in final parquet write stage - posted by Conrad Lee <co...@parsely.com> on 2018/11/27 11:29:01 UTC, 5 replies.
- Re: Upgrading spark history server, no logs showing. - posted by bbarks <br...@pm.me> on 2018/11/27 18:57:41 UTC, 0 replies.
- PySpark Direct Streaming : SASL Security Compatibility Issue - posted by "Ramaswamy, Muthuraman" <Mu...@viasat.com> on 2018/11/28 03:15:05 UTC, 0 replies.
- spark unsupported conversion to Stringtype error - posted by JF Chen <da...@gmail.com> on 2018/11/28 07:32:07 UTC, 0 replies.
- Re: PySpark Direct Streaming : SASL Security Compatibility Issue - posted by Gabor Somogyi <ga...@gmail.com> on 2018/11/28 09:11:56 UTC, 0 replies.
- Do we need to kill a spark job every time we change and deploy it? - posted by Mina Aslani <as...@gmail.com> on 2018/11/28 18:44:10 UTC, 1 replies.
- Spark streaming join on yarn - posted by Abhijeet Kumar <ab...@sentienz.com> on 2018/11/28 22:25:47 UTC, 1 replies.
- Java: pass parameters in spark sql query - posted by Mann Du <ma...@gmail.com> on 2018/11/28 23:55:01 UTC, 1 replies.
- Caused by: java.io.NotSerializableException: com.softwaremill.sttp.FollowRedirectsBackend - posted by James Starks <su...@protonmail.com.INVALID> on 2018/11/29 14:45:24 UTC, 3 replies.
- Spark 2.4.0 worker can't find work/app/folderNo directory for logs - posted by flyingmeatball <ka...@gmail.com> on 2018/11/29 19:03:21 UTC, 1 replies.
- Spark and Zookeeper HA failures - posted by Mark Bidewell <mb...@gmail.com> on 2018/11/30 01:44:53 UTC, 0 replies.
- Convert RDD[Iterrable[MyCaseClass]] to RDD[MyCaseClass] - posted by James Starks <su...@protonmail.com.INVALID> on 2018/11/30 14:02:03 UTC, 0 replies.
- Re: Do we need to kill a spark job every time we change and deploy it? - posted by 965 <10...@qq.com> on 2018/11/30 14:31:51 UTC, 0 replies.
- Re: Java: pass parameters in spark sql query - posted by 965 <10...@qq.com> on 2018/11/30 14:39:16 UTC, 0 replies.