- Use SparkContext in Web Application - posted by Girish Vasmatkar <gi...@hotwaxsystems.com> on 2018/10/01 06:48:48 UTC, 5 replies.
- Unable to read multiple JSON.Gz File. - posted by Mahender Sarangam <ma...@outlook.com> on 2018/10/01 08:59:40 UTC, 2 replies.
- Re: Time-Series Forecasting - posted by Mina Aslani <as...@gmail.com> on 2018/10/01 14:16:03 UTC, 0 replies.
- Re: Pyspark Partitioning - posted by Gourav Sengupta <go...@gmail.com> on 2018/10/01 16:22:19 UTC, 2 replies.
- Re: BroadcastJoin failed on partitioned parquet table - posted by Wenchen Fan <cl...@gmail.com> on 2018/10/02 02:37:03 UTC, 0 replies.
- How to do sliding window operation on RDDs in Pyspark? - posted by zakhavan <za...@unm.edu> on 2018/10/02 16:30:27 UTC, 4 replies.
- Re: CSV parser - how to parse column containing json data - posted by Nirav Patel <np...@xactlycorp.com> on 2018/10/02 20:59:49 UTC, 1 replies.
- How to read remote HDFS from Spark using username? - posted by Aakash Basu <aa...@gmail.com> on 2018/10/03 07:02:43 UTC, 3 replies.
- Restarting a failed Spark streaming job running on top of a yarn cluster - posted by jcgarciam <jc...@gmail.com> on 2018/10/03 12:21:28 UTC, 0 replies.
- Back to SQL - posted by Olivier Girardot <o....@lateral-thoughts.com> on 2018/10/03 17:41:08 UTC, 1 replies.
- How to do a broadcast join using raw Spark SQL 2.3.1 or 2.3.2? - posted by kant kodali <ka...@gmail.com> on 2018/10/03 23:37:15 UTC, 1 replies.
- Specifying different version of pyspark.zip and py4j files on worker nodes with Spark pre-installed - posted by Jianshi Huang <ji...@gmail.com> on 2018/10/04 09:19:29 UTC, 9 replies.
- Re: java.lang.IllegalArgumentException: requirement failed: BLAS.dot(x: Vector, y:Vector) was given Vectors with non-matching sizes - posted by hager <lo...@yahoo.com> on 2018/10/04 14:53:55 UTC, 0 replies.
- subscribe - posted by Sushil Chaudhary <su...@capitalone.com> on 2018/10/04 18:48:18 UTC, 0 replies.
- Spark 2.3.1 leaves _temporary dir back on s3 even after write to s3 is done. - posted by "sushil.chaudhary" <su...@capitalone.com> on 2018/10/04 20:01:41 UTC, 0 replies.
- PySpark structured streaming job throws socket exception - posted by mmuru <mm...@gmail.com> on 2018/10/04 20:59:39 UTC, 2 replies.
- Where is the DAG stored before catalyst gets it? - posted by Jean Georges Perrin <jg...@jgp.net> on 2018/10/04 22:36:16 UTC, 1 replies.
- Recreate Dataset from list of Row in spark streaming application. - posted by Kuttaiah Robin <ku...@gmail.com> on 2018/10/05 13:22:44 UTC, 2 replies.
- Unsubscribe - posted by Donni Khan <pr...@googlemail.com.INVALID> on 2018/10/05 14:36:34 UTC, 0 replies.
- [PySpark join] Resolved attribute(s) missing from... Attribute(s) with the same name appear in the operation - posted by "Buckler, Christine" <Ch...@nordstrom.com> on 2018/10/05 19:59:06 UTC, 0 replies.
- Re: error in job - posted by Muthu Jayakumar <ba...@gmail.com> on 2018/10/06 12:17:50 UTC, 0 replies.
- Executor hang - posted by 阎志涛 <to...@tendcloud.com> on 2018/10/07 12:24:12 UTC, 1 replies.
- Re: Executor hang - posted by 阎志涛 <to...@tendcloud.com> on 2018/10/07 22:21:51 UTC, 1 replies.
- Target java version not set when building spark with tags/v2.4.0-rc2 - posted by Shubham Chaurasia <sh...@gmail.com> on 2018/10/08 05:45:35 UTC, 0 replies.
- Error while upserting ElasticSearch from Spark 2.2 - posted by Deepak Sharma <de...@gmail.com> on 2018/10/08 09:13:43 UTC, 0 replies.
- td - posted by 冯 远森 <wi...@hotmail.com> on 2018/10/08 15:10:57 UTC, 0 replies.
- CSV parser - is there a way to find malformed csv record - posted by Nirav Patel <np...@xactlycorp.com> on 2018/10/08 18:57:09 UTC, 3 replies.
- Re: Re: Executor hang - posted by 阎志涛 <to...@tendcloud.com> on 2018/10/09 01:16:10 UTC, 0 replies.
- SparkR issue - posted by ayan guha <gu...@gmail.com> on 2018/10/09 06:20:45 UTC, 1 replies.
- DataSourceV2 APIs creating multiple instances of DataSourceReader and hence not preserving the state - posted by Shubham Chaurasia <sh...@gmail.com> on 2018/10/09 06:31:25 UTC, 7 replies.
- Internal Spark class is not registered by Kryo - posted by Lijun Cao <64...@qq.com> on 2018/10/09 11:22:03 UTC, 5 replies.
- Spark internal class is not registered by Kryo - posted by Lijun Cao <64...@qq.com> on 2018/10/09 11:25:53 UTC, 0 replies.
- Any way to see the size of the broadcast variable? - posted by V0lleyBallJunki3 <ve...@gmail.com> on 2018/10/09 15:44:04 UTC, 2 replies.
- Spark on YARN not utilizing all the YARN containers available - posted by kant kodali <ka...@gmail.com> on 2018/10/09 17:20:36 UTC, 6 replies.
- Does spark.streaming.concurrentJobs still exist? - posted by kant kodali <ka...@gmail.com> on 2018/10/09 18:50:37 UTC, 0 replies.
- Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063. - posted by zakhavan <za...@unm.edu> on 2018/10/09 19:46:10 UTC, 0 replies.
- [K8S] Option to keep the executor pods after job finishes - posted by Li Gao <li...@gmail.com> on 2018/10/09 21:17:10 UTC, 1 replies.
- PySpark Streaming : Accessing the Remote Secured Kafka - posted by "Ramaswamy, Muthuraman" <Mu...@viasat.com> on 2018/10/10 01:31:12 UTC, 0 replies.
- sparksql exception when using regexp_replace - posted by 付涛 <78...@qq.com> on 2018/10/10 08:57:36 UTC, 1 replies.
- Triangle Apache Spark Meetup - posted by Jean Georges Perrin <jg...@jgp.net> on 2018/10/10 09:54:13 UTC, 0 replies.
- Bad Message 413 Request Entity too large - Spark History UI through Knox - posted by Theyaa Matti <th...@gmail.com> on 2018/10/10 13:43:37 UTC, 0 replies.
- getBytes : save as pdf - posted by ☼ R Nair <ra...@gmail.com> on 2018/10/10 15:30:07 UTC, 1 replies.
- Process Million Binary Files - posted by Joel D <ga...@gmail.com> on 2018/10/10 21:56:04 UTC, 2 replies.
- re: yarn resource overcommit: cpu / vcores - posted by Peter Liu <pe...@gmail.com> on 2018/10/11 19:35:05 UTC, 0 replies.
- [Spark Structured Streaming] Running out of disk quota due to /work/tmp - posted by subramgr <su...@gmail.com> on 2018/10/11 20:30:04 UTC, 0 replies.
- Classic logistic regression missing !!! (Generalized linear models) - posted by pikufolgado <pi...@gmail.com> on 2018/10/11 22:46:03 UTC, 1 replies.
- Spark Structured Streaming resource contention / memory issue - posted by Patrick McGloin <mc...@gmail.com> on 2018/10/12 09:39:28 UTC, 3 replies.
- Timestamp Difference/operations - posted by Paras Agarwal <pa...@datametica.com> on 2018/10/12 14:01:03 UTC, 4 replies.
- Code review and Coding livestreams today - posted by Holden Karau <ho...@pigscanfly.ca> on 2018/10/12 16:10:16 UTC, 0 replies.
- SparkSQL read Hive transactional table - posted by daily <19...@qq.com> on 2018/10/13 05:37:27 UTC, 4 replies.
- Redeploying spark streaming application aborts because of checkpoint issue - posted by Kuttaiah Robin <ku...@gmail.com> on 2018/10/14 09:02:40 UTC, 0 replies.
- kerberos auth for MS SQL server jdbc driver - posted by Foster Langbein <fo...@riskfrontiers.com> on 2018/10/15 07:03:15 UTC, 4 replies.
- Support nested keys in DataFrameWriter.bucketBy - posted by Dávid Szakállas <da...@gmail.com> on 2018/10/15 13:58:50 UTC, 0 replies.
- Spark seems to think that a particular broadcast variable is large in size - posted by Venkat Dabri <ve...@gmail.com> on 2018/10/15 18:56:18 UTC, 4 replies.
- unsubscribe - posted by Vamshi Talla <va...@hotmail.com> on 2018/10/15 23:26:22 UTC, 0 replies.
- SocketTimeoutException with spark-r and using latest R version - posted by Thijs Haarhuis <th...@oranggo.com> on 2018/10/16 06:56:37 UTC, 0 replies.
- Pyspark Window orderBy - posted by mhussain <mu...@gmail.com> on 2018/10/16 09:10:41 UTC, 0 replies.
- Re: [External Sender] Pyspark Window orderBy - posted by Femi Anthony <ol...@capitalone.com> on 2018/10/16 12:48:44 UTC, 1 replies.
- Cached data not showing up in Storage tab - posted by Venkat Dabri <ve...@gmail.com> on 2018/10/16 15:21:56 UTC, 0 replies.
- [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs - posted by Patrick Brown <pa...@gmail.com> on 2018/10/16 16:33:40 UTC, 8 replies.
- Application crashes when encountering oracle timestamp - posted by rishmanisation <ri...@gmail.com> on 2018/10/16 23:15:27 UTC, 0 replies.
- Writing dataframe to vertica - posted by Nikhil Goyal <no...@gmail.com> on 2018/10/16 23:23:29 UTC, 0 replies.
- Re: SparkSQL read Hive transactional table - posted by daily <as...@foxmail.com> on 2018/10/17 00:41:20 UTC, 0 replies.
- What eactly is Function shipping? - posted by kant kodali <ka...@gmail.com> on 2018/10/17 03:06:01 UTC, 0 replies.
- Re: [External Sender] Writing dataframe to vertica - posted by Femi Anthony <ol...@capitalone.com> on 2018/10/17 04:49:14 UTC, 1 replies.
- Spark In Memory Shuffle - posted by thomas lavocat <th...@univ-grenoble-alpes.fr> on 2018/10/17 13:11:32 UTC, 5 replies.
- FW: Pyspark: set Orc Stripe.size on dataframe writer issue - posted by "Somasundara, Ashwin" <As...@fmr.com.INVALID> on 2018/10/17 13:13:52 UTC, 0 replies.
- [PySpark SQL]: SparkConf does not exist in the JVM - posted by takao <ta...@focaldata.com> on 2018/10/17 16:38:13 UTC, 0 replies.
- performance of IN clause - posted by Jayesh Lalwani <ja...@capitalone.com> on 2018/10/17 21:03:03 UTC, 1 replies.
- FP-Growth clarification for Market Basket Analysis - posted by aditipatel <ad...@hotwaxsystems.com> on 2018/10/18 06:30:07 UTC, 0 replies.
- Encoding issue reading text file - posted by Masf <ma...@gmail.com> on 2018/10/18 14:33:15 UTC, 0 replies.
- Re: Spark In Memory Shuffle / 5403 - posted by Peter Liu <pe...@gmail.com> on 2018/10/18 15:07:20 UTC, 4 replies.
- Mean over window with minimum number of rows - posted by Sumona Routh <su...@gmail.com> on 2018/10/18 17:59:45 UTC, 0 replies.
- [Spark-GraphX] Conductance, Bridge Ratio & Diameter - posted by Thodoris Zois <th...@gmail.com> on 2018/10/18 21:24:09 UTC, 0 replies.
- [Spark for kubernetes] Azure Blob Storage credentials issue - posted by Oscar Bonilla <os...@gmail.com> on 2018/10/19 08:02:40 UTC, 1 replies.
- Spark 2.3.2 : No of active tasks vastly exceeds total no of executor cores - posted by Shing Hing Man <ma...@yahoo.com.INVALID> on 2018/10/19 23:41:36 UTC, 3 replies.
- [Spark streaming] - posted by Leigh Stewart <ag...@gmail.com> on 2018/10/21 20:12:58 UTC, 0 replies.
- Structured Streaming: stream-stream join with several equality conditions in a disjunction - posted by WILSON Frank <Fr...@uk.thalesgroup.com> on 2018/10/22 11:00:56 UTC, 0 replies.
- Writing to vertica from spark - posted by Nikhil Goyal <no...@gmail.com> on 2018/10/22 18:48:29 UTC, 1 replies.
- Triggering sql on Was S3 via Apache Spark - posted by Om...@sony.com on 2018/10/23 07:53:15 UTC, 7 replies.
- Re: ALS block settings - posted by evanzamir <za...@gmail.com> on 2018/10/23 18:56:02 UTC, 0 replies.
- 1 - posted by twinmegami <tw...@gmail.com> on 2018/10/24 09:11:16 UTC, 0 replies.
- How to write DataFrame to single parquet file instead of multiple files under a folder in spark? - posted by mithril <tw...@gmail.com> on 2018/10/24 09:18:25 UTC, 0 replies.
- CVE-2018-11804: Apache Spark build/mvn runs zinc, and can expose information from build machines - posted by Sean Owen <sr...@apache.org> on 2018/10/24 16:28:35 UTC, 0 replies.
- Re: Watermarking without aggregation with Structured Streaming - posted by sanjay_awat <sa...@yahoo.com> on 2018/10/24 17:55:53 UTC, 2 replies.
- Error - Dropping SparkListenerEvent because no remaining room in event queue - posted by karan alang <ka...@gmail.com> on 2018/10/24 22:57:13 UTC, 2 replies.
- Does Spark have a plan to move away from sun.misc.Unsafe? - posted by kant kodali <ka...@gmail.com> on 2018/10/25 00:07:14 UTC, 1 replies.
- Having access to spark results - posted by Affan Syed <as...@an10.io> on 2018/10/25 07:28:43 UTC, 1 replies.
- Re: [External Sender] Having access to spark results - posted by Femi Anthony <ol...@capitalone.com> on 2018/10/25 07:34:03 UTC, 1 replies.
- Spark SQL Error - posted by Sai Kiran Kodukula <sa...@gmail.com> on 2018/10/25 14:36:14 UTC, 0 replies.
- External shuffle service on K8S - posted by 曹礼俊 <ca...@gmail.com> on 2018/10/26 09:14:52 UTC, 4 replies.
- [PySpark] Sharing testing library and requesting feedback - posted by Matt Hagy <ma...@liveramp.com> on 2018/10/26 13:31:34 UTC, 0 replies.
- conflicting version question - posted by Nathan Kronenfeld <nk...@uncharted.software> on 2018/10/26 13:44:48 UTC, 2 replies.
- java vs scala for Apache Spark - is there a performance difference ? - posted by karan alang <ka...@gmail.com> on 2018/10/26 22:04:11 UTC, 6 replies.
- Is spark not good for ingesting into updatable databases? - posted by ravidspark <ra...@gmail.com> on 2018/10/27 00:10:01 UTC, 3 replies.
- SIGBUS (0xa) when using DataFrameWriter.insertInto - posted by alexzautke <al...@googlemail.com> on 2018/10/27 15:52:00 UTC, 3 replies.
- structured streaming bookkeeping formats - posted by Koert Kuipers <ko...@tresata.com> on 2018/10/27 19:28:29 UTC, 0 replies.
- Processing Flexibility Between RDD and Dataframe API - posted by Soheil Pourbafrani <so...@gmail.com> on 2018/10/28 14:50:04 UTC, 3 replies.
- Number of rows divided by rowsPerBlock cannot exceed maximum integer - posted by Soheil Pourbafrani <so...@gmail.com> on 2018/10/28 20:44:31 UTC, 0 replies.
- [GraphX] - OOM Java Heap Space - posted by Thodoris Zois <th...@gmail.com> on 2018/10/28 20:54:00 UTC, 0 replies.
- [Spark Shell on AWS K8s Cluster]: Is there more documentation regarding how to run spark-shell on k8s cluster? - posted by "Zhang, Yuqi" <Yu...@Teradata.com> on 2018/10/29 01:29:22 UTC, 7 replies.
- dremel paper example schema - posted by lchorbadjiev <lu...@gmail.com> on 2018/10/29 13:25:08 UTC, 8 replies.
- Java Spark to Python spark integration - posted by Manohar Rao <ma...@gmail.com> on 2018/10/30 09:05:22 UTC, 0 replies.
- unsubsribe - posted by Mohan Palavancha <mo...@gmail.com> on 2018/10/30 09:57:03 UTC, 3 replies.
- Event Hubs properties kvp-value adds " to strings - posted by Magnus Nilsson <ma...@kth.se> on 2018/10/31 10:10:04 UTC, 0 replies.
- Iterator of KeyValueGroupedDataset.flatMapGroupsWithState function - posted by "Antonio Murgia - antonio.murgia2@studio.unibo.it" <an...@studio.unibo.it> on 2018/10/31 10:43:49 UTC, 1 replies.
- I want run deep neural network on Spark - posted by hager <lo...@yahoo.com> on 2018/10/31 14:09:26 UTC, 2 replies.
- Apache Spark orc read performance when reading large number of small files - posted by gpatcham <gp...@gmail.com> on 2018/10/31 17:23:47 UTC, 2 replies.